Quansight-Labs / czi-conda-forge-mgmt

🚀 Top-level project management for the conda-forge CZI grant
https://github.com/orgs/Quansight-Labs/projects/10
BSD 3-Clause "New" or "Revised" License

Building a maintainers dashboard with Quetz #13

Open jaimergp opened 1 year ago

jaimergp commented 1 year ago

📌 Summary

Prepare the conda ecosystem for OCI-based storage compatibility.

πŸ“ Background

There is no straightforward way to monitor the operational status of conda-forge's infrastructure.

conda-forge.org/status offers a "maintainers dashboard" with information about:

Unfortunately, this is far from a comprehensive view of ongoing maintenance tasks, bottlenecks, or the overall health of the many bots and infrastructure components.

Having a detailed picture of the infrastructure and automation tools would significantly improve the maintainers' workflow and help identify critical risks, which is essential to keeping up with the community's increasing growth and demand.

Quetz was chosen as the open-source server for hosting conda packages, allowing for increased transparency and extensibility. It would have the added benefit of centralizing packaging metadata, which is currently scattered across repositories, in a canonical, API-first database that performs well at scale. This lays the foundation for further infrastructure automation and improvements to the build process.

🚀 Tasks / Deliverables

See issues labeled "mission: dashboard 🎛".

ℹ️ References

atrawog commented 1 year ago

My suggestion would be to follow the same approach as https://mybinder.readthedocs.io/en/latest/about/status.html, with a dedicated status page, but to use Prometheus/Grafana as the data backend (see https://grafana.mybinder.org/d/fLoQvRHmk/status?orgId=1).
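A status page along these lines could pull its data from the Prometheus HTTP API. The sketch below is a hypothetical helper, not part of the mybinder setup. It builds an instant-query URL for `/api/v1/query` and turns the result of the built-in `up` metric (1 when a scrape target is reachable, 0 otherwise) into a per-job up/down summary. The server address and the job names `quetz` and `s3-exporter` are assumptions, and the response is a canned example rather than real data.

```python
import json
from urllib.parse import urlencode

# Hypothetical Prometheus server address; replace with the real deployment.
PROMETHEUS_URL = "http://prometheus.example.org"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def summarize_up(response_body: str) -> dict[str, str]:
    """Map each scrape job to "up" or "down" from an `up` instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    status: dict[str, str] = {}
    for sample in payload["data"]["result"]:
        job = sample["metric"].get("job", "unknown")
        # Each vector sample's value is [timestamp, "value-as-string"].
        is_up = float(sample["value"][1]) == 1.0
        # A job is reported as down as soon as any one of its targets is down.
        if status.get(job) == "down":
            continue
        status[job] = "up" if is_up else "down"
    return status


# Canned response with hypothetical job and instance names.
example = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "quetz", "instance": "quetz-1:8000"}, "value": [1700000000.0, "1"]},
            {"metric": {"job": "quetz", "instance": "quetz-2:8000"}, "value": [1700000000.0, "0"]},
            {"metric": {"job": "s3-exporter", "instance": "s3:9100"}, "value": [1700000000.0, "1"]},
        ],
    },
})

print(build_query_url(PROMETHEUS_URL, "up"))
print(summarize_up(example))  # {'quetz': 'down', 's3-exporter': 'up'}
```

Treating a job as down when any single target is down is a deliberately conservative choice for a maintainers' dashboard. A Grafana panel could instead show the fraction of healthy targets per job.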

Quetz already has middleware in place for basic metric reporting via Prometheus (https://github.com/mamba-org/quetz/blob/main/quetz/metrics/middleware.py). I would improve and extend these statistics to give better insight into the performance of a Quetz instance.

However, most of Quetz's current performance issues are actually problems in components such as the S3 storage backend. To catch and diagnose these issues, we will need a full-blown monitoring system with good Quetz integration that is capable of monitoring not just Quetz, but the whole cloud infrastructure conda-forge depends on.
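Because the S3 storage backend is singled out as the main source of slowness, one way to collect data on it is to time every storage call separately from the HTTP layer. The sketch below records per-operation latencies into Prometheus-style cumulative histogram buckets, where each `le` bucket counts observations less than or equal to its bound. The bucket bounds, the `timed` decorator, the operation name `s3_get`, and `fetch_package` are all hypothetical. A real deployment would more likely use the `Histogram` class from the `prometheus_client` package.

```python
import bisect
import functools
import time

# Upper bounds (seconds) for latency buckets; chosen arbitrarily for illustration.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, float("inf"))


class LatencyHistogram:
    """Hypothetical per-operation latency histogram with Prometheus-style buckets."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = buckets
        self.counts = {}  # operation -> per-bucket (non-cumulative) counts
        self.sums = {}    # operation -> total observed seconds

    def observe(self, operation: str, seconds: float) -> None:
        counts = self.counts.setdefault(operation, [0] * len(self.buckets))
        # bisect_left returns the first bucket whose upper bound is >= seconds.
        counts[bisect.bisect_left(self.buckets, seconds)] += 1
        self.sums[operation] = self.sums.get(operation, 0.0) + seconds

    def cumulative(self, operation: str) -> list[int]:
        """Return cumulative counts, matching `_bucket{le="..."}` semantics."""
        running, out = 0, []
        for count in self.counts.get(operation, [0] * len(self.buckets)):
            running += count
            out.append(running)
        return out


def timed(histogram: LatencyHistogram, operation: str):
    """Decorator recording how long each call to a storage function takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the storage call raised.
                histogram.observe(operation, time.perf_counter() - start)
        return wrapper
    return decorator


storage_latency = LatencyHistogram()


@timed(storage_latency, "s3_get")
def fetch_package(key: str) -> bytes:
    # Hypothetical stand-in for an S3 GET issued by a storage backend.
    return b"package bytes for " + key.encode()


fetch_package("noarch/example-1.0-0.tar.bz2")
storage_latency.observe("s3_get", 0.3)  # simulate one slow request
print(storage_latency.cumulative("s3_get"))
```

Keeping storage timings in their own metric, rather than folding them into HTTP request latency, makes it possible to tell whether a slow download is spent in Quetz itself or waiting on the storage backend.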