coder / internal

Non-community issues related to coder/coder

registry: draft RFC to stabilize project #87

Open Kira-Pilot opened 3 weeks ago

Kira-Pilot commented 3 weeks ago

Registry has been subject to some outages lately. We should draft a proposal to address the following:

bcpeinhardt commented 3 days ago

I haven't been keeping this ticket up to date with the registry RFC progress; that's my bad. The RFC is currently in draft. Link here: https://www.notion.so/coderhq/registry-coder-com-Re-Architecture-120d579be59280058cc7e182f81d80f2?pvs=4

There is a meeting scheduled for tomorrow (Tuesday, October 22nd, 2024) to commit to an architecture and go ahead and build the thing 🤞 Both strategies proposed in the RFC have prototypes. Here's the link to the more recent one (which is much more likely to be accepted): https://github.com/bcpeinhardt/static_registry

To answer the originally proposed questions in summary:

Do we benefit greatly from self-hosting a registry? We should draw on SEO data to understand the tradeoff we're making.

I don't think we've gotten a good answer on this from product yet. I think the useful data to draw on here would be customer feedback, not a number of page visits.

How can we decouple our dependency on GitHub?

By generating assets at build rather than fetching assets at runtime.
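To make the build-time idea concrete, here is a minimal sketch, not the code from the prototype. It assumes a local checkout where each module lives in its own directory with a `README.md`. The names `build_index` and `index.json` are hypothetical and only there for illustration. The point is that everything the site serves is produced once during the build, so nothing needs to call GitHub while handling a request.

```python
import json
from pathlib import Path


def build_index(source_dir: Path, out_dir: Path) -> list[dict]:
    """Walk a local checkout and write a static index.json at build time."""
    entries = []
    # Assumption: each subdirectory that contains a README.md is one module.
    for readme in sorted(source_dir.glob("*/README.md")):
        entries.append({
            "name": readme.parent.name,
            "readme": readme.read_text(encoding="utf-8"),
        })
    out_dir.mkdir(parents=True, exist_ok=True)
    # The generated file is what gets deployed; no network call happens at runtime.
    (out_dir / "index.json").write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

A build step would call something like `build_index(Path("modules"), Path("dist"))` once per deploy, and the server would only ever read from `dist`.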

Our current caching strategy is 10 min - should this be increased or changed somehow?

One solution proposes making the caching a perpetual independent service. The other proposes fetching all assets at build so no caching is necessary.
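For reference, a 10 minute cache of the kind the question describes can be sketched roughly like this. This is an illustration under assumptions, not the registry's actual code: `TTLCache`, `fetch`, and the injectable `clock` are hypothetical names. It shows why runtime caching still depends on the upstream: once the entry expires, the next request has to fetch again.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Cache one fetched value and refetch once it is at least ttl_seconds old."""

    def __init__(self, fetch: Callable[[], T], ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds  # 600 seconds = the 10 minutes mentioned above
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            # If the upstream fetch raises here, the error propagates to the caller.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Raising the TTL only makes the refetch less frequent. The build-time approach removes the refetch from the request path entirely.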

Recommendations for GitHub token refresh & rate limit handling across multiple projects

Some documentation around this would be a good idea. I think right now this is falling on Kyle/Ammar, and it's probably not something that should be on their plate. This is not addressed in the RFC, but it could be, or I could start it as a separate conversation down the road.
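As a possible starting point for that documentation, here is a small sketch of rate limit handling. It relies on the response headers the GitHub REST API documents (`retry-after`, `x-ratelimit-remaining`, `x-ratelimit-reset`). The function name `seconds_until_retry` is hypothetical, and token refresh is not covered.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(status: int, headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before retrying; 0.0 means not rate limited."""
    now = time.time() if now is None else now
    # Header names are case-insensitive, so normalize them before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    # GitHub signals rate limiting with a 403 or 429 status.
    if status not in (403, 429):
        return 0.0
    # Secondary rate limits may include Retry-After, given in seconds.
    if "retry-after" in h:
        return max(0.0, float(h["retry-after"]))
    # Primary rate limit: remaining is 0 and reset is a UTC epoch timestamp.
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    return 0.0
```

A 403 without these headers returns `0.0`, since that is most likely a permissions problem and waiting will not help.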

Creation of a registry runbook so engineers outside of Blueberry can help troubleshoot (we shouldn't create this as part of the RFC; rather, it should be a layer that we can prioritize to consider the RFC fully implemented)

This is described in the RFC.

How should we implement an observability layer? Compare Prometheus vs GCP/Datadog logging

We decided on Prometheus for the simple reason that we already use it in coder/coder, so adding the registry should be trivial in both cost and complexity.