NixOS / SC-election-2024

2024 Election for the Steering Committee

Should Nix self-host the binary cache? #19

Open nyabinary opened 2 months ago

nyabinary commented 2 months ago

Question

Do you believe that Nix should transition to self-hosting the binary cache on bare-metal hardware, rather than relying on third-party services for hosting its cache?

Candidates I'd like to get an answer from

No response

Reminder of the Q&A rules

Please adhere to the Q&A guidelines and rules

cafkafk commented 2 months ago

TL;DR:

To the extent possible, yes. Currently, data center space is expensive, and at our scale, we're likely no longer looking at a hobbyist garage full of towers. We have to accept that for a time, we'll likely not have the option to say no to cloud providers, and that they can be useful tools, as they often provide rapid scaling and can be useful for disaster recovery.

That said, we should ensure that infrastructure is made in a cloud agnostic way, so as to be as independent as possible from a single cloud provider, and to make the shift to a self hosted solution as pain free as possible. This also means restricting usage of proprietary software in our infrastructure, to ensure its longevity.

At the end of the day, we can't control when we'll be ready to make the move financially, but we can be as prepared as reasonably possible for when it becomes an option. That said, I think the cost is too prohibitive for our current financial position, but it's very reasonable to expect that it will be possible in the future.

Another thing worth mentioning is that we should consider providing cache to more projects. Ideally, I think the foundation should host a community cache that would allow tenancy of projects compatible with our values and meeting certain criteria, such as being of a certain size. Making cache more available to smaller projects would be a way in which we can help foster a more lively ecosystem, and use our resources in a way that is useful to the community at large.

mschwaig commented 1 month ago

Long term, I think it makes sense to invest in a peer-to-peer system to distribute traffic and storage, since that fits really well with how Nix works conceptually. In that context wanting one copy of the data on site somewhere makes sense to me. That also makes it easier to distribute the financial burden of hosting the cache.

Before that, in the short term, it's primarily a matter of cost and risk to me. People calculated various options, for example on discourse. We should look at those and pick one of them, which has a realistic plan for getting it done and gets us to a good place for a few years.

tomberek commented 1 month ago

I don't know. Conceptually, I like the idea of a community-maintained peer-to-peer cache and self-hosting. Self-hosting requires even more work to be placed onto a limited infrastructure team, and either approach requires additional development to be a viable alternative. Thus I think the right approach is to allow any of these to be experimented with:

TIP: When there is an interesting problem, try to get multiple teams competing to solve it. Competition is great fun and can produce better answers than monopolized problems. You can even explicitly create competitions with prizes for the best solutions.

getchoo commented 1 month ago

Unless there is a clear and concise path for doing this long-term (i.e., funding, partnership, etc.), no. The community and Foundation have already had issues in maintaining the binary cache with outside support. Moving this completely in-house would most likely exacerbate this issue without a partner, and would definitely affect people in areas not in the immediate proximity of the bare metal hardware if we were to drop Fastly as well.

As mentioned by others here, some nice work has also gone towards making a P2P binary cache viable, which could very well take the burden off the Foundation/NixOS org and help with the CDN issue. I find this to be more of something to consider in the (somewhat) far future though, as currently these projects are not production ready and haven't been tested at the scale of our current binary cache.

Scrumplex commented 1 month ago

Do you believe that Nix should transition to self-hosting the binary cache on bare-metal hardware, rather than relying on third-party services for hosting its cache?

No. Object storage is an economy of scale, especially when you factor in multi-tiered storage. Hosting the cache ourselves is not just going to be a high upfront investment of buying the hardware, but it's also going to require much more maintenance in the future.

Now I don't think AWS is ever going to be the cheapest option out there either. If we take inspiration from other distros, I think the NixOS Foundation should promote third-party mirrors of the cache. I think something like a pull-through cache would be good enough to mirror the cache. cache.nixos.org could act as a broker to redirect requests to local cache mirrors.
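A minimal sketch of the broker idea, assuming a static registry of mirrors keyed by region. The mirror URLs, the region codes, the origin host, and the `redirect_target` name are all hypothetical and only illustrate the redirect decision, not how cache.nixos.org works today.

```python
from typing import Optional

# Hypothetical mirror registry: region code -> mirror base URL (made up).
MIRRORS = {
    "eu": "https://eu.mirror.example.org",
    "us": "https://us.mirror.example.org",
}

# Hypothetical origin used when no mirror is registered for the region.
ORIGIN = "https://origin.example.org"


def redirect_target(path: str, region: Optional[str]) -> str:
    """Return the URL a broker could redirect a cache request to."""
    base = MIRRORS.get(region or "", ORIGIN)
    return f"{base}/{path.lstrip('/')}"
```

A real broker would also need mirror health checks and a way to confirm that a mirror actually holds the requested path before redirecting to it.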

Another approach would be to use something decentralized, as @tomberek mentioned above. Obsidian Systems developed IPFS support for Nix a few years ago, which seemed interesting. (See https://github.com/obsidiansystems/ipfs-nix-guide)

In general, we definitely need better tooling to share store paths locally. In my case, I am running Harmonia on a local server and copy closures to it when I plan to use them on multiple machines. Having some kind of peer to peer cache would probably work much better and reduce load on cache.nixos.org.

nyabinary commented 1 month ago

I agree with Cafkafk's approach on transitioning to self-hosting the binary cache, but I also believe we should focus on immediate steps to manage costs, such as garbage collecting the current cache to keep costs low(er). While we work toward a self-hosted solution, maintaining provider-agnosticity is key, and careful management of existing resources like the cache will help minimize financial strain. In the long term, transitioning to a self-hosted, cloud-agnostic solution will strengthen Nix's independence and better align with our principles.
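For illustration, garbage collection of a cache can be thought of as a reachability problem: keep everything reachable from a set of roots and treat the rest as candidates for deletion. The sketch below assumes the references between paths are available as a plain dictionary and that some policy (not specified in this thread) picks the roots. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def live_paths(roots: Iterable[str], references: Dict[str, List[str]]) -> Set[str]:
    """Collect every path reachable from the roots via the reference graph."""
    live: Set[str] = set()
    stack = list(roots)
    while stack:
        path = stack.pop()
        if path in live:
            continue
        live.add(path)
        # Paths without an entry are treated as having no references.
        stack.extend(references.get(path, ()))
    return live


def garbage(all_paths: Iterable[str], roots: Iterable[str],
            references: Dict[str, List[str]]) -> Set[str]:
    """Paths that no root depends on, i.e. candidates for deletion."""
    return set(all_paths) - live_paths(roots, references)
```

With `references = {"app": ["libc"], "old-app": ["libc", "old-lib"]}` and `roots = ["app"]`, the shared `libc` stays alive while `old-app` and `old-lib` become candidates for removal.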

yu-re-ka commented 1 month ago

I would view the binary cache as two components: A "source of truth" and a CDN.

It is crucial that we move the source of truth to self-hosting, and this is very doable even in the short term. The reason is that it is not sustainable to rely on a single cloud service sponsoring a huge monthly bill, with no easy way to migrate away from this service. I don't want to rely on AWS sponsoring us forever, and when we move away from AWS I don't want to just change this dependence to another cloud provider. Instead, the storage should be replicated and distributed across multiple colocations sponsored by multiple Nix-friendly companies (or possibly a multi-cloud solution could also fulfill this requirement, but I'm not sure that is easier to implement). I'm very confident that it is possible to build a solution that preserves the integrity of the entire history of cache.nixos.org with the help of the community. In the future, deduplication can help to reduce the needed storage, but this needs more research (e.g. how much reference outlining helps with the growth of the binary cache).
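To give a rough feel for how deduplication saves storage, here is a sketch of fixed-size, content-addressed chunking. This is a deliberately simpler technique than the reference outlining mentioned above, which it does not implement. The `ChunkStore` class and the tiny chunk size are assumptions for illustration only.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Split data into chunks and return the digests needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already stored costs no extra space.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Reassemble the original data from its list of digests."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        """Total size of the distinct chunks actually kept."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Storing `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` keeps only four distinct chunks (16 bytes) instead of 24 bytes, because the two shared chunks are stored once.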

The CDN is another story, because it requires presence in all parts of the world and there is a much larger initial cost associated with building a CDN which can serve the current cache.nixos.org users. Plus, the vendor lock-in effect is much less pronounced for a CDN, so it is realistic to switch to a different CDN without a huge amount of effort. I think we can stay with Fastly as long as they want to sponsor us. At the same time, we should look into ways of reducing the load on the CDN by ensuring CIs and cloud installations of NixOS cache the paths they need locally instead of re-fetching them every time.
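The local caching idea can be sketched as a pull-through cache: the first request for a path goes upstream, and later requests are served from the local copy. The `PullThroughCache` class and the injected `fetch_upstream` callable are hypothetical. A real setup would persist to disk and verify signatures.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Serves objects locally and contacts upstream only on a miss."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch = fetch_upstream
        self._local: Dict[str, bytes] = {}
        self.upstream_fetches = 0

    def get(self, key: str) -> bytes:
        if key not in self._local:
            # Miss: fetch once from upstream and keep a local copy.
            self._local[key] = self._fetch(key)
            self.upstream_fetches += 1
        return self._local[key]
```

If a CI job requests the same path a hundred times, upstream sees a single fetch, which is the load reduction on the CDN described above.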

roberth commented 1 month ago

Self hosting may be a good option, when we have a good plan. Funding a self-hosted replacement for the S3 part of the solution shouldn't be a problem, considering our large user base and perhaps even the EU's "digital sovereignty" goals.

As far as I know, we're still looking into multiple options, so I see no reason to take self-hosting off the table, but it does need a good plan.

winterqt commented 1 month ago

I don't have much to add, but I agree with @yu-re-ka and @roberth's takes. It's doable, but takes planning and good execution to be done well. I know we'll be able to do it eventually with time, though.

proofconstruction commented 1 month ago

As answered in #16, I would like to eventually achieve full reproducibility and ultimately remove the need for significant caching infrastructure, but in the meantime we should consider alternatives to the present situation. Self-hosting comes with considerable challenges and would require significantly more infrastructure support which we presently have neither the labor power nor the financial capacity to address.