
[Long Term Strategy and Priorities] Migration of S3 Bucket Payments to Foundation #86

Open refroni opened 1 year ago

refroni commented 1 year ago

Listing all the options/possibilities that have been brought up or are being explored for long-term improvements/resolutions below. Please add anything that might be of interest to bring up or discuss, or any alternative options on the topic.

Discussion: https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672

Thank you to joepie91 and raitobezarius for helping put this initial list together from the Matrix/Discourse discussions:

  1. Tahoe-LAFS (distributed storage, not S3-compatible out of the box, can support storage nodes of any size and complexity including low-trust, but is slow) + central gateway server(s) to bridge to Fastly
  2. Tahoe-LAFS but with narinfo stored directly on the central gateway server(s) for better performance
  3. Garage (distributed storage, S3-compatible, flexible in storage node size, but nodes must be reliable and trustworthy, and cluster configuration must stay reasonably stable)
  4. Minio (distributed storage-ish, S3-compatible, fairly rigid expectations in cluster layout, commercial, so future FOSS status is questionable)
  5. Single Big Server (optionally with a replica) serving up the entire cache:
     • as owned hardware, colocated at a datacenter, with or without outsourced hardware management
     • as rented dedicated server(s), so hardware issues are handled by the datacenter
     • supplied by one or more sponsors
  6. Running university/ISP mirror schemes like many other distros do (eg. MirrorBrain)
  7. Hosting (historical) content at an academic/research institution
  8. Deleting old items from the cache (irrecoverable)
  9. Deleting items from staging once no longer needed
  10. Aggressively deduplicating and/or compressing our storage
  11. Ceph (distributed filesystem for petabyte scale, S3 compatible, industry standard, non-trivial to operate)
nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672/96

7c6f434c commented 1 year ago

Maybe for 8 there is an 8.1: delete some of the old non-fixed-output paths; technically irrecoverable, but rebuildable in a «close enough» manner if desired.

vcunat commented 1 year ago

8.2: I like the notion that, given good bit-for-bit reproducibility (which we have in most packages, I think), keeping the tiny *.narinfo files with their signed hashes could be quite beneficial: anyone could supply (or reupload) the binary at any later time, even in a third-party community/distributed fashion, for old builds that were (wrongly) assumed unneeded.
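To make 8.2 concrete, here is a minimal sketch (not the real Nix implementation) of why the narinfo files are the part worth keeping: the metadata pins the NAR's size and hash, so a binary re-supplied by any third party can be checked against it. For simplicity it assumes a sha256 NarHash in hex; real narinfos use Nix's own base32 alphabet, and the Sig field additionally carries the cache's signature.

```python
# Sketch only: parse a .narinfo and check a re-supplied NAR against it.
# Assumes NarHash is "sha256:<hex>"; real narinfos use Nix base32.
import hashlib

def parse_narinfo(text: str) -> dict:
    """Parse the 'Key: value' lines of a .narinfo file."""
    fields = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            fields[key] = value
    return fields

def nar_matches(narinfo: dict, nar_bytes: bytes) -> bool:
    """Return True if a re-uploaded NAR matches the stored metadata."""
    if len(nar_bytes) != int(narinfo["NarSize"]):
        return False
    algo, expected = narinfo["NarHash"].split(":", 1)
    return algo == "sha256" and hashlib.sha256(nar_bytes).hexdigest() == expected
```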

nh2 commented 1 year ago

Request for addition:

I suggested this as a short-term option in https://github.com/NixOS/foundation/issues/82#issuecomment-1576697509, but it is of course also a long-term possibility, and an alternative to Tahoe-LAFS or Minio.

RaitoBezarius commented 1 year ago
  1. Ceph (distributed filesystem for petabyte scale, S3 compatible, industry standard, non-trivial to operate)
jtolio commented 1 year ago
  1. Self-run Storj (open source; can be run on private instances in addition to the public service offering; we have a number of folks switching to us after tearing their hair out with Ceph; works with self-run or community-contributed storage nodes; does not require any involvement with cryptocurrency or blockchains)
nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/s3-update-and-recap-of-community-call/28942/1

abathur commented 1 year ago

I'm not sure if they belong here, in the short-term thread, or in a dedicated thread, but I imagine there's a category of things that might help reduce the egress charges (if we aren't moving somewhere that offers to cover them).

Some of them overlap with items already mentioned for general cost reduction (like deduplication, deleting items), but I imagine there are more. (https://github.com/NixOS/foundation/issues/82#issuecomment-1576697509 in the short-term thread mentions two.)

Maybe also:

If a new cache were in place early and the extra latency of using two backends at Fastly (only hitting S3 for things that aren't already in the new store) was tolerable, I guess the above could also be paired with deleting the corresponding paths from S3 to reduce storage costs there? (I guess this would also make it progressively easier/cheaper to understand what's left, perform deduplication, etc.)
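A rough sketch of that two-backend flow, assuming a hypothetical new-cache endpoint that accepts uploads (NEW_CACHE and its PUT API are invented for the example); the point is only the control flow: serve from the new store, fall back to S3 for misses, and copy what was fetched so the S3 object can later be deleted.

```python
# Hypothetical copy-on-read fallback between two cache backends.
import urllib.error
import urllib.request

NEW_CACHE = "https://new-cache.example.org"      # hypothetical
S3_CACHE = "https://nix-cache.s3.amazonaws.com"  # current bucket behind Fastly

def fetch(path: str) -> bytes:
    try:
        with urllib.request.urlopen(f"{NEW_CACHE}/{path}") as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        if e.code != 404:
            raise
    # Miss: pay S3 egress once, then persist the object in the new store
    # so this path never hits S3 again (and can be deleted from S3 later).
    with urllib.request.urlopen(f"{S3_CACHE}/{path}") as resp:
        data = resp.read()
    req = urllib.request.Request(f"{NEW_CACHE}/{path}", data=data, method="PUT")
    urllib.request.urlopen(req)
    return data
```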

misuzu commented 1 year ago

Running university/ISP mirror schemes like many other distros do (eg. MirrorBrain)
Deleting old items from the cache (irrecoverable)
Deleting items from staging once no longer needed
Aggressively deduplicating and/or compressing our storage

Do we even have the tools for doing stuff like this? If we have, are they documented? How easy are they to use?

Nabile-Rahmani commented 1 year ago

Could we extend the binary cache concept to include a streamlined peer-to-peer service that any Nix machine could opt into?

Users could add services.nix-serve-p2p.{enable,max-upload-speed} to their configuration and their machine's store would become part of the cache substituters. The manual steps would be automated, and a self-hosted daemon would serve data without the need to configure an nginx proxy.

Automatic peer discovery has to be taken into account:

  • It could be implemented directly into the way Nix finds substituters, separately from the HTTP protocol.
  • The nix-serve-p2p service could signal its availability on start/stop (transmitting its address/port and trusted key) to a master server at https://p2p-cache.nixos.org, which would redirect to a random one of those substituters; that URL would then be added to nix.settings.substituters, though it certainly doesn't sound ideal to hit a centralised server, especially if there's a lot of trial and error to find a mirror that contains the store paths we want.

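As an illustration of the second bullet, a tiny sketch of what the start/stop signalling might look like; the /register endpoint and the JSON payload shape are invented for the example.

```python
# Hypothetical availability announcement for a nix-serve-p2p daemon.
import json
import urllib.request

MASTER = "https://p2p-cache.nixos.org/register"  # hypothetical endpoint

def announce(substituter_url: str, trusted_key: str) -> None:
    """Tell the master server this node is serving its store."""
    payload = json.dumps({
        "url": substituter_url,   # address/port the daemon listens on
        "key": trusted_key,       # the node's cache signing public key
    }).encode()
    req = urllib.request.Request(
        MASTER, data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```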
(I apologise if I said a bunch of useless nonsense.)

RaitoBezarius commented 1 year ago

Could we extend the binary cache concept to include a streamlined peer-to-peer service that any Nix machine could opt into?

Users could add services.nix-serve-p2p.{enable,max-upload-speed} to their configuration and their machine's store would become part of the cache substituters. The manual steps would be automated, and a self-hosted daemon would serve data without the need to configure an nginx proxy.

Automatic peer discovery has to be taken into account:

  • It could be implemented directly into the way Nix finds substituters, separately from the HTTP protocol.
  • The nix-serve-p2p service could signal its availability on start/stop (transmitting its address/port and trusted key) to a master server at https://p2p-cache.nixos.org, which would redirect to a random one of those substituters; that URL would then be added to nix.settings.substituters, though it certainly doesn't sound ideal to hit a centralised server, especially if there's a lot of trial and error to find a mirror that contains the store paths we want.

(I apologise if I said a bunch of useless nonsense.)

IMHO, this is a distribution question, not a storage question. And anyone is free to open a long term issue / exploration on better distribution layers for the Nix store :).

Nabile-Rahmani commented 1 year ago

IMHO, this is a distribution question, not a storage question. And anyone is free to open a long term issue / exploration on better distribution layers for the Nix store :).

Got it, though I guess if we were to go all in with community load sharing, storage could purge the binary caches (reproducible output, not "valuable" sources) and reduce costs on that front as a result.

This avenue would make sense if there is a large number of seeders able to mostly take over the existing hosting solution, or if its benefits outweigh the costs of S3/Fastly.

RaitoBezarius commented 1 year ago

IMHO, this is a distribution question, not a storage question. And anyone is free to open a long term issue / exploration on better distribution layers for the Nix store :).

Got it, though I guess if we were to go all in with community load sharing, storage could purge the binary caches (reproducible output, not "valuable" sources) and reduce costs on that front as a result.

This avenue would make sense if there is a large number of seeders able to mostly take over the existing hosting solution, or if its benefits outweigh the costs of S3/Fastly.

I think a lot of people would argue there's no real incentive for the community to maintain a certain QoS, and then we are back to the centralized problem anyway, so I would not focus my own time there too quickly before we get a centralized system that is sustainable.

wmertens commented 1 year ago
  1. Aggressively deduplicating and/or compressing our storage

Ideally, we'd store builds in large chunk stores after pre-processing them to move store paths out of the files (*). A frontend can pretend to be a NAR store.

Then, we'd make nix-store aware of this format, and instead of requesting NARs it would fetch the needed chunks and combine them. This way, less data is transferred to the client, and the frontend is no longer needed.

I believe this will reduce the amount of data stored and transferred several-fold.


(*) Stripping store paths; here's how I see that happening:

Given that rolling chunks for deduplication are still quite big, and I suspect that many files change between builds only because the nix store paths they embed changed, how about pre-processing all stored files as follows:

The idea is to move the nix store paths (only up to the first part) into a separate list and remove them from the file. So then you would replace a file F with a tuple (Fx, L). Fx is the binary contents of the file with every sequence matching /nix/store/[^/]+ removed, and L is a list of (position, path) tuples that store the removed paths.

This can be encoded in a streaming manner, and decoded in a streaming manner provided you have access to the tuples L.

L can be compressed better by making position relative to the end of the previous match and making path an index into a list of found paths. We then get Lrel, a list of (relPosition, pathIndex) tuples, and P, a list of paths, so F becomes (Fx, Lrel, P).

This result should be way better at being chunked. I am hoping that many rebuilt files will have the same Fx and Lrel, and only P will differ.

The /nix/store/ part should be configurable during encoding.
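A minimal sketch of the (Fx, Lrel, P) encoding and its decoder, assuming the fixed /nix/store/ prefix (configurable in practice, as noted). It cuts store paths out of the byte stream, records positions relative to the previous cut, deduplicates the paths into a table P, and round-trips back to the original bytes:

```python
import re

STORE_RE = re.compile(rb"/nix/store/[^/]+")

def encode(data: bytes):
    paths, path_index = [], {}   # P and a lookup for deduplication
    lrel, chunks = [], []        # Lrel and the pieces of Fx
    last_end = 0
    for m in STORE_RE.finditer(data):
        chunks.append(data[last_end:m.start()])
        if m.group() not in path_index:
            path_index[m.group()] = len(paths)
            paths.append(m.group())
        # Position is relative to the end of the previous match.
        lrel.append((m.start() - last_end, path_index[m.group()]))
        last_end = m.end()
    chunks.append(data[last_end:])
    return b"".join(chunks), lrel, paths   # (Fx, Lrel, P)

def decode(fx: bytes, lrel, paths) -> bytes:
    out, pos = [], 0
    for rel, idx in lrel:
        out.append(fx[pos:pos + rel])
        out.append(paths[idx])
        pos += rel
    out.append(fx[pos:])
    return b"".join(out)

sample = b"ref: /nix/store/abc123-glibc-2.37/lib"
assert decode(*encode(sample)) == sample
```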

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-short-term-resolution/29413/1

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-short-term-resolution/29413/3

chkno commented 1 year ago

I think a lot of people would argue there's no real incentive for the community to maintain a certain QoS

When clients fetch from multiple redundant sources in parallel, slow or unavailable sources have very little impact on performance; BitTorrent solved this problem twenty years ago. See also The Tail at Scale.
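A sketch of that point: request the same object from every mirror and take the first success, so a slow or dead peer loses the race instead of stalling the client (the mirror URLs would come from whatever discovery mechanism exists):

```python
# Race redundant sources for one object; first success wins.
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_first(urls: list[str], timeout: float = 30.0) -> bytes:
    pool = ThreadPoolExecutor(max_workers=len(urls))
    futures = [pool.submit(urllib.request.urlopen, u, timeout=timeout)
               for u in urls]
    try:
        for fut in as_completed(futures):
            try:
                return fut.result().read()
            except Exception:
                continue   # a slow or unavailable mirror just loses the race
        raise RuntimeError("all mirrors failed")
    finally:
        # Don't wait for the losers (Python >= 3.9 for cancel_futures).
        pool.shutdown(wait=False, cancel_futures=True)
```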

Could we extend the binary cache concept to include a streamlined peer-to-peer service that any Nix machine could opt into?

I think a distributed storage model like this is the best path to a long-term-sustainable, no-monetary-cost solution. There is so much good will in the general user community. If it's as simple as responding to an "uncomment this line to help support this community" note in the nixos-generate-config default config, I think we'd easily get distributed storage serving capacity adequate to handle the binary cache. And if I'm wrong about this and we only get, say 50% of the needed capacity from volunteers, that's a 50% cost reduction on whatever service picks up the rest of the load.

Nabile-Rahmani commented 1 year ago

I think a distributed storage model like this is the best path to a long-term-sustainable, no-monetary-cost solution. There is so much good will in the general user community. If it's as simple as responding to an "uncomment this line to help support this community" note in the nixos-generate-config default config, I think we'd easily get distributed storage serving capacity adequate to handle the binary cache. And if I'm wrong about this and we only get, say 50% of the needed capacity from volunteers, that's a 50% cost reduction on whatever service picks up the rest of the load.

The only risk I see in participating is potentially leaking private/secret derivation data, is it not?

Currently, it looks like servers help clients know in advance if cached results exist by exposing a listing of store paths (curl https://releases.nixos.org/nixos/23.05/nixos-23.05.1272.ecb441f2206/store-paths.xz | xzless).

But in a peer-to-peer system, could leechers instead query seeders on demand for paths they already know about (i.e. public packages), to reduce the attack surface?

Attackers would have to brute-force the private hash + derivation name & version, but I don't know how risky this still is.

Additionally, the service implementation could rate-limit repeated failed queries, but only those that are not part of the public store paths from registered substituters, since we don't want to rate-limit legitimate queries.
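For reference, that on-demand query already matches today's binary cache protocol: a substituter is asked for <hash>.narinfo, where <hash> is the store-path hash the client already knows, and answers 200 or 404, so a peer only reveals whether it has a path the client can already name. A sketch:

```python
# Probe a substituter for a single store path without any listing.
import urllib.error
import urllib.request

def has_path(substituter: str, store_hash: str) -> bool:
    """store_hash is the 32-char hash part of /nix/store/<hash>-<name>."""
    url = f"{substituter}/{store_hash}.narinfo"
    req = urllib.request.Request(url, method="HEAD")
    try:
        urllib.request.urlopen(req)
        return True
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        raise
```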

zimbatm commented 1 year ago

What I would suggest is to start a P2P Nix cache working group and discuss the implementation details there. The best way to do this is to announce it on Discourse, gather interested members, and then start implementing a prototype to demonstrate feasibility.

What's nice is that we have multiple concurrent efforts, and all of them are complementary AFAIK. And the exact shape of the new solution will mostly depend on your participation.

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/peer-to-peer-binary-cache-rfc-working-group-poll/29568/1

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/peer-to-peer-binary-cache-rfc-working-group-poll/29568/5

7c6f434c commented 1 year ago

And if I'm wrong about this and we only get, say 50% of the needed capacity from volunteers, that's a 50% cost reduction on whatever service picks up the rest of the load.

The question is not so much about getting enough capacity; it is about having a good understanding of availability projections, and of whether people will have time to perform active maintenance (like supporting software version updates) on their chunks of distributed storage.

See also: OfBorg community-provided builders (where Graham complained that whether people would have time to update the builder code was not very predictable; the updates were smooth, BTW, and eventually it all ended up as centrally-managed infrastructure).

nixos-discourse commented 1 year ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-s3-short-term-resolution/29413/14

endgame commented 1 year ago

Observation: one of the big challenges with moving off of AWS is the egress costs for the S3 bucket, whether via outbound bandwidth or Snow Family products. If we had a job somewhere that downloaded only items that were already in the Fastly cache, we'd be able to accumulate a subset of built derivations without incurring egress charges. This seems like it'd converge to a decent subset of the files currently in S3, and more importantly, it would put an upper bound on the amount of "old stuff" we'd have to egress.
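A hedged sketch of such a job; whether a HEAD probe can be answered from the edge without triggering an origin fetch depends on the Fastly configuration, so treat this as the intended control flow rather than a recipe:

```python
# Download an object only if the Fastly edge already has it, so the
# S3 origin is never billed for egress.
import urllib.request

CACHE = "https://cache.nixos.org"

def fetch_if_cached(path: str) -> bytes | None:
    probe = urllib.request.Request(f"{CACHE}/{path}", method="HEAD")
    with urllib.request.urlopen(probe) as resp:
        # Fastly reports HIT/MISS in the X-Cache response header.
        if "HIT" not in resp.headers.get("X-Cache", ""):
            return None   # not at the edge; fetching would cost S3 egress
    with urllib.request.urlopen(f"{CACHE}/{path}") as resp:
        return resp.read()
```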

rjpcasalino commented 7 months ago

I'm trying to keep track of this and am dropping this here for others: the last meeting was on 2023-11-21 (https://pad.lassul.us/nixos-cache-gc#). Let me know if this is the wrong place.