flokli / nix-casync

A more efficient way to store and substitute Nix store paths

Implement Garbage Collection #10

Open flokli opened 2 years ago

flokli commented 2 years ago

There should be a way to remove castr chunks that are not referenced by any of the caibx files.

Mic92 commented 2 years ago

An easy-to-implement alternative would be to rotate S3 buckets: keep older buckets read-only and upload only to the latest one. This should not be the end goal, but it is at least better than having no GC, which makes the project not really usable without infinite storage.

bbigras commented 2 years ago

Does nix-casync use a database? Could it try to delete files that were not modified or accessed recently, like cachix does?

flokli commented 2 years ago

There are multiple layers of GC here, from the bottom to the top:

Garbage Collection of unreferenced Chunks

To do this, we need to assemble the list of all chunks referenced by any caibx file in the store. Chunks that exist in the chunk store but are not part of that list can be safely removed.
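As a rough illustration of that mark-and-sweep step, here is a minimal sketch in Python. It assumes the caibx files have already been parsed into plain lists of chunk IDs; the function name `unreferenced_chunks` and the sample data are hypothetical and not part of nix-casync.

```python
from typing import Dict, Iterable, Set


def unreferenced_chunks(
    indexes: Dict[str, Iterable[str]],
    stored_chunks: Iterable[str],
) -> Set[str]:
    """Return the chunk IDs in the store that no index references."""
    # Mark: collect every chunk ID referenced by any index file.
    referenced: Set[str] = set()
    for chunk_ids in indexes.values():
        referenced.update(chunk_ids)
    # Sweep: anything in the store that was not marked is garbage.
    return set(stored_chunks) - referenced


# Hypothetical data: two indexes that share chunk "c2".
indexes = {
    "a.caibx": ["c1", "c2"],
    "b.caibx": ["c2", "c3"],
}
stored = ["c1", "c2", "c3", "c4", "c5"]
garbage = unreferenced_chunks(indexes, stored)
```

A real implementation would probably also have to guard against uploads that happen while the list is being assembled, since a chunk written before its caibx file would look unreferenced for a moment.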

Garbage collection of Narfiles

To do this, we need to assemble the list of all Narfiles referenced by any Narinfo file. Narfiles that are not referenced by any Narinfo file can be safely removed.
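The same idea, one layer up, could look like the sketch below. It assumes each Narinfo is available as text and that the Narfile is identified by the value of its `URL:` field; the helper names and the sample paths are made up for illustration.

```python
from typing import Iterable, Set


def nar_url(narinfo_text: str) -> str:
    """Extract the value of the `URL:` field from a Narinfo file's text."""
    for line in narinfo_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key == "URL":
            return value.strip()
    raise ValueError("narinfo has no URL field")


def unreferenced_narfiles(
    narinfos: Iterable[str],
    stored_narfiles: Iterable[str],
) -> Set[str]:
    """Return stored Narfiles that no Narinfo points at."""
    referenced = {nar_url(text) for text in narinfos}
    return set(stored_narfiles) - referenced


# Hypothetical Narinfo contents, trimmed to the fields used here.
narinfos = [
    "StorePath: /nix/store/aaa-hello\nURL: nar/one.nar\nNarSize: 10\n",
    "StorePath: /nix/store/bbb-world\nURL: nar/two.nar\nNarSize: 20\n",
]
stored_narfiles = ["nar/one.nar", "nar/two.nar", "nar/three.nar"]
orphans = unreferenced_narfiles(narinfos, stored_narfiles)
```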

Removal of Narinfo files

We can only remove Narinfo files that are not referenced by any other Narinfo file.

We can start with the files that are not referenced by any other Narinfo file and check their last-access time. If it's too old, we remove the file and add all of its References to the next iteration (so we slowly walk our way up).
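A sketch of that iteration, assuming the "referred by" information and access times are already available as in-memory dictionaries (in practice they would come from the database mentioned below). The function name `collect_expired`, the timestamps, and the sample graph are hypothetical. Self-references are skipped, since a store path may list itself among its References.

```python
from typing import Dict, Set


def collect_expired(
    references: Dict[str, Set[str]],
    last_access: Dict[str, float],
    cutoff: float,
) -> Set[str]:
    """Return the Narinfos that are safe to remove.

    A Narinfo is removed only if nothing that remains refers to it
    and its last access is older than `cutoff`.
    """
    # Count how many *other* known Narinfos refer to each Narinfo.
    referrers = {name: 0 for name in references}
    for name, refs in references.items():
        for ref in refs:
            if ref != name and ref in referrers:
                referrers[ref] += 1

    removed: Set[str] = set()
    candidates = [name for name, count in referrers.items() if count == 0]
    while candidates:
        next_round = []
        for name in candidates:
            if last_access[name] >= cutoff:
                continue  # Recently used: keep it and what it references.
            removed.add(name)
            for ref in references[name]:
                if ref == name or ref not in referrers:
                    continue
                referrers[ref] -= 1
                if referrers[ref] == 0:
                    next_round.append(ref)  # Now unreferenced: check it next.
        candidates = next_round
    return removed


# Hypothetical graph: app -> lib -> libc, and tool -> libc.
references = {
    "app": {"lib"},
    "lib": {"libc"},
    "tool": {"libc"},
    "libc": {"libc"},
}
last_access = {"app": 10.0, "lib": 10.0, "tool": 100.0, "libc": 100.0}
removed = collect_expired(references, last_access, cutoff=50.0)
```

Here `app` and `lib` are removed, while `libc` stays because the recently used `tool` still refers to it.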

Answering the "referenced by" question, as well as tracking access times, requires some sort of database (so this is something for https://github.com/flokli/nix-casync/issues/9).

--

A locally deployed "cache" would probably not need the complicated "safe Narinfo removal", as long as we silently fetch a Narinfo again when it's requested.