Open flokli opened 2 years ago
An easy-to-implement alternative would be to rotate S3 buckets: keep older buckets read-only and upload only to the latest one. This should not be the end goal, but it is at least better than having no GC, which makes the project not really usable without infinite storage.
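To make the rotation idea concrete, here is a minimal sketch. It uses plain in-memory dicts as stand-ins for S3 buckets; the `RotatingStore` class and its method names are hypothetical and are not part of nix-casync.

```python
from typing import Dict, List, Optional


class RotatingStore:
    """Writes go to the newest bucket; reads fall back to older ones."""

    def __init__(self) -> None:
        # Each dict stands in for one bucket; the newest bucket is last.
        self.buckets: List[Dict[str, bytes]] = [{}]

    def put(self, key: str, data: bytes) -> None:
        # Only the latest bucket ever receives uploads.
        self.buckets[-1][key] = data

    def get(self, key: str) -> Optional[bytes]:
        # Look in the newest bucket first, then in the read-only older ones.
        for bucket in reversed(self.buckets):
            if key in bucket:
                return bucket[key]
        return None

    def rotate(self, keep: int) -> None:
        """Open a fresh bucket and drop all but the `keep` newest ones."""
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self.buckets.append({})
        del self.buckets[:-keep]


store = RotatingStore()
store.put("a", b"1")
store.rotate(keep=2)      # "a" now lives in a read-only bucket
store.put("b", b"2")
still_there = store.get("a")
store.rotate(keep=2)      # the bucket holding "a" is dropped
gone = store.get("a")
```

Nothing in this sketch copies still-used objects forward, so anything only present in a dropped bucket has to be uploaded again.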
Does nix-casync use a database? Could it try to delete files that were not modified or accessed recently, like cachix does?
There are multiple layers of GC here, from the bottom to the top:
To garbage-collect chunks, we need to assemble the list of all chunks referred to in all caibx files in the store. Chunks that are not in that list but exist in the chunk store can be safely removed.
To garbage-collect Narfiles, we need to assemble the list of all Narfiles referred to in all Narinfo files. Narfiles that are not referred to by any Narinfo file can be safely removed.
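Both the chunk layer and the Narfile layer come down to the same mark-and-sweep set difference. A minimal sketch, assuming the references have already been parsed into plain Python collections; the `unreferenced` helper and the sample IDs are hypothetical:

```python
from typing import Dict, Iterable, Set


def unreferenced(existing: Iterable[str],
                 references: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the IDs in `existing` that no index in `references` points to.

    `references` maps an index name (a caibx file, or a Narinfo file) to the
    IDs it refers to (chunk IDs, or Narfile names).
    """
    live: Set[str] = set()
    for ids in references.values():   # mark: union of everything referred to
        live.update(ids)
    return set(existing) - live       # sweep: whatever was never marked


# Chunk layer: made-up caibx indexes and chunk store contents.
caibx_indexes = {"a.caibx": ["c1", "c2"], "b.caibx": ["c2", "c3"]}
chunk_store = ["c1", "c2", "c3", "c4"]
garbage_chunks = unreferenced(chunk_store, caibx_indexes)

# Narfile layer: made-up Narinfo -> Narfile references.
narinfos = {"x.narinfo": ["nar1"], "y.narinfo": ["nar2"]}
narfiles = ["nar1", "nar2", "nar3"]
garbage_narfiles = unreferenced(narfiles, narinfos)
```

A real implementation would also have to consider uploads that happen while the sweep runs, since a chunk may be stored before the index that refers to it is written.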
We can only remove Narinfo files that are not referred to by any other Narinfo file.
We can start with files that are not referred to by any other Narinfo file and check their last-access time. If it is too old, we remove the file and add all its References to the next iteration (so we slowly walk our way up).
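The iteration described above could look roughly like this. It is only a sketch: it assumes the References and last-access times are already available as in-memory dicts (in practice they would come from the database), and `collect_narinfos` and its parameters are hypothetical names.

```python
from typing import Dict, List, Set


def collect_narinfos(references: Dict[str, Set[str]],
                     last_access: Dict[str, float],
                     now: float,
                     max_age: float) -> Set[str]:
    """Return the Narinfo names that can be removed.

    references[n] is the set of Narinfos that n refers to.
    """
    # Count how many *other* Narinfos refer to each one.
    referrers = {name: 0 for name in references}
    for name, refs in references.items():
        for ref in refs:
            if ref != name and ref in referrers:  # ignore self-references
                referrers[ref] += 1

    removed: Set[str] = set()
    candidates: List[str] = [n for n, count in referrers.items() if count == 0]
    while candidates:
        next_round: List[str] = []
        for name in candidates:
            if now - last_access[name] <= max_age:
                continue  # recently used: keep it, and thereby its References
            removed.add(name)
            for ref in references[name]:
                if ref == name or ref not in referrers:
                    continue
                referrers[ref] -= 1
                if referrers[ref] == 0:  # no remaining referrer: check it next
                    next_round.append(ref)
        candidates = next_round
    return removed


# "app" refers to "lib", which refers to "libc"; "tool" also refers to "libc".
refs = {"app": {"lib"}, "lib": {"libc"}, "libc": set(), "tool": {"libc"}}
access = {"app": 10.0, "lib": 10.0, "libc": 10.0, "tool": 90.0}
removable = collect_narinfos(refs, access, now=100.0, max_age=30.0)
```

In this example `app` and `lib` are removable, while `libc` stays because the recently used `tool` still refers to it.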
Asking the "referred by" question, as well as tracking access times, requires some sort of database (so this is something for https://github.com/flokli/nix-casync/issues/9).
--
A locally deployed "cache" would probably not need to do the complicated "safe Narinfo removal", as long as we silently fetch the Narinfo again when it's requested.
There should be a way to remove castr chunks that are not referenced by any of the caibx files.