we might not really need historical records of inventory to achieve a "full backup" of S3. Inventory dumps themselves are quite large! I am still fetching them (to facilitate analysis etc., but might stop doing that) and have fetched 14TB so far, so it is a notable amount of storage. Here is how they grew through the years (size of a single daily dump):
(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandiarchive-inventory$ code/print-manifest-summary dump/202*-01-01T*/manifest.json
dump/2020-01-01T00-00Z/manifest.json : 1 entries, 197K total size
dump/2021-01-01T00-00Z/manifest.json : 1 entries, 3.8M total size
dump/2022-01-01T00-00Z/manifest.json : 1 entries, 17M total size
dump/2023-01-01T01-00Z/manifest.json : 384 entries, 36G total size
dump/2024-01-01T01-00Z/manifest.json : 406 entries, 38G total size
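For reference, a minimal sketch of what such a summary could look like, assuming the standard S3 Inventory manifest.json layout (a top-level "files" list whose entries carry a "size" field); the actual code/print-manifest-summary script may well differ:

```python
#!/usr/bin/env python3
"""Summarize S3 Inventory manifest.json files: entry count and total data size.
Assumes the stock S3 Inventory manifest format (a "files" list with "size" fields)."""
import json
import sys


def human(nbytes: float) -> str:
    """Format a byte count roughly the way the listing above does (K/M/G/T)."""
    for unit in ("", "K", "M", "G", "T"):
        if nbytes < 1024:
            return f"{nbytes:.3g}{unit}"
        nbytes /= 1024
    return f"{nbytes:.3g}P"


for path in sys.argv[1:]:
    with open(path) as f:
        manifest = json.load(f)
    files = manifest.get("files", [])
    total = sum(entry["size"] for entry in files)
    print(f"{path} : {len(files)} entries, {human(total)} total size")
```

Invoked as above, e.g. `print-manifest-summary dump/202*-01-01T*/manifest.json`.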
This year it grew to 39G per day(!), which would amount to about 14TB per year just for the dumps (so I expect to fetch about 40TB in total... maybe I should interrupt and fetch only specific days and their data).
Mostly it is due to all the zarr/s. But it remains the case that we might want to prune some old inventory listings soonish (attn @satra, with whom we briefly discussed some bucket GCing to do).
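As an illustration of the kind of pruning I have in mind for the local copy, here is a sketch that keeps recent daily dumps plus the first dump of each month and drops the rest. The retention window, dry-run default, and exact policy are just placeholders, not anything decided; the dump path and directory naming are taken from the listing above.

```python
#!/usr/bin/env python3
"""Sketch of a pruning pass over local inventory dumps: keep all dumps from the
last KEEP_DAYS days plus the first dump of each month, remove the rest.
Retention values here are illustrative only."""
import re
import shutil
from datetime import date, timedelta
from pathlib import Path

DUMP_ROOT = Path("/mnt/backup/dandi/dandiarchive-inventory/dump")
KEEP_DAYS = 90     # keep every recent daily dump (example value)
DRY_RUN = True     # only report what would be removed

cutoff = date.today() - timedelta(days=KEEP_DAYS)
seen_months = set()

for d in sorted(DUMP_ROOT.iterdir()):
    # Dump directories are named like 2024-01-01T01-00Z
    m = re.match(r"(\d{4})-(\d{2})-(\d{2})T", d.name)
    if not m:
        continue
    dump_date = date(int(m[1]), int(m[2]), int(m[3]))
    month = f"{dump_date.year}-{dump_date.month:02d}"
    # Keep recent dumps and the first dump seen for each month.
    if dump_date >= cutoff or month not in seen_months:
        seen_months.add(month)
        continue
    if DRY_RUN:
        print(f"would remove {d}")
    else:
        shutil.rmtree(d)
```

Pruning the dumps on the bucket itself would be a separate (GC) exercise; this only addresses the local mirror growth.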
As "discovered" in
we might not really need historical records of inventory to achieve a "full backup" of S3. Inventory dumps themselves are quite large! I am still fetching (to facilitate analysis etc, but might stop doing that) and so far fetched 14TB. As such, it is a notable amount of storage . Here is how they grew through the years (per day)
and this year grew to 39G per day(!) which would amount 14TB per year just for the dumps (so I expect to fetch then 40TB... may be should interrupt and fetch specific days and their data only).
Mostly it is due to all the
zarr/
s. But it remains the case that we might want to prune some old inventory listings soonish. (attn @satra with whom we briefly discussed some bucket GCing to do)