CCBR / spacesavers2

https://ccbr.github.io/spacesavers2

spacesavers finds far less disk usage than `df` and `du` #102

Closed kelly-sovacool closed 2 months ago

kelly-sovacool commented 3 months ago

For /data/CCBR as of July 2024, df reports a disk usage of 197.2 TiB, while spacesavers2 reports a disk usage of 161 TiB.

I checked 3 project directories with spacesavers2 and compared the results to `du -s`. Note: `df` and `du` are not interchangeable. `df` cannot run on project directories (it just reports the overall usage for data mounts such as /data/CCBR), but `du` can. Also, `du -s /data/CCBR` returns zero. See code here: https://github.com/CCBR/spacesavers2/blob/98b1527c795fe97e9adb848d1a2109414fc4aec7/tests/debug_102/bin/main.sh

| FolderPath | usage_TiB_spacesavers | usage_TiB_du |
|---|---|---|
| /data/CCBR/projects/ccbr1332 | 10.865996 | 10.869083 |
| /data/CCBR/projects/ccbr783 | 4.829858 | 5.012015 |
| /data/CCBR/projects/ccbr984 | 5.751336 | 5.945362 |
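
To make the size of each gap easier to compare, here is a small sketch that computes the shortfall of spacesavers2 relative to `du` from the figures above. The `shortfall` helper and the `usage_tib` dictionary are illustrative names only, not part of spacesavers2.

```python
# Sizes in TiB copied from the comparison above: (spacesavers2, du).
usage_tib = {
    "/data/CCBR/projects/ccbr1332": (10.865996, 10.869083),
    "/data/CCBR/projects/ccbr783": (4.829858, 5.012015),
    "/data/CCBR/projects/ccbr984": (5.751336, 5.945362),
}


def shortfall(spacesavers_tib, du_tib):
    """Return (gap in TiB, gap as a percentage of the du figure)."""
    gap = du_tib - spacesavers_tib
    return gap, 100.0 * gap / du_tib


if __name__ == "__main__":
    for path, (ss, du) in usage_tib.items():
        gap, pct = shortfall(ss, du)
        print(f"{path}\t{gap:.6f} TiB\t{pct:.2f}%")
```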

I ran `find` to audit group permissions in these projects, and found one directory that was not readable:

find: ‘/data/CCBR/projects/ccbr1332/exome/tumor_only/.singularity/cache’: Permission denied

So it is possible that singularity cache directories are taking up disk space but are not findable by spacesavers. But this was only the case for ccbr1332, and doesn't explain the discrepancy for the other projects.
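
For repeating this check on other projects, a minimal Python sketch of the same audit is below. It walks a tree and collects the directories that could not be listed because of permissions. The function name `unreadable_dirs` is hypothetical, and this is not how spacesavers2 itself scans folders; it only mirrors what the `find` run surfaced.

```python
import os
import sys


def unreadable_dirs(root):
    """Return directories under root that could not be listed (permission denied)."""
    failed = []

    def on_error(err):
        # os.walk hands us the OSError raised while scanning a directory;
        # err.filename holds the path that could not be read.
        if isinstance(err, PermissionError):
            failed.append(err.filename)

    # We only care about the errors, so the walk results are discarded.
    for _ in os.walk(root, onerror=on_error):
        pass
    return failed


if __name__ == "__main__":
    # Usage: python audit.py /data/CCBR/projects/ccbr1332
    for root in sys.argv[1:]:
        for path in unreadable_dirs(root):
            print(f"Permission denied: {path}")
```

Anything under a directory listed here is invisible to a scan running as the same user, so its size cannot be included in that user's totals.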

kopardev commented 3 months ago

Update: found a possible bug. When spacesavers2 finds hardlinks, it is supposed to designate one file as the original, count its size as non-duplicate bytes, and also count it as a non-duplicate file. But as of v0.13.0, it counts these files as non-duplicate files without adding their non-duplicate bytes to the folder size. This may be the reason it is undercalling the folder sizes. Will be fixing this in v0.13.1.
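
To illustrate the intended accounting, here is a minimal sketch that counts each hardlinked inode once, adding both the file and its bytes for the first path seen. This is not the spacesavers2 implementation: `folder_usage` is a hypothetical name, and the sketch assumes apparent size (`st_size`) is the quantity of interest, whereas `du` reports allocated blocks by default, so small differences would remain.

```python
import os


def folder_usage(root):
    """Return (file count, total bytes) under root, counting each inode once."""
    seen = set()  # (st_dev, st_ino) pairs that have already been counted
    n_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is not accessible; skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hardlink to an inode we already counted
            seen.add(key)
            # The first path seen for an inode is treated as the "original":
            # count the file AND add its bytes. The comment above describes
            # v0.13.0 as doing the former but not the latter for hardlinks.
            n_files += 1
            total_bytes += st.st_size
    return n_files, total_bytes
```

With a 1000-byte file that has one extra hardlink in the same tree, this returns one file and 1000 bytes, rather than one file and 0 bytes.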