Open szhorvat opened 7 years ago
Yeah, I've been meaning to do something about that output display which is way too complex and "TMI". Part of the reason I didn't is that I don't want to get drawn into the debate whether to use 1024 or 1000 as a kilo unit, and how to abbreviate that. To me, a megabyte is 1024 x 1024 bytes, abbreviated as MB (or Mb), a gigabyte is 1024 times more and the 1000 base scheme nothing but a cheap trick used by disk makers ;)
It's cosmetic, so obviously not a big deal.
The biggest reason to fix it is that when someone notices it for the first time, it is not immediately obvious that it's cosmetic, and people may get worried about possible data loss (afsctool
can be scary to use). I was certainly a bit worried until I realized that the byte count is correct.
As for 1024 vs 1000, it's really not a big deal. Most other command line tools, like ls -lh
, use 1024, so why not go with that? (The Finder uses 1000, which is annoying.)
Can you check and close the issue if this now gets your approval? :)
Sorry for the late response. I am not sure what is happening, but it still doesn't report sizes correctly. I double-checked that I am using the version I just built (and the source I just downloaded).
For example, for the folder where I built it, it says
Folder size (uncompressed; reported size by Mac OS 10.6+ Finder): 1387738 bytes / 276.8 MB (megabytes, base-10)
Folder size (compressed - decmpfs xattr; reported size by Mac OS 10.0-10.5 Finder): 1387738 bytes / 276.8 MB (megabytes, base-10)
Folder size (compressed): 1387738 bytes / 264 MiB
So basically it's a factor 1000 (1024) off? I don't understand this, and in fact I remember that I couldn't even reproduce your original observation.
Looking at the function that does the conversion my first reaction was to rewrite it entirely using a different approach. I cannot really wrap my head around how it's being done currently, can you? But then I noticed that it works for me, so I went the easier route.
How do you build, i.e. with what compiler and options? I've tried with Apple's clang from Xcode 6.2 (Clang 600.0.57 or so) as well as with a stock clang 4.0, in both cases using -O3 -march=native and (evidently) 64bit mode. Are you by any chance building in 32bit mode?
I'm not really familiar with cmake, so all I did was
cmake .
make
It would complain about not finding sparsehash/dense_hash_map
, so I added /opt/local/include
to target_include_directories
, which seemed to work.
I built it on OS X 10.13.1 with the latest dev tools ("Apple LLVM version 9.0.0 (clang-900.0.38)")
But since it's just a misreporting, it's really not a big deal.
It is until we understand what's going on. If it miscompiles this code, who knows what else it miscompiles?
Weird indeed. I can't see the problem in the function. Will look more later. Here the output is fine: http://coliru.stacked-crooked.com/a/e04c6526167906df
I won't have more time for this today, but I put a printf
at the beginning of getSizeStr
to see what arguments it's getting, and I see this:
size: 1382668, size_rounded: 276824064
This doesn't look right. The problem seems to be with the size_rounded
value that's passed in.
size: 1382668, size_rounded: 276824064
This doesn't look right. The problem seems to be with the
size_rounded
value that's passed in.
No, that doesn't look right, esp. since size
and size_rounded
are supposed to be equal in at least the last call, "Folder size (compressed):" (I am claiming this from memory!).
A thought: what filesystem are you seeing this with? Is that by any chance on APFS?
Yes, APFS.
I just tried on HFS+ (external drive) and it looks fine there:
size: 15329972233, size_rounded: 15330181120
Folder size (uncompressed; reported size by Mac OS 10.6+ Finder): 15329972233 bytes / 15.33 GB (gigabytes, base-10)
Thanks for confirming.
Still a considerable difference (over 200000 bytes) but this does look more reasonable, no?
Apparently we have to find the HFS-specific contribution to the rounded file size estimate that goes awry on APFS. I've already seen 1 candidate (something using HFS attributes). But maybe I should just
st_blksize
(for one file) is 4096 on the HFS+ volume I tried. On the APFS one it is 4194304 = 2^22. That explains the large reported sizes, but I do not know much about file systems, so I don't know what this value means in practice (I prefer not to guess in public :) )
I find the code a bit too noisy to follow with confidence, but there are several repeated bits that seem to round up filesize
to a multiple of fileinfo->st_blksize
, like here:
filesize_rounded = filesize = fileinfo->st_size;
filesize_rounded += (filesize_rounded % fileinfo->st_blksize) ? fileinfo->st_blksize - (filesize_rounded % fileinfo->st_blksize) : 0;
You're right that the code is very noisy and repetitive. I don't feel very much like giving it the required overhaul, though ...
Could I ask you to try to determine whether the Finder still requires any special size calculations? If we can avoid that whole rounding step at least for files on APFS the problem is "solved" ...
I don't know the background context.
Do you mean that the whole purpose of this complicated calculation and rounding to "block sizes" (whatever those may be) is just to reproduce the Finder's results?
So you are asking to look at what the Finder reports on 10.13 / APFS and see which calculation it matches with?
Oh, and if it really interests you, try a disk benchmarking utility that allows you to use 4Mb (and larger) block sizes. I don't know what kind of drive you got that value from nor what size the file had (the value can be file-specific) but if the system is right you should see optimum throughput with 4Mb blocks.
This is of course unrelated to solving the issue at hand.
Meanwhile, I pushed a couple of changes (including one which should let cmake find sparsehash if you build for installing into the same prefix (/opt/local, right?).
I think I found the bug that was biting you. The rounded file sizes are in fact the sizes in terms of disk blocks. For instance, a file that's 1024 bytes small will still take up 4096 bytes if the disk uses that block size. The previous algorithm would calculate (size % st_blksize) and then add the complement of that to the file size; the a 1024 byte file was thus counted as (1024 + (4096 - 1024)). A small error for small blocksizes, but a possibly signifant error for a 4Mb block size.
There is of course also the possibility that we shouldn't be doing this rounding at all on APFS. That 4Mb value is probably a value that has little to do with the underlying disk block size, I really hope that APFS doesn't make you lose 4Mb per file!
Here's another small experiment: compare fileinfo->st_blocks with the actual filesize (and fileinfo->st_blksize) on HFS+ and APFS, and then to the disk sector size (printed by smartmontools, for instance). Many modern disks have a physical sector size of 4096 (coinciding with st_blksize on HFS+) but 512 bytes logical sector size (the legacy value). As far as I can tell fileinfo->st_blocks gives the filesize in multiples of 512 byte on the disks I have here. I've added a bit of (outcommented) debug printf to roundToBlkSize() which makes this easier.
Do you mean that the whole purpose of this complicated calculation and rounding to "block sizes" (whatever those may be) is just to reproduce the Finder's results?
Yes. I think the reason is that the Finder includes size information from resource forks, metadata etc.
So you are asking to look at what the Finder reports on 10.13 / APFS and see which calculation it matches with?
Yes. Take a couple of files and/or directories, see what the Finder reports before compression and what after compression and how that compares to the "unrounded" sizes afsctool reports. Comparison to the rounded sizes is a bonus, but probably not a very useful one since we already know that the rounding operation is flawed.
Oh, and may I also ask you to take a quick look at the man lstat
page to see if it says anything interesting about st_blksize that explains the difference? It's supposed to be the blocksize for optimum file I/O; the value you report seems improbable in that context...
When I run
afsctool -v .
in a large folder, I usually get output like this:Notice the mismatch between the byte count and the human readable form.
10175898032 bytes is actually 9.48 GiB or 10.18 GB.
The byte count is correct (as compared with e.g. the Finder's reporting). The human readable form is wrong.