RJVB / afsctool

This is a version of "brkirch"'s afsctool utility that allows end-users to leverage HFS+ compression.
https://brkirch.wordpress.com/afsctool
GNU General Public License v3.0
187 stars 18 forks source link

Compress, decompress, compress results in lots of mismatch errors #62

Open gingerbeardman opened 1 year ago

gingerbeardman commented 1 year ago

Repro

  1. afsctool -c /Applications/PlayOnMac.app (no errors)
  2. afsctool -d /Applications/PlayOnMac.app (no errors)
  3. afsctool -c -T LZFSE /Applications/PlayOnMac.app (errors!)

Comparison

  1. download new version of the app here
  2. afsctool -c -T LZFSE /Applications/PlayOnMac.app (no errors)

Errors

    size mismatch=0 read=57405 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/ext-ms-win-ntuser-draw-l1-1-0.dll: Compressed file check failed, reverting file changes
    size mismatch=0 read=52611 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/api-ms-win-core-libraryloader-l1-1-0.dll: Compressed file check failed, reverting file changes
    size mismatch=0 read=57535 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/ext-ms-win-security-credui-l1-1-0.dll: Compressed file check failed, reverting file changes
    size mismatch=0 read=57414 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/api-ms-win-security-lsalookup-l2-1-1.dll: Compressed file check failed, reverting file changes
    size mismatch=0 read=57513 failure=0 content mismatch=1 (Undefined error: 0)
/Applications/PlayOnMac.app/Contents/Resources/unix/wine/lib32on64/wine/ext-ms-win-gdi-render-l1-1-0.dll: Compressed file check failed, reverting file changes
    size mismatch=0 read=57411 failure=0 content mismatch=1 (Undefined error: 0)
Dr-Emann commented 1 year ago

This reproduces consistently for me with version of playonmac 4.4.3 which is the current most recent version for me.

Dr-Emann commented 1 year ago

Set a breakpoint on the message output: before afsctool reverts the compression, the file ends up containing all zeros

Dr-Emann commented 1 year ago

Focusing on one of the files which show this behavior: Contents/Resources/unix/wine/lib64/wine/api-ms-win-core-registry-l2-2-0.dll

The file fits in one block. It seems that when compressed with zlib, the file is compressed into the resource fork (it isn't compressed enough to fit in the decmpfs xattr). However, when compressed with lzfse, it does compress enough to fit into the decmpfs xattr.

It's not clear why this is a problem though. I've verified that the decmpfs xattr written is exactly the same when it's compressing a fresh version of the dll and when compressing after it's already been compressed and decompressed, it's not clear what could be carrying over to change things: with the same decmpfs xattr contents, enabling the compressed flag in one case works, and in the other case results in a file which reports to contain all zeros.

It really feels like a macos bug, I don't see any differences I can observe in the file before/after a compress+decompress cycle, but we're getting different behavior.

Dr-Emann commented 1 year ago

I believe this is a macos bug: if a file has been compressed into a resource fork, uncompressing the file, and recompressing directly into the decmpfs xattr, the file will appear as all zeros (of the correct len).

I've created a minimal reproduction: https://gist.github.com/Dr-Emann/3d0f272d08c68a889fa368597a10cea7. If SKIP_RFORK_COMPRESSION_FINISH is defined (with -DSKIP_RFORK_COMPRESSION_FINISH=1 when compiling the c code), the file will be correct. Without the define, the file will instead be filled with zeros.

RJVB commented 1 year ago

It's not clear why this is a problem though. I've verified that the decmpfs xattr written is exactly the same when it's compressing a fresh version of the dll and when compressing after it's already been compressed and decompressed

So you have verified that the compressed content is correct when you read it back back in? Theoretically that should mean that it decompresses to the expected original content but did you verify that?

Sadly I can't work on this myself because 10.9 doesn't support LZFSE compressed files (so it's also a bit hard to wrap my head about what's going on). We could implement the decompression inside afsctool but that won't arrange anything if we can still produce compressed files that appear to be filled with zeroes to anyone else.

Unless of course that implementation reproduces the issue...

What I don't get is that the verification step doesn't catch this problem and rewrites the file from the backup...

Anyway, if you do feel that this is an OS bug: how up-to-date is your OS? If you're on a still supported version you could file a radar about this, and see what Apple have to say. My hardware is limited to 10.13 even if I wanted to upgrade so there's no point for me to file a ticket when the 1st request is going to be that I install an up-to-date OS version...

Should we just disable compression into the decmpfs xattr?

I'm a bit surprised that LZFSE can compress better than zlib ... did you try the ultimate zlib compression (-9)?!

gingerbeardman commented 1 year ago

Results are compressing an app copied fresh out of the DMG.

❯ time afsctool -c -T ZLIB -9 PlayOnMac.app
/Users/matt/Downloads/2023-04-26/PlayOnMac.app:
Number of HFS+/APFS compressed files: 4561
================
CPU 98%
TIME    2:26.47
❯ time afsctool -c -T LZFSE PlayOnMac.app
/Users/matt/Downloads/2023-04-26/PlayOnMac.app:
Number of HFS+/APFS compressed files: 4570
================
CPU 97%
TIME    1:05.79

I'm using a MacBook Pro (M1 Pro CPU, 16GB RAM).

RJVB commented 1 year ago

OK, my bad, I don't ever use that compression...

gingerbeardman commented 1 year ago

No worries! It's an eye-opener, for sure.

RJVB commented 1 year ago

No worries! It's an eye-opener, for sure.

Especially the part where a straight-C compressor outperforms (speed-wise) zlib which does have SIMD optimisations. I'll try to compare the timing difference on my system which has CloudFlare's highly optimised version.

gingerbeardman commented 1 year ago

Please do, I love a good benchmark.

I wonder how much ARM optimisation Apple have done to these algos. I'm betting "a lot"!

kapitainsky commented 1 year ago

LZFSE is highly optimized for ARM so not big surprise

RJVB commented 1 year ago

LZFSE is highly optimized for ARM so not big surprise

You mean such that even plain C compiles to something highly efficient?

kapitainsky commented 1 year ago

on Intel mac:

❯ time afsctool -c -T ZLIB -9 PlayOnMac.app 2m52s 650 MB

❯ time afsctool -c -T LZFSE PlayOnMac.app 1m40s 599 MB

so looks like LZFSE is just well done:)

Dr-Emann commented 1 year ago

I think there's some confusion.

After a file has had compression turned on with a resource fork, compressing the same file into the decmpfs header and turning the compression flag on results in a file that appears to be filled with zeros. The verification step DOES catch this (the original content is not all zeros), and correctly reverts compression.

I verified that the same decmpfs header content works if the file has not already been compressed to a resource fork in the past, but fails if the file has been compressed to a resource fork in the past.

There's nothing compressor specific: it should be possible to find a file which compresses to the resource fork at zlib -1, and fits in the decmpfs header at zlib -9, and it would exhibit the same behavior, compressing with zlib -1, uncompressing, then compressing with zlib -9 would end up with a file containing all zeros (and validation would revert, if enabled).

I'm running macos 13.3.1, the newest version. Feedback reported through the feedback assistant as FB12146617

RJVB commented 1 year ago

I verified that the same decmpfs header content works if the file has not already been compressed to a resource fork in the past, but fails if the file has been compressed to a resource fork in the past.

So the issue is triggered by the following sequence?

1) compress a "new" (uncompressed) file such that its contents end up in the resource fork (e.g. with ZIP) 2) uncompress the file 3) recompress the resulting file with LZFSE so that its contents go into the xattr

(Please confirm that you can NOT skip step 2; afsctool should reject files that are already HFS-compressed.)

What happens if you replace step 2 with something like cat $file1 > $file2, i.e. an operation that should remove the entire resource fork? I have never checked if afsctool removes the resource fork if it doesn't contain anything else (and I'm not at my Mac right now).

Indeed, it should be possible to create a file that will just be too large when zipped with a lower compression level, to see if that triggers the same bug.

Dr-Emann commented 1 year ago

Yes, those steps reproduce the issue. I can confirm that skipping step two does not reproduce the problem.

Replacing step 2 with cat $file1 > $file2, then compressing $file2 with lzfse does not reproduce the problem. Inserting it as a step between 2 and 3 (or using cp between step 2 and 3) also does not reproduce the problem: The sha256sum of the file is the same after each step and it seems to be in some way actually tied to the file identity.

gingerbeardman commented 1 year ago

sha256sum of the file is the same after each step and it seems to be in some way actually tied to the file identity.

This smells of APFS copy in place. I'll try to test on an HFS Extended partition.

RJVB commented 1 year ago

This smells of APFS copy in place.

Yes, but it could also hint at my hunch that it is somehow tied to something to do with the resource fork, possibly something as stupid as not cleaning it up entirely. Except that removeResourceFork does seem to get set when it should, so the xattr and the resourcefork bits always get removed when needed. Do you know of tools to inspect those aspects of files? The xattr command on my OS doesn't list any attributes on files that I know to be compressed...

RJVB commented 1 year ago

See the commit message of #8b89d8f20e8dbb3efddbbca1f9f7852318727d87

It looks like inodes have some kind of memory?!?!