RJVB / afsctool

This is a version of "brkirch"'s afsctool utility that allows end-users to leverage HFS+ compression.
https://brkirch.wordpress.com/afsctool
GNU General Public License v3.0

compressing breaks app (workaround: restart mac) #63

gingerbeardman opened this issue 1 year ago

gingerbeardman commented 1 year ago

I have not looked into this but will do soon.

  1. OpenEMU 2.3.3 https://openemu.org
  2. afsctool -c -T LZFSE

App breaks and will not launch.

RJVB commented 1 year ago

Does the same thing happen with ZLIB compression?

Dr-Emann commented 1 year ago

It does appear to reproduce with zlib for me (afsctool -c -T ZLIB ~/Applications/OpenEmu.app). It only seems to happen after I've opened it once (download, unzip, move to applications folder, compress, open works fine; open first, then compress, and opens fail).

RJVB commented 1 year ago

> It does appear to reproduce with zlib for me (afsctool -c -T ZLIB ~/Applications/OpenEmu.app). It only seems to happen after I've opened it once (download, unzip, move to applications folder, compress, open works fine; open first, then compress, and opens fail).

So if you compress first and then open, everything is OK; that does suggest that a file in the bundle remains open and is corrupted by the fact of being rewritten (in compressed form). That could be an executable too.

Combined with the issue from #62, I'm beginning to wonder if we should provide an option of compressing to a new file rather than rewriting the file (and maintaining its inode association).

FWIW, I think that the old Unix trick where you can unlink ("delete") an open file, e.g. to make it temporary (cleaned up at process exit) or to update a shared library, hinges on the inode(s) holding the file. I keep forgetting that when I cp new shared library versions manually, which replaces the file contents rather than replacing the entire file. Each time I forget, I have a face-palm moment when all applications that had that library open crash. This doesn't happen on Mac (AFAIK), but messing with an open file still isn't a good idea. Sadly there is no easy/fast way to determine whether a file is open...
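As an aside, the unlink trick is easy to demonstrate in a few lines of C (a standalone sketch, nothing to do with afsctool's code): the open descriptor keeps the inode, and therefore the data, alive after the directory entry is gone.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("scratch.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) return 1;
        write(fd, "still readable\n", 15);

        /* Remove the directory entry; the inode (and the data) survive
           for as long as this descriptor stays open. */
        unlink("scratch.tmp");

        /* No name in the filesystem anymore, yet reading still works. */
        char buf[32];
        lseek(fd, 0, SEEK_SET);
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) fwrite(buf, 1, (size_t)n, stdout);

        close(fd); /* only now is the inode actually released */
        return 0;
    }

Rewriting a file in place, by contrast, changes the data behind the very inode that other processes still have open, which is the failure mode suspected here.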

gingerbeardman commented 1 year ago

Sorry I had to post this issue in a hurry.

It is an app I've had installed for a while; I temporarily tried out a different version, but could not restore from my backup (at the time). So I installed it fresh, it got compressed by my daemon, and it broke. I can exclude an app from the daemon by compressing only its Info.plist, which has been my recent temporary workaround for things that break.

Now that I am back home, I was able to restore my backup and I have a compressed version from January 2021 that works just fine.

Old compressed backup that is working fine:

❯ afsctool -fvvvvvvv OpenEmu.app
/Applications/OpenEmu.app:
Number of HFS+/APFS compressed files: 563
Total number of files: 614
Total number of file hard links: 0
Total number of folders: 272
Total number of folder hard links: 0
Total number of items (number of files + number of folders): 886
Folder size (uncompressed; reported size by Mac OS 10.6+ Finder): 51926573 bytes / 53.5 MB (megabytes, base-10)
Folder size (compressed): 37078610 bytes / 35.5 MiB
Compression savings: 28.6% over 563 of 614 files
Approximate total folder size (files + file overhead + folder overhead): 37240832 bytes / 35.5 MiB

gingerbeardman commented 1 year ago

Interesting:

  1. install the app
  2. run and quit the app
  3. (compressing app at this point would break it)
  4. restart Mac
  5. compress app
  6. app runs ok!

RJVB commented 1 year ago

> 1. restart Mac
> 2. compress app
> 3. runs ok!

That would confirm my hypothesis that a file remains open, or something similar which explains why (I presume) you can't run the app from the DMG or its download location. Did you try logging off and back in before doing the restart?

Also, can we have some timing of getting the list of open files?

> time sh -c "lsof | wc -l"

Takes between 5 and 7 seconds on my 2011 i7. Annoyingly, every code example I've seen for checking whether a file is open in an application consists of the equivalent of lsof | fgrep $filename, so that's not something you want to do for each file. But we could get the list once, just before we start compressing, and then use it as a filter...
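One way that could look (a rough sketch assuming lsof's -F field-output mode; this is not code from afsctool): run lsof once up front, cache the reported paths, and test each candidate file against the cache.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_OPEN 65536

    static char *open_paths[MAX_OPEN];
    static size_t n_open = 0;

    /* Run lsof once in field mode; -Fn emits one 'n<path>' line per open
       file. Cache every absolute path it reports. */
    static int load_open_files(void)
    {
        FILE *p = popen("lsof -Fn 2>/dev/null", "r");
        char line[4096];
        if (!p) return -1;
        while (fgets(line, sizeof line, p) && n_open < MAX_OPEN) {
            line[strcspn(line, "\n")] = '\0';
            if (line[0] == 'n' && line[1] == '/') {
                open_paths[n_open++] = strdup(line + 1);
            }
        }
        pclose(p);
        return 0;
    }

    /* Linear scan for brevity; a real filter would sort the list once
       and use bsearch(), or use a hash set. */
    static int file_is_open(const char *path)
    {
        for (size_t i = 0; i < n_open; ++i) {
            if (strcmp(open_paths[i], path) == 0) return 1;
        }
        return 0;
    }

Membership in that cached list would then be the per-file filter, at the cost of one lsof run per afsctool invocation.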

(Also interesting: the list counting in the plain text version of the notification email ;) )

gingerbeardman commented 1 year ago

> Did you try logging off and back in before doing the restart?

I just tried; the app was corrupted by compression:

  1. run and quit
  2. logout
  3. compress app
  4. open is broken

> Also, can we have some timing of getting the list of open files?

The first one took 10 seconds; subsequent runs take 5. After logging out and back in, it takes 0.1 seconds!

> (Also interesting: the list counting in the plain text version of the notification email ;) )

Yes, it's a shortcut I use when I'm duplicating items in the list or am not sure of the final list order, as it saves retyping the list order.

RJVB commented 1 year ago

On Monday May 01 2023 04:40:06 Matt Sephton wrote:

> Also, can we have some timing of getting the list of open files?

> The first one took 10 seconds; subsequent runs take 5.

Interesting that that hasn't become faster, given that your hardware must be newer than mine if you can run 10.14... Unless you have many more open files (I forgot to mention I have about 9800). Still, the additional overhead could be acceptable, so the question is: what happens if you decompress a runnable compressed OpenEmu.app after running it?

gingerbeardman commented 1 year ago

After logout and login, the list of open files took 0.1 seconds. Just now it took 0.2 seconds for 17,000 open files.

> what happens if you decompress a runnable compressed OpenEmu.app after running it?

it decompresses and corrupts!

RJVB commented 1 year ago

As I expected. This still hints at an open file, which could be mmap'ed, in which case it might not show in lsof.

Sadly it happens regularly to me that an external drive "eject" is refused because files are open on it, which I cannot find via lsof.

But this reminds me of another way to test the hypothesis. If OpenEMU can be run from locations other than /Applications, you can make a fresh copy on an external drive, or on a disk image (writable if needed), run it from there, quit it, and then try to eject that drive.

gingerbeardman commented 1 year ago

A handy app to see open files: https://sveinbjorn.org/sloth

Dr-Emann commented 1 year ago

The "right" thing is probably to compress to another file anyway, and then atomically rename on top of the original, so there's never a time when the file is corrupted, since right now, interrupting afsctool at the wrong time (with at least sigkill, probably others, I think it does ignore/handle SIGINT/SIGHUP), it can leave a corrupt/empty file.

gingerbeardman commented 1 year ago

IIRC it compresses in place to reduce file space use, but there may be other historic reasons.

RJVB commented 1 year ago

I've thought about that but always decided against it; it'd be opening a can of worms. I don't know of a safe way to do an atomic rename that will always work. One border case: you're compressing a file you have the relevant permissions for, in a directory where you don't have write permissions. You'll have to build the compressed file "somewhere" else (say $TMPDIR), but you run the risk that this is on a different device. The rename(2) syscall will fail in that case (I learnt that recently), so even if the syscall itself is atomic and non-interruptible (is it?) you can still get a race condition in your fallback code (which AFAIK means reading the source and writing the destination).

Also, remember that the compressed data is not "in" the file but in "appendices". I have to assume that the current (original) code preserves any xattrs that were already present because the functions used do (I think I never checked that, tbh), but if we're going to replace the original file we'll have to ensure we copy all that data.
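A bare-bones sketch of that border case (function and path names are hypothetical; it assumes the compressed copy was already built at tmp_path): rename(2) is atomic on a single filesystem but fails with EXDEV across devices, which is exactly where the non-atomic fallback would have to kick in.

    #include <errno.h>
    #include <stdio.h>

    /* Atomically swap the freshly written compressed copy over the
       original. Works only if both paths are on the same device. */
    static int swap_in_compressed(const char *tmp_path, const char *orig_path)
    {
        if (rename(tmp_path, orig_path) == 0) {
            return 0; /* atomic: readers see either the old or the new file */
        }
        if (errno == EXDEV) {
            /* tmp_path lives on another device (e.g. $TMPDIR): rename(2)
               cannot cross filesystems, so the fallback is a read/write
               copy - and that is no longer atomic. */
            fprintf(stderr, "cross-device rename; non-atomic fallback needed\n");
        }
        return -1;
    }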

I've found it better to accept the fact that rewriting files always comes at a risk and that killing the process with the wrong signal is not a good idea.

That's not to say that we cannot offer an option to replace the file, as I already suggested in #62.

Meanwhile, I couldn't reproduce the issue with the latest OpenEMU build I can run on my system - but that one is over 7 years old so it's not surprising some things have changed since then...

gingerbeardman commented 1 year ago

Once I remember the other app this kind of breakage happens with I will post it here.

RJVB commented 1 year ago

> IIRC it compresses in place to reduce file space use

Not really: the backup copy that is made takes up as much space as you'd need for building the new file elsewhere...

I think the reason could have to do with not having to bother about other attributes that might be associated with the inode.

gingerbeardman commented 1 year ago

OK, so I made a discovery.

If an app is compressed and starts crashing at launch... simply restart your Mac. After that, the compressed app in question will launch just fine.

What does this tell us? That it's some sort of caching issue?

RJVB commented 1 year ago

> What does this tell us? That it's some sort of caching issue?

Sure sounds like it. man dyld will show a list of env. variables to print out all kinds of things, and modify the loader's behaviour. Maybe something in there can help shed more light on the situation.
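For example, something like this (the path to the executable inside the bundle is my assumption) would make dyld report every library it loads:

DYLD_PRINT_LIBRARIES=1 /Applications/OpenEmu.app/Contents/MacOS/OpenEmu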

What does the crash reporter tell about the crash? Or what happens if you launch the application through lldb?

gingerbeardman commented 1 year ago

> What does the crash reporter tell about the crash? Or what happens if you launch the application through lldb?

Not as much as expected, because:

Error Formulating Crash Report:
dyld_process_snapshot_get_shared_cache failed

HP Easy Scan-2023-05-15-150709.ips.txt

RJVB commented 1 year ago

On Monday May 15 2023 09:09:16 Matt Sephton wrote:

> Error Formulating Crash Report: dyld_process_snapshot_get_shared_cache failed

Also when you start the app through lldb (instead of having it do post-mortem debugging)?

Note how rebuilding the shared dyld cache requires rebooting the machine according to https://obsigna.com/articles/1545759477.html

Does this happen only with applications that contain shared libraries (or frameworks), or that depend on such libraries that were also compressed?

The manpage for my version of update_dyld_shared_cache reads as if only shared libraries that are part of the OS are cached, but maybe that has changed. That would certainly explain why I cannot reproduce the issue with compressed app bundles... Still, it would be stupid if there are no means to update the cache of user libraries.

What happens if after the compression of a runnable application you do

find /path/to/foo.app -depth -exec touch -h -m '{}' ";"

(idem for any shared library dependencies outside the app bundle that you compressed).

That way dyld is at least warned that there's been a change to the files it loads...

gingerbeardman commented 1 year ago

> What happens if after the compression of a runnable application you do
>
> find /path/to/foo.app -depth -exec touch -h -m '{}' ";"

still crashes

gingerbeardman commented 1 year ago

I'm fairly confident this is an instance of the same root cause: https://developer.apple.com/forums/thread/696460

more info: https://developer.apple.com/documentation/security/updating_mac_software

and the solution is to replace files with new ones (e.g. using ditto) rather than rewriting them in place (e.g. using cp), which avoids the cache mismatch:

@RJVB implies that files are currently rewritten to maintain inode number in this comment: https://github.com/RJVB/afsctool/issues/63#issuecomment-1529588496
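To make the distinction concrete, here is a minimal C sketch of the two update strategies (hypothetical helper functions, not afsctool's code): a plain rewrite keeps the inode and only changes its contents, while a replace removes the old directory entry and creates a fresh inode under the same name.

    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>

    /* Rewrite in place (what a plain cp does): the inode is kept and only
       its contents change, so anything that cached or mapped the old
       contents by inode now sees a mismatch. */
    static int rewrite_in_place(const char *path, const char *data, size_t len)
    {
        int fd = open(path, O_WRONLY | O_TRUNC);
        if (fd < 0) return -1;
        ssize_t written = write(fd, data, len);
        close(fd);
        return written == (ssize_t)len ? 0 : -1;
    }

    /* Replace (the strategy the Apple article favours): the old directory
       entry is removed and a brand-new inode appears under the same name;
       processes holding the old file open keep their old, intact copy. */
    static int replace_file(const char *path, const char *data, size_t len)
    {
        unlink(path);
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0) return -1;
        ssize_t written = write(fd, data, len);
        close(fd);
        return written == (ssize_t)len ? 0 : -1;
    }

If the cache-mismatch theory holds, the second variant avoids the problem because nothing keyed to the old inode ever sees new contents.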

RJVB commented 1 year ago

On Thursday September 28 2023 16:47:37 Matt Sephton wrote:

> I'm fairly confident this is an instance of the same root cause: https://developer.apple.com/forums/thread/696460
>
> and the solution is to replace files with new ones (e.g. using ditto) rather than rewriting them in place (e.g. using cp), which avoids the cache mismatch

Maybe you're right, but

1. I'd expect Apple to update the correct way - and we're talking about files provided by them here.
2. I fail to see how this could be a problem with header files, unless maybe you update one the wrong way while it's being included.

FWIW, I get bitten by this regularly on Linux: update a shared library using cp without the --remove-destination argument and all applications currently using the library will probably crash, instead of continuing to use the old version they have open.

Dr-Emann commented 1 year ago

I think you think this is in reply to #66, this is on a different issue

RJVB commented 1 year ago

> I think you think this is in reply to #66, this is on a different issue

Oops :)