AppImage orphan survives main process exit

mehmetakbulut commented 3 days ago

I am seeing an orphaned process get left behind under some conditions after main process exits. This orphaned process seems to stay around forever in sleep state until reboot. I originally noticed this when appimage was run by a process running as a systemd service similar to AppImage/AppImageKit#1362 so I have put together the following example without systemd.

Running as root to demonstrate that it should not be a permissions issue.

Using following as entrypoint from appimage wrapper:

root@host:~# cat test.py
#!/usr/bin/env python3

print("Hello world!")

Running with strace to trace processes. We can see that main process forks pid 30812 (which I believe is this fork call) which then forks again with pid 30813. Main process then eventually execs our "Hello world!" script which exits properly. However we can see that a detached process is left behind which is pid 30813.

root@host:~# strace -f -e waitpid,fork,execve bazel-bin/test.AppImage
execve("bazel-bin/test.AppImage", ["bazel-bin/test.AppImage"], 0x7ffe197ee268 /* 21 vars */) = 0
fork()                                  = 30812
strace: Process 30812 attached
[pid 30812] fork()                      = 30813
strace: Process 30813 attached
[pid 30812] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30812, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30815 attached
[pid 30811] execve("/tmp/.mount_test.AALCfdi/AppRun", ["bazel-bin/test.AppImage"], 0x7ff6783cc440 /* 25 vars */) = 0
strace: Process 30816 attached
[pid 30816] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/AppRun"], 0x558cc611ea28 /* 26 vars */) = 0
[pid 30816] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30816, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30817 attached
[pid 30817] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x558cc611ea98 /* 26 vars */) = 0
[pid 30817] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30817, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 30811] execve("./test", ["./test"], 0x558cc5034bc8 /* 28 vars */) = 0
[pid 30811] execve("/usr/local/sbin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/local/bin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/sbin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/bin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = 0
strace: Process 30818 attached
[pid 30818] execve("/usr/local/sbin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/local/bin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/sbin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/bin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = 0
[pid 30818] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30818, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 30811] execve("/tmp/.mount_test.AALCfdi/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/python_wrapper.sh", ["/tmp/.mount_test.AALCfdi/test.ru"..., "/tmp/.mount_test.AALCfdi/test.ru"...], 0x1afd4c0 /* 30 vars */) = 0
strace: Process 30819 attached
[pid 30819] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x556d655cd510 /* 30 vars */) = 0
[pid 30819] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30819, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30820 attached
[pid 30820] execve("/tmp/.mount_test.AALCfdi/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/bin/python3", ["/tmp/.mount_test.AALCfdi/test.ru"..., "-s", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x556d655cd510 /* 30 vars */) = 0
Hello world!
[pid 30820] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30820, si_uid=0, si_status=0, si_utime=2, si_stime=1} ---
[pid 30811] +++ exited with 0 +++
[pid 30815] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] +++ exited with 0 +++

^Cstrace: Process 30813 detached

30813 appears to be in sleep state while reading fuse device.

root@host:~# ps aux | grep 30813
root       30813  0.2  0.0   2632  1108 ?        Ss   11:25   0:00 bazel-bin/test.AppImage

root@host:~# cat /proc/30813/stack
[<0>] fuse_dev_do_read.isra.0+0x7f0/0x840
[<0>] fuse_dev_read+0x5f/0x90
[<0>] new_sync_read+0x110/0x1a0
[<0>] vfs_read+0xfe/0x190
[<0>] ksys_read+0x67/0xe0
[<0>] __x64_sys_read+0x1a/0x20
[<0>] do_syscall_64+0x38/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

We can further confirm this. It is currently trying to read fd 5 which is /dev/fuse.

root@host:~# strace -p 30813
strace: Process 30813 attached
read(5, ^Cstrace: Process 30813 detached
 <detached ...>

root@host:~# ls -lah /proc/30813/fd/5
lrwx------ 1 root root 64 Nov 27 11:25 /proc/30813/fd/5 -> /dev/fuse

However if we run in an interactive environment (like most people would be doing by launching an application on their desktop or shell) using bash -i, then everything works properly with no detached processes left behind.

root@host:~# strace -f -e waitpid,fork,execve bash -i -c 'bazel-bin/test.AppImage'
execve("/usr/bin/bash", ["bash", "-i", "-c", "bazel-bin/test.AppImage"], 0x7ffe09a3c2d0 /* 21 vars */) = 0
strace: Process 31928 attached
[pid 31928] execve("/usr/bin/groups", ["groups"], 0x55e9349308c0 /* 21 vars */) = 0
[pid 31928] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31928, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31929 attached
[pid 31929] execve("/usr/bin/lesspipe", ["lesspipe"], 0x55e934931170 /* 21 vars */) = 0
strace: Process 31930 attached
[pid 31930] execve("/usr/bin/basename", ["basename", "/usr/bin/lesspipe"], 0x56084be5a8b8 /* 21 vars */) = 0
[pid 31930] +++ exited with 0 +++
[pid 31929] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31930, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31931 attached
strace: Process 31932 attached
[pid 31932] execve("/usr/bin/dirname", ["dirname", "/usr/bin/lesspipe"], 0x56084be65b28 /* 21 vars */) = 0
[pid 31932] +++ exited with 0 +++
[pid 31931] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31932, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31931] +++ exited with 0 +++
[pid 31929] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31931, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31929] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31929, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31933 attached
[pid 31933] execve("/usr/bin/dircolors", ["dircolors", "-b"], 0x55e9349337c0 /* 21 vars */) = 0
[pid 31933] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31933, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
execve("bazel-bin/test.AppImage", ["bazel-bin/test.AppImage"], 0x55e9349337c0 /* 21 vars */) = 0
fork()                                  = 31934
strace: Process 31934 attached
[pid 31934] fork(strace: Process 31936 attached
)                      = 31936
[pid 31934] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31934, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31937 attached
[pid 31927] execve("/tmp/.mount_test.AcAadlB/AppRun", ["bazel-bin/test.AppImage"], 0x7ffb20309440 /* 25 vars */) = 0
strace: Process 31938 attached
[pid 31938] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/AppRun"], 0x56524fd17a28 /* 26 vars */) = 0
[pid 31938] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31938, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31939 attached
[pid 31939] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x56524fd17a98 /* 26 vars */) = 0
[pid 31939] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31939, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31927] execve("./test", ["./test"], 0x56524f1e7bc8 /* 28 vars */) = 0
[pid 31927] execve("/usr/local/sbin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/local/bin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/sbin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/bin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = 0
strace: Process 31940 attached
[pid 31940] execve("/usr/local/sbin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/local/bin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/sbin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/bin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = 0
[pid 31940] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31940, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31927] execve("/tmp/.mount_test.AcAadlB/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/python_wrapper.sh", ["/tmp/.mount_test.AcAadlB/test.ru"..., "/tmp/.mount_test.AcAadlB/test.ru"...], 0x1839a90 /* 30 vars */) = 0
strace: Process 31941 attached
[pid 31941] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x561ca7958590 /* 30 vars */) = 0
[pid 31941] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31941, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31942 attached
[pid 31942] execve("/tmp/.mount_test.AcAadlB/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/bin/python3", ["/tmp/.mount_test.AcAadlB/test.ru"..., "-s", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x561ca7958590 /* 30 vars */) = 0
Hello world!
[pid 31942] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31942, si_uid=0, si_status=0, si_utime=3, si_stime=1} ---
[pid 31927] +++ exited with 0 +++
[pid 31937] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31936] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31937] +++ exited with 0 +++
+++ exited with 0 +++

In this case, detached process would have been pid 31936 (2nd forked process after appimage is exec'd) however it sends itself a SIGTERM after main process exits.

I have walked through appimage runtime, squashfuse and libfuse sources without much progress. I think I am missing some insight. Maybe this is intended behavior or some user error on my end.

probonopd commented 3 days ago

Thanks @mehmetakbulut. Most likely so far never ever thought about running an AppImage in a non-interactive environment. But your use case (running an AppImage using systemd) is a valid one, and maybe there are others.

@TheAssassin I believe this needs some debugger wizardry.

lalten commented 2 days ago

We have this problem a lot, mostly it becomes a problem when an appimage is restarted over and over with a failing systemd unit. At some point you run into the mount_max limitation of Fuse (default is 1000).

One fix is to enable auto-unmount in squashfuse, like in https://github.com/lalten/appimage-runtime-rs/blob/03911dd353085bfd15e57e5a0cd1211f633ed656/appimage-mount/src/mount.rs#L24-L25 appimage-runtime-rs isn't quite ready for primetime though.

Squashfuse's -o timeout=N means "idle N seconds for automatic unmount" Fuse's auto_unmount "enables automatic release of the mountpoint if filesystem terminates for any reason"

lalten commented 2 days ago

The reason this happens is basically https://github.com/AppImage/type2-runtime/issues/90

(which appimage-runtime-rs also does differently, in https://github.com/lalten/appimage-runtime-rs/blob/03911dd353085bfd15e57e5a0cd1211f633ed656/src/bin/runtime.rs#L117)

TheAssassin commented 2 days ago

@lalten we're looking into rewriting the runtime in Rust anyway, but neither of us is fluent enough (yet, working on that). In the meantime, do you think you could contribute a similar fix to this codebase?

mehmetakbulut commented 2 days ago

Thank you guys for the quick response!

I can patch runtime with timeout flags on my end as a workaround. I think that should allow mount & forked process to remain as long as main process exists? (since it is loaded from the mount?)

@lalten Do you have insight into why the behavior is different with bash -i?

# without bash -i
[pid 30815] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---    # detached process sends sigterm to some other child process
[pid 30815] +++ exited with 0 +++
^Cstrace: Process 30813 detached

vs

# with bash -i
[pid 31937] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31936] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=31936, si_uid=0} ---   # detached process sends sigterm to itself
[pid 31937] +++ exited with 0 +++
+++ exited with 0 +++

I thought maybe it has to do with the foreground flag but I don't quite follow how it would do this in the source.

probonopd commented 2 days ago

Looks like my PR is solving this now. Thanks everyone!

AppImage / type2-runtime

AppImage orphan survives main process exit #92