AppImage / type2-runtime

The runtime is the executable part of every AppImage. It mounts the payload via FUSE and executes the entrypoint.
Other
22 stars 18 forks source link

AppImage orphan survives main process exit #92

Open mehmetakbulut opened 3 days ago

mehmetakbulut commented 3 days ago

I am seeing an orphaned process get left behind under some conditions after main process exits. This orphaned process seems to stay around forever in sleep state until reboot. I originally noticed this when appimage was run by a process running as a systemd service similar to AppImage/AppImageKit#1362 so I have put together the following example without systemd.

Running as root to demonstrate that it should not be a permissions issue.

Using following as entrypoint from appimage wrapper:

root@host:~# cat test.py
#!/usr/bin/env python3

print("Hello world!")

Running with strace to trace processes. We can see that main process forks pid 30812 (which I believe is this fork call) which then forks again with pid 30813. Main process then eventually execs our "Hello world!" script which exits properly. However we can see that a detached process is left behind which is pid 30813.

root@host:~# strace -f -e waitpid,fork,execve bazel-bin/test.AppImage
execve("bazel-bin/test.AppImage", ["bazel-bin/test.AppImage"], 0x7ffe197ee268 /* 21 vars */) = 0
fork()                                  = 30812
strace: Process 30812 attached
[pid 30812] fork()                      = 30813
strace: Process 30813 attached
[pid 30812] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30812, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30815 attached
[pid 30811] execve("/tmp/.mount_test.AALCfdi/AppRun", ["bazel-bin/test.AppImage"], 0x7ff6783cc440 /* 25 vars */) = 0
strace: Process 30816 attached
[pid 30816] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/AppRun"], 0x558cc611ea28 /* 26 vars */) = 0
[pid 30816] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30816, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30817 attached
[pid 30817] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x558cc611ea98 /* 26 vars */) = 0
[pid 30817] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30817, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 30811] execve("./test", ["./test"], 0x558cc5034bc8 /* 28 vars */) = 0
[pid 30811] execve("/usr/local/sbin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/local/bin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/sbin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/bin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = 0
strace: Process 30818 attached
[pid 30818] execve("/usr/local/sbin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/local/bin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/sbin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/bin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = 0
[pid 30818] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30818, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 30811] execve("/tmp/.mount_test.AALCfdi/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/python_wrapper.sh", ["/tmp/.mount_test.AALCfdi/test.ru"..., "/tmp/.mount_test.AALCfdi/test.ru"...], 0x1afd4c0 /* 30 vars */) = 0
strace: Process 30819 attached
[pid 30819] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x556d655cd510 /* 30 vars */) = 0
[pid 30819] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30819, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30820 attached
[pid 30820] execve("/tmp/.mount_test.AALCfdi/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/bin/python3", ["/tmp/.mount_test.AALCfdi/test.ru"..., "-s", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x556d655cd510 /* 30 vars */) = 0
Hello world!
[pid 30820] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30820, si_uid=0, si_status=0, si_utime=2, si_stime=1} ---
[pid 30811] +++ exited with 0 +++
[pid 30815] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] +++ exited with 0 +++

^Cstrace: Process 30813 detached

30813 appears to be in sleep state while reading fuse device.

root@host:~# ps aux | grep 30813
root       30813  0.2  0.0   2632  1108 ?        Ss   11:25   0:00 bazel-bin/test.AppImage

root@host:~# cat /proc/30813/stack
[<0>] fuse_dev_do_read.isra.0+0x7f0/0x840
[<0>] fuse_dev_read+0x5f/0x90
[<0>] new_sync_read+0x110/0x1a0
[<0>] vfs_read+0xfe/0x190
[<0>] ksys_read+0x67/0xe0
[<0>] __x64_sys_read+0x1a/0x20
[<0>] do_syscall_64+0x38/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

We can further confirm this. It is currently trying to read fd 5 which is /dev/fuse.

root@host:~# strace -p 30813
strace: Process 30813 attached
read(5, ^Cstrace: Process 30813 detached
 <detached ...>

root@host:~# ls -lah /proc/30813/fd/5
lrwx------ 1 root root 64 Nov 27 11:25 /proc/30813/fd/5 -> /dev/fuse

However if we run in an interactive environment (like most people would be doing by launching an application on their desktop or shell) using bash -i, then everything works properly with no detached processes left behind.

root@host:~# strace -f -e waitpid,fork,execve bash -i -c 'bazel-bin/test.AppImage'
execve("/usr/bin/bash", ["bash", "-i", "-c", "bazel-bin/test.AppImage"], 0x7ffe09a3c2d0 /* 21 vars */) = 0
strace: Process 31928 attached
[pid 31928] execve("/usr/bin/groups", ["groups"], 0x55e9349308c0 /* 21 vars */) = 0
[pid 31928] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31928, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31929 attached
[pid 31929] execve("/usr/bin/lesspipe", ["lesspipe"], 0x55e934931170 /* 21 vars */) = 0
strace: Process 31930 attached
[pid 31930] execve("/usr/bin/basename", ["basename", "/usr/bin/lesspipe"], 0x56084be5a8b8 /* 21 vars */) = 0
[pid 31930] +++ exited with 0 +++
[pid 31929] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31930, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31931 attached
strace: Process 31932 attached
[pid 31932] execve("/usr/bin/dirname", ["dirname", "/usr/bin/lesspipe"], 0x56084be65b28 /* 21 vars */) = 0
[pid 31932] +++ exited with 0 +++
[pid 31931] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31932, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31931] +++ exited with 0 +++
[pid 31929] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31931, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31929] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31929, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31933 attached
[pid 31933] execve("/usr/bin/dircolors", ["dircolors", "-b"], 0x55e9349337c0 /* 21 vars */) = 0
[pid 31933] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31933, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
execve("bazel-bin/test.AppImage", ["bazel-bin/test.AppImage"], 0x55e9349337c0 /* 21 vars */) = 0
fork()                                  = 31934
strace: Process 31934 attached
[pid 31934] fork(strace: Process 31936 attached
)                      = 31936
[pid 31934] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31934, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31937 attached
[pid 31927] execve("/tmp/.mount_test.AcAadlB/AppRun", ["bazel-bin/test.AppImage"], 0x7ffb20309440 /* 25 vars */) = 0
strace: Process 31938 attached
[pid 31938] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/AppRun"], 0x56524fd17a28 /* 26 vars */) = 0
[pid 31938] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31938, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31939 attached
[pid 31939] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x56524fd17a98 /* 26 vars */) = 0
[pid 31939] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31939, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31927] execve("./test", ["./test"], 0x56524f1e7bc8 /* 28 vars */) = 0
[pid 31927] execve("/usr/local/sbin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/local/bin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/sbin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/bin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = 0
strace: Process 31940 attached
[pid 31940] execve("/usr/local/sbin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/local/bin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/sbin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/bin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = 0
[pid 31940] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31940, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31927] execve("/tmp/.mount_test.AcAadlB/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/python_wrapper.sh", ["/tmp/.mount_test.AcAadlB/test.ru"..., "/tmp/.mount_test.AcAadlB/test.ru"...], 0x1839a90 /* 30 vars */) = 0
strace: Process 31941 attached
[pid 31941] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x561ca7958590 /* 30 vars */) = 0
[pid 31941] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31941, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31942 attached
[pid 31942] execve("/tmp/.mount_test.AcAadlB/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/bin/python3", ["/tmp/.mount_test.AcAadlB/test.ru"..., "-s", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x561ca7958590 /* 30 vars */) = 0
Hello world!
[pid 31942] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31942, si_uid=0, si_status=0, si_utime=3, si_stime=1} ---
[pid 31927] +++ exited with 0 +++
[pid 31937] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31936] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31937] +++ exited with 0 +++
+++ exited with 0 +++

In this case, detached process would have been pid 31936 (2nd forked process after appimage is exec'd) however it sends itself a SIGTERM after main process exits.

I have walked through appimage runtime, squashfuse and libfuse sources without much progress. I think I am missing some insight. Maybe this is intended behavior or some user error on my end.

probonopd commented 3 days ago

Thanks @mehmetakbulut. Most likely so far never ever thought about running an AppImage in a non-interactive environment. But your use case (running an AppImage using systemd) is a valid one, and maybe there are others.

@TheAssassin I believe this needs some debugger wizardry.

lalten commented 2 days ago

We have this problem a lot, mostly it becomes a problem when an appimage is restarted over and over with a failing systemd unit. At some point you run into the mount_max limitation of Fuse (default is 1000).

One fix is to enable auto-unmount in squashfuse, like in https://github.com/lalten/appimage-runtime-rs/blob/03911dd353085bfd15e57e5a0cd1211f633ed656/appimage-mount/src/mount.rs#L24-L25 appimage-runtime-rs isn't quite ready for primetime though.

Squashfuse's -o timeout=N means "idle N seconds for automatic unmount" Fuse's auto_unmount "enables automatic release of the mountpoint if filesystem terminates for any reason"

lalten commented 2 days ago

The reason this happens is basically https://github.com/AppImage/type2-runtime/issues/90

(which appimage-runtime-rs also does differently, in https://github.com/lalten/appimage-runtime-rs/blob/03911dd353085bfd15e57e5a0cd1211f633ed656/src/bin/runtime.rs#L117)

TheAssassin commented 2 days ago

@lalten we're looking into rewriting the runtime in Rust anyway, but neither of us is fluent enough (yet, working on that). In the meantime, do you think you could contribute a similar fix to this codebase?

mehmetakbulut commented 2 days ago

Thank you guys for the quick response!

I can patch runtime with timeout flags on my end as a workaround. I think that should allow mount & forked process to remain as long as main process exists? (since it is loaded from the mount?)

@lalten Do you have insight into why the behavior is different with bash -i?

# without bash -i
[pid 30815] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---    # detached process sends sigterm to some other child process
[pid 30815] +++ exited with 0 +++
^Cstrace: Process 30813 detached

vs

# with bash -i
[pid 31937] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31936] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=31936, si_uid=0} ---   # detached process sends sigterm to itself
[pid 31937] +++ exited with 0 +++
+++ exited with 0 +++

I thought maybe it has to do with the foreground flag but I don't quite follow how it would do this in the source.

probonopd commented 2 days ago

Looks like my PR is solving this now. Thanks everyone!