bobrofon / easysshfs

SSHFS for Android
MIT License

sshfs processes keep running even after unmounting, exiting, and force-stopping the app #33

Closed (kattjevfel closed this issue 2 years ago)

kattjevfel commented 2 years ago

Every time I close EasySSHFS when I'm done accessing my remote files, my phone starts getting really warm and draining battery like mad. I've been dealing with this by just restarting my phone, but I finally looked into it and pulled up htop via adb, which shows 6 sshfs processes maxing out CPU cores (screenshot: konsole_2022-04-13_10-40-36).

Running EasySSHFS 0.5.6 from Google Play, LineageOS 18.1 (Android 11) on a OnePlus 5T.

bobrofon commented 2 years ago

OK, what I have figured out so far:

  1. Fortunately, I can reproduce this problem on my Xiaomi Mi A1 with LineageOS 18.1.
  2. The problem can be reproduced from a command line without the easysshfs application (you can just run sshfs -o 'ssh_command=ssh' host:/remote/path /local/path and umount /local/path from a shell). This means the problem is not related to the APK itself, the embedded busybox, or libsu.
  3. Additional umount options like -l or -f don't change anything. The busybox/toybox versions of umount give the same result, so it is not a problem with the umount implementation.
  4. The problem does not reproduce 100% of the time. Sometimes the sshfs and ssh processes terminate after the directory is unmounted, and sometimes they don't. I don't see a clear pattern in this behavior. When I attach strace to sshfs to see what happens on unmounting, the problem stops reproducing and sshfs terminates every time, like it should. (Some kind of race condition?)

It looks like the problem is somewhere in the sshfs/ssh executables. Version 0.5.6 is the current open beta on Google Play, but that shouldn't matter because the sshfs/ssh versions have not changed since 2021/02/14. TODO(@bobrofon): test the executables from previous versions to be sure. @kattjevfel when did you notice this behavior for the first time?

Steps to reproduce:

  1. Mount some directory
    ./sshfs -o 'ssh_command=./ssh,UserKnownHostsFile=/dev/null,StrictHostKeyChecking=no,IdentityFile=/sdcard/id_rsa' user@host:/ /mnt/runtime/default/emulated/0/mnt
  2. Unmount this directory
    umount /mnt/runtime/default/emulated/0/mnt
  3. Check that it was unmounted
    mount | grep sshfs
    // should be empty

    but ssh/sshfs processes are still running.

    ps -A | grep ssh                                                                                                                                
    root          25452      1    2500   2040 do_select           0 S ssh
    root          25453      1    8592    212 futex_wait_queue_me 0 S sshfs

    Also, one of the sshfs threads will be consuming a full CPU core.

    top -H
    TID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ THREAD          PROCESS                                                                      
    25455 root         20   0 8.3M 216K 4.0K R  100   0.0  11:39.46 sshfs           sshfs
  4. (Optional) Kill the ssh process. A simple kill -INT -p $PID will be enough. The ssh process will die, but nothing else changes.
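The check in step 3 can be automated. Below is a minimal sketch in Python, assuming the `ps -A` column layout shown above (USER, PID, PPID, ..., NAME as the last column). `find_leftovers` is a hypothetical helper name and is not part of easysshfs.

```python
def find_leftovers(ps_output: str, names=("ssh", "sshfs")) -> list:
    """Return (pid, ppid, name) for every process whose name is in `names`."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Expect at least USER PID PPID ... NAME; skip headers and blank lines.
        if len(fields) < 4 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        if fields[-1] in names:
            found.append((int(fields[1]), int(fields[2]), fields[-1]))
    return found


# Sample taken from the ps output in this thread.
sample = """\
root          25452      1    2500   2040 do_select           0 S ssh
root          25453      1    8592    212 futex_wait_queue_me 0 S sshfs
"""

for pid, ppid, name in find_leftovers(sample):
    print(pid, ppid, name)
```

A PPID of 1 in the result suggests the process has been reparented to init, which is consistent with it outliving whatever started it.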

The most suspicious part is the sshfs executable, which contains sshfs itself, libfuse, and uClibc. When this bug happens, the sshfs process is left with two threads: one consumes 100% CPU and the other waits on a futex (most likely it is waiting for the first thread to terminate). This is the signal status of the thread consuming 100% CPU:

grep Sig /proc/25455/status
SigQ: 4/13124
SigPnd: 0000000000000000
SigBlk: 0000000080004007
SigIgn: 0000002000001000
SigCgt: 0000000180004003

SIGHUP, SIGINT, SIGQUIT, SIGTERM are blocked. It looks like a worker thread of libfuse:

/* Disallow signal reception in worker threads */
sigemptyset(&newset);
sigaddset(&newset, SIGTERM);
sigaddset(&newset, SIGINT);
sigaddset(&newset, SIGHUP);
sigaddset(&newset, SIGQUIT);

But right now I have no idea what this thread is doing.
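The SigBlk value above can be decoded bit by bit: if bit n-1 is set, signal n is in the mask. Below is a minimal sketch, assuming the generic Linux signal numbering used on ARM and x86. The table is hard-coded so the result does not depend on the machine running the script, and `decode_sigmask` is a hypothetical helper name.

```python
# Classic Linux signal numbers (generic ARM/x86 layout); an assumption for
# other architectures, where some numbers differ.
LINUX_SIGNALS = {
    1: "SIGHUP", 2: "SIGINT", 3: "SIGQUIT", 4: "SIGILL", 5: "SIGTRAP",
    6: "SIGABRT", 7: "SIGBUS", 8: "SIGFPE", 9: "SIGKILL", 10: "SIGUSR1",
    11: "SIGSEGV", 12: "SIGUSR2", 13: "SIGPIPE", 14: "SIGALRM", 15: "SIGTERM",
    16: "SIGSTKFLT", 17: "SIGCHLD", 18: "SIGCONT", 19: "SIGSTOP", 20: "SIGTSTP",
    21: "SIGTTIN", 22: "SIGTTOU", 23: "SIGURG", 24: "SIGXCPU", 25: "SIGXFSZ",
    26: "SIGVTALRM", 27: "SIGPROF", 28: "SIGWINCH", 29: "SIGIO", 30: "SIGPWR",
    31: "SIGSYS",
}


def decode_sigmask(hex_mask: str) -> list:
    """Return the names of the signals set in a /proc/<pid>/status mask."""
    mask = int(hex_mask, 16)
    names = []
    for signum in range(1, 65):
        if mask & (1 << (signum - 1)):
            # Numbers above 31 are real-time or library-internal signals.
            names.append(LINUX_SIGNALS.get(signum, f"signal {signum}"))
    return names


print(decode_sigmask("0000000080004007"))  # the SigBlk value from this thread
```

Besides the four signals named above, the mask also contains signal 32, which falls outside the classic 1-31 range.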

The most suspicious parts are libfuse and uclibc (or something in the middle). Need further debugging.

kattjevfel commented 2 years ago

@kattjevfel when did you notice this behavior for the first time?

I've had this problem since I started using the app, so for probably a month. So at least one release prior.

As for reproducing with your steps, I'm not sure I can really follow those. I do not have busybox etc., and the latter example only gave me a "read: Connection reset by peer" error.

The exact command I tried was /data/user/0/ru.nsu.bobrofon.easysshfs/files/sshfs -o 'ssh_command=/data/user/0/ru.nsu.bobrofon.easysshfs/files/ssh,UserKnownHostsFile=/dev/null,StrictHostKeyChecking=no,IdentityFile=/storage/emulated/0/test' katt@xxx:/mnt/jupiter/ /mnt/runtime/default/emulated/0/jupiter

I tried looking up what exact command the app performs but /proc/xxx/cmdline seems to rip out spaces making it hard to read. I can't figure out what part would make such a difference here: /data/user/0/ru.nsu.bobrofon.easysshfs/files/sshfs-ossh_command=/data/user/0/ru.nsu.bobrofon.easysshfs/files/ssh,password_stdin,UserKnownHostsFile=/dev/null,StrictHostKeyChecking=no,rw,dirsync,nosuid,nodev,noexec,umask=0,allow_other,uid=9997,gid=9997,IdentityFile=/storage/emulated/0/test,port=22katt@xxx:/mnt/jupiter//mnt/runtime/default/emulated/0/jupiter
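For reference, /proc/<pid>/cmdline does not actually drop the spaces: the kernel stores the arguments separated by NUL bytes, which a terminal typically renders as nothing. Below is a minimal sketch that turns the raw bytes back into a readable command line. `format_cmdline` is a hypothetical helper name, and the sample bytes are a shortened, made-up version of the command above.

```python
import shlex


def format_cmdline(raw: bytes) -> str:
    """Convert the NUL-separated content of /proc/<pid>/cmdline to a shell-like string."""
    if not raw:
        return ""
    # The content normally ends with a trailing NUL; drop it before splitting.
    args = raw.rstrip(b"\0").split(b"\0")
    return " ".join(shlex.quote(a.decode("utf-8", "replace")) for a in args)


# Shortened illustrative input; on a device the bytes would come from
# open(f"/proc/{pid}/cmdline", "rb").read()
raw = b"sshfs\0-o\0ssh_command=ssh,port=22\0katt@xxx:/mnt/jupiter/\0/mnt/x\0"
print(format_cmdline(raw))
```

Read this way, the argument boundaries become visible again (the `-o` option list, the remote path, and the mount point), which makes it easier to compare the app's invocation with a manual one.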

bobrofon commented 2 years ago

I updated sshfs and libfuse to the latest stable releases, and it looks like the problem has already been fixed upstream. At least I cannot reproduce it anymore. I made an APK with the new executable files: https://github.com/bobrofon/easysshfs/raw/tmp-file-holder/easysshfs-0.5.6-buildroot-2022.02.1-release-signed.apk . @kattjevfel can you try it on your device?

I noticed that the new ssh executables segfault on x86. Now I need to figure out why that happens before I merge the PR :(

kattjevfel commented 2 years ago

Tried the linked APK and can no longer reproduce the issue, thanks for the quick fix (even if it breaks on x86)!

bobrofon commented 2 years ago

Well, unexpectedly, it looks like the problem with x86 is not new; it has been there for a while. As I understand it, it is a problem in the uClibc implementation of libpthread. When it is linked statically, it can leave some weak references unresolved in some scenarios. But libgcc expects symbols like __pthread_mutex_unlock to be accessible if the application uses POSIX mutexes. Fortunately, switching from uClibc to musl resolves this problem. It also reduces the size of the resulting executable files by a few megabytes (in total). @kattjevfel Thanks for a good report and assistance.