Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

alluxio-fuse get fuse mount process id incorrectly #14379

Closed kevincai closed 3 years ago

kevincai commented 3 years ago

Alluxio Version: v2.6.2

**Describe the bug**
During a CSI unmount, alluxio-fuse tried to get the FUSE process PID from ps aux. The script function fuse_stat returned the process user instead of its PID, so the script tried to kill the wrong target, which caused a crash during exit.

**To Reproduce**

W1102 09:17:35.391828 1 mount_helper_common.go:129] Warning: "/var/lib/kubelet/pods/50810713-80f9-4a9e-8ff1-75e179ce6334/volumes/kubernetes.io~csi/alluxio-csi-inline-vol/mount" is not a mountpoint, deleting


**Expected behavior**
Clean umount, no crash

**Urgency**
Describe the impact and urgency of the bug.

**Are you planning to fix it**
Please indicate if you are already working on a PR. 

**Additional context**

ps aux command output

PID   USER     TIME  COMMAND
    1 root      0:00 /usr/local/bin/alluxio-csi --v=5 --nodeid=minikube-m02 --endpoint=unix:///csi/csi.sock
  741 root      0:02 /usr/lib/jvm/java-1.8-openjdk/bin/java -cp :/opt/alluxio/integration/fuse/bin/../alluxio-fuse-2.6.2.jar -Dalluxio.logger.type=FUSE_LOGGER -server -Xms1G -Xmx1G -XX:MaxDirectMemorySize=4g -Dalluxio.home=/opt/alluxio-2.6.2 -Dalluxio.conf.dir=/opt/alluxio-2.6.2/conf -Dalluxio.logs.dir=/opt/alluxio-2.6.2/logs -Dalluxio.user.logs.dir=/opt/alluxio-2.6.2/logs/user -Dlog4j.configuration=file:/opt/alluxio-2.6.2/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Dorg.apache.ratis.thirdparty.io.netty.allocator.useCacheForAllThreads=false -Dalluxio.master.hostname=11.166.85.199 alluxio.fuse.AlluxioFuse -o big_writes,allow_other,kernel_cache,ro -m /var/lib/kubelet/pods/54c6865d-1df2-47fd-a3e0-d4c7907e355e/volumes/kubernetes.io~csi/alluxio-csi-inline-vol/mount -r /zdfs_test
  768 root      0:00 bash
  781 root      0:00 ps -elf

In this output the PID is the first column, not the second one.

diff --git a/integration/fuse/bin/alluxio-fuse b/integration/fuse/bin/alluxio-fuse
index ac4dd91c14..63a32f7978 100755
--- a/integration/fuse/bin/alluxio-fuse
+++ b/integration/fuse/bin/alluxio-fuse
@@ -116,12 +116,12 @@ fuse_stat() {
   local fuse_info=$(ps aux | grep [A]lluxioFuse)
   if [[ -n ${fuse_info} ]]; then
     echo -e "pid\tmount_point\talluxio_path"

kevincai commented 3 years ago

The CSI nodeplugin uses alpine:3.10.2 as its base image, and the ps inside that image comes from busybox-1.30.1.

On a normal dev image, which uses CentOS, the ps aux command output looks like this:

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.0 118948  2560 pts/0    Ss+  Sep01   0:00 bash

Different versions of ps produce different output formats, so the script should use a more robust parser to get the PID.
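One way to make the parsing less dependent on the ps flavor is to locate the PID column from the header line instead of hard-coding a column number. The sketch below is only an illustration of that idea, not the fix that was merged: the helper name `pid_from_ps` is hypothetical, and it assumes the ps output starts with a header row that contains a field named exactly `PID`, as both outputs above do.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of alluxio-fuse): reads ps output on stdin and
# prints the PID of every line that matches the given regex.
# The PID column index is taken from the header row, so it works whether PID
# is the first column (busybox) or the second one (procps).
pid_from_ps() {
  local pattern="$1"
  awk -v pat="${pattern}" '
    NR == 1 {
      # Find which header field is named PID.
      for (i = 1; i <= NF; i++) if ($i == "PID") col = i
      next
    }
    # Only print when a PID column was found and the line matches.
    col && $0 ~ pat { print $col }
  '
}

# Example call; the [A] bracket keeps the awk process from matching itself:
#   ps aux | pid_from_ps '[A]lluxioFuse'
```

If the header has no `PID` field the function prints nothing, so a caller can treat empty output as "not found" rather than killing a wrong target.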