ftlynx opened 5 years ago
This command does not have any timeout configured. We probably should call exec.CommandContext instead.
The dfget server process is created by the dfget process via StartPeerServerProcess. At the beginning and end of this function, checkPeerServerExist is invoked to check whether the dfget HTTP server is ready.
But under concurrency, the dfget server may be started many times, producing many dfget server processes. Only one of them can listen on the peer port; the others exit with "address already in use" before the dfget process finishes. These processes become defunct processes.
When no dfget server process exists, pulling a multi-layer image reproduces the problem.
root 14626 14599 2 21:00 pts/1 00:00:00 /data/app/src/github.com/dragonflyoss/Dragonfly/cmd/dfdaemon/dfget -u https://hub.bilibili.co/v2/zhouchencheng/airflow/blobs/sha256:8f601293b2d86141c418eae05f224e0188ed5cb39336d52ac09cfd5556f076cc -o /data/docker/.small-dragonfly/dfdaemon/data/a6c3be83-9dc6-41e9-80a7-a547d91c2677 --home /data/docker/.small-dragonfly --dfdaemon -s 200MB --totallimit 200MB --node 172.16.38.93 --header User-Agent:docker/18.06.3-ce go/go1.10.3 git-commit/d7080c1 kernel/4.9.0-0.bpo.5-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.06.3-ce \(linux\)) --header Authorization:Bearer U27TkYtfTiSiMajMSG25zul11PLoCVrxGcjaUwGLrmFjIyGEescHRxg2oDL1zRh4dhlysIWZBXF_Mk-e_0lmy3Y1YFhi0OmMqO2TVSmULY_M50q3vRClbkpLKRCokNESUewj7TyOEGiXBaUuCWuI --header X-Forwarded-For:127.0.0.1 --insecure --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt
root 14628 14599 2 21:00 pts/1 00:00:00 /data/app/src/github.com/dragonflyoss/Dragonfly/cmd/dfdaemon/dfget -u https://hub.bilibili.co/v2/zhouchencheng/airflow/blobs/sha256:b4aa2612cd306f180973ea3b6e0c151d6e7c3b0f45b0bcb7dcc8f705fdd1ec6f -o /data/docker/.small-dragonfly/dfdaemon/data/2708adfc-2606-4fe5-bdb4-c16ce19c0269 --home /data/docker/.small-dragonfly --dfdaemon -s 200MB --totallimit 200MB --node 172.16.38.93 --header Authorization:Bearer 1PLoCVrxGcjKIP6EaUwGLrmFjIyGEescHRxg2oDL1zRh4dhlysIWZBXF_Mk-e_0lmy3Y1YFhi0OmMwytMHDuo6AQkXzN6MuvIjYqO2TVSmULY_M50q3vRClbkpLKRCokNESUewj7TyOEGiXBaUuCWuI --header X-Forwarded-For:127.0.0.1 --header User-Agent:docker/18.06.3-ce go/go1.10.3 git-commit/d7080c1 kernel/4.9.0-0.bpo.5-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.06.3-ce \(linux\)) --insecure --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt
root 14633 14599 2 21:00 pts/1 00:00:00 /data/app/src/github.com/dragonflyoss/Dragonfly/cmd/dfdaemon/dfget -u https://hub.bilibili.co/v2/zhouchencheng/airflow/blobs/sha256:df634dfeea0efe695a3fc05109e0e9c2b9d2296560cf2e442e77114300cd2cab -o /data/docker/.small-dragonfly/dfdaemon/data/6041561f-348c-48aa-a024-e8708d3bca5e --home /data/docker/.small-dragonfly --dfdaemon -s 200MB --totallimit 200MB --node 172.16.38.93 --header X-Forwarded-For:127.0.0.1 --header User-Agent:docker/18.06.3-ce go/go1.10.3 git-commit/d7080c1 kernel/4.9.0-0.bpo.5-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.06.3-ce \(linux\)) --header Authorization:Bearer DL1zRh4dhlysIWZBXF_Mk-e_0lmy3Y1YFhi0OmMwytMHDuo6AQkXzN6MuvIjYqO2TVSmULY_M50q3vRClbkpLKRCokNESUewj7TyOEGiXBaUuCWuI --insecure --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt
root 14644 14599 16 21:00 pts/1 00:00:00 /data/app/src/github.com/dragonflyoss/Dragonfly/cmd/dfdaemon/dfget -u https://hub.bilibili.co/v2/zhouchencheng/airflow/blobs/sha256:cc1a78bfd46becbfc3abb8a74d9a70a0e0dc7a5809bbd12e814f9382db003707 -o /data/docker/.small-dragonfly/dfdaemon/data/532aa339-914c-4395-99c7-9da77481f2ec --home /data/docker/.small-dragonfly --dfdaemon -s 200MB --totallimit 200MB --node 172.16.38.93 --header X-Forwarded-For:127.0.0.1 --header User-Agent:docker/18.06.3-ce go/go1.10.3 git-commit/d7080c1 kernel/4.9.0-0.bpo.5-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.06.3-ce \(linux\)) --header Authorization:Bearer o6AQkXzbkpLKRCokNESUewj7TyOEGiXBaUuCWuI --insecure --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt --cacerts /etc/docker/certs.d/hub.bilibili.co/ca.crt
root 14659 14628 2 21:00 pts/1 00:00:00 /data/app/src/github.com/dragonflyoss/Dragonfly/cmd/dfdaemon/dfget server --ip 172.16.38.93 --port 0 --meta /data/docker/.small-dragonfly/meta/host.meta --data /data/docker/.small-dragonfly/data --home /data/docker/.small-dragonfly --expiretime 3m0s --alivetime 5m0s
root 14665 14626 2 21:00 pts/1 00:00:00 [dfget] <defunct>
root 14667 14633 2 21:00 pts/1 00:00:00 [dfget] <defunct>
root 14685 14644 2 21:00 pts/1 00:00:00 [dfget] <defunct>
root 14721 5908 0 21:00 pts/3 00:00:00 /bin/grep --color=auto dfget
@zhouhaibing089 @Starnop
Nice findings!
I'm also curious whether dfget should wait for the exit status of the spawned dfget server processes. (If not, how could such defunct processes happen?) I assume those child processes always live longer than their parent.
If the dfget server process does not exist, the dfget process creates it, and it stays alive for longer than alivetime. This period is usually much longer than the dfget process's own lifetime. Once the dfget process exits, the dfget server process is reparented to PID 1.
/reopen
Encountered the same issue when using the 1.0.0 release.
I used the 1.0.0 Docker image and encountered the same problem. Does this depend on the size of the image? Can I change any parameters to solve it?
I encountered a similar problem with the 1.0.6 Docker image on k8s. When I use the dfget command to download files, the dfget server exits after the specified alivetime, but it becomes a zombie process.
You can reproduce this problem with the following simple steps:
dfget -u <some-file> --node <supernode-ip>:8002 -p p2p --totallimit 10G --locallimit 10G --alivetime 3s --expiretime 1s
top
In top, a zombie [dfget] process appears.
The following is the dfserver log with the default 5m alivetime and 3m expiretime:
cat /root/.small-dragonfly/logs/dfserver.log
2021-04-19 03:06:06.184 INFO sign:35-1618801566.184 : ********************
2021-04-19 03:06:06.184 INFO sign:35-1618801566.184 : start peer server...
2021-04-19 03:06:06.190 INFO sign:35-1618801566.184 : start peer server success, host:<ip>, port:61005
2021-04-19 03:06:06.190 INFO sign:35-1618801566.184 : monitor peer server whether is alive, aliveTime:5m0s
2021-04-19 03:06:06.190 INFO sign:35-1618801566.184 : start server gc, expireTime:3m0s
2021-04-19 03:06:06.191 INFO sign:35-1618801566.184 : update total limit to 8589934592
2021-04-19 03:06:49.576 INFO sign:35-1618801566.184 : update total limit to 8589934592
2021-04-19 03:09:51.203 INFO sign:35-1618801566.184 : server gc, delete file:/root/.small-dragonfly/data/<some-file>.service
2021-04-19 03:09:51.204 INFO sign:35-1618801566.184 : server gc, delete file:/root/.small-dragonfly/data/<some-file>.service
2021-04-19 03:11:49.651 INFO sign:35-1618801566.184 : no more task, peer server will stop...
2021-04-19 03:11:49.651 INFO sign:35-1618801566.184 : peer server is shutdown.
Question
After docker pull, dfget zombie processes appeared. Deployed using the dragonflyoss/dfclient:0.4.3 image:
[root@host-192-168-55-118 logs]# ps -ef | grep dfget
root 8717 18498 0 13:58 ? 00:00:00 [dfget]
root 8725 18498 0 13:58 ? 00:00:00 [dfget]
root 8727 18498 0 13:58 ? 00:00:00 [dfget]
root 8738 18498 0 13:58 ? 00:00:00 [dfget]
root 18198 22070 0 14:20 pts/1 00:00:00 grep --color=auto dfget
root 18910 18498 0 13:02 ? 00:00:00 [dfget]
root 18917 18498 0 13:02 ? 00:00:00 [dfget]
root 18926 18498 0 13:02 ? 00:00:00 [dfget]
root 18933 18498 0 13:02 ? 00:00:00 [dfget]
root 21832 18498 0 13:10 ? 00:00:00 [dfget]