Open KevinLiangX opened 2 years ago
It doesn't usually cause this problem. Can you show me the logs?
hello this is just our test case for DFX

1、find out the fluent-operator docker process

    [root@k8s-4 ~]# docker ps |grep fluentbit-operator
    892bf193483f rvm:5100/kubesphere/fluentbit-operator "/manager" 2 days ago Up 2 days k8s_fluentbit-operator_fluentbit-operator-85855568c6-6ng9f_kubesphere-logging-system_89dfe4d6-a84a-463e-980d-48c88801fe37_0
    5c72dc88ab71 rvm:5100/fitcontainer/pause:3.2 "/pause" 2 days ago Up 2 days k8s_POD_fluentbit-operator-85855568c6-6ng9f_kubesphere-logging-system_89dfe4d6-a84a-463e-980d-48c88801fe37_0

2、 check the docker process

    [root@k8s-4 ~]# docker top 892bf193483f
    UID PID PPID C STIME TTY TIME CMD
    65532 14995 14978 0 Dec13 ? 00:04:25 /manager
    [root@k8s-4 ~]# ps aux |grep 14995
    65532 14995 0.1 0.0 743776 52756 ? Ssl Dec13 4:25 /manager
    root 36638 0.0 0.0 112716 960 pts/0 S+ 01:31 0:00 grep --color=auto 14995

3、Simulate this zombie scenario

    [root@k8s-4 ~]# kill -STOP 14995
    [root@k8s-4 ~]# ps aux |grep 14995
    65532 14995 0.1 0.0 743776 52756 ? Tsl Dec13 4:25 /manager
    root 38195 0.0 0.0 112716 960 pts/0 S+ 01:32 0:00 grep --color=auto 14995
4、 Our recovery benchmark is less than 10 min. After 10 min this process is still in state Tsl (stopped).
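In the ps output above, the STAT column changes from Ssl to Tsl after kill -STOP. A leading T means the process is stopped by a signal, while a true zombie would show Z. A minimal sketch for telling the two apart from a script follows. The helper names classify_state and pid_state are hypothetical, and it assumes a ps that supports `-o stat=`, as procps does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of fluent-operator.

# Map a ps STAT value (for example "Ssl" or "Tsl") to a coarse label,
# based on its first letter.
classify_state() {
  case "$1" in
    T*) echo "stopped" ;;   # stopped by a signal, e.g. after kill -STOP
    Z*) echo "zombie" ;;    # exited but not yet reaped by its parent
    "") echo "missing" ;;   # no STAT value: no such process
    *)  echo "active" ;;    # R, S, D and other non-stopped states
  esac
}

# Look up the STAT value of a PID and classify it.
pid_state() {
  classify_state "$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')"
}
```

For the process in this test case, `pid_state 14995` would be expected to print `stopped` after step 3, not `zombie`.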
@519859716 Currently, no liveness probe is added to the deployment's YAML.
Are you interested in collaborating on this?
It's a pleasure to be involved in the project. We have put it in our development plan. If it works fine, I will update it in our project. @wenchajun
If the fluent-operator process is in zombie status, it cannot recover by itself. Can we not add a liveness probe or do something for a heartbeat?