Closed hakaka closed 5 years ago
This is the first I've heard of this. Is the process stuck in the D state or something similar? Can you run strace
against it to see where it is stuck?
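In case it helps, here is a minimal sketch of how the D state could be checked from a shell. It assumes a Linux /proc filesystem; the helper names state_from_status and is_uninterruptible are made up for this example and are not part of ceph-iscsi or tcmu-runner. A process in D (uninterruptible sleep) does not act on signals until it leaves that state, which would be consistent with kill -9 appearing to do nothing.

```shell
# Hypothetical helpers, not part of any ceph-iscsi tooling.

# Print the one-letter state code (R, S, D, Z, ...) found on the
# "State:" line of a file in /proc/<pid>/status format.
state_from_status() {
    awk '/^State:/ { print $2; exit }' "$1"
}

# Succeed if the given PID is currently in uninterruptible sleep (D).
# A PID that does not exist simply yields "not D".
is_uninterruptible() {
    [ "$(state_from_status "/proc/$1/status" 2>/dev/null)" = "D" ]
}
```

It could be used as, for example, `for p in $(pgrep -f rbd-target-gw); do is_uninterruptible "$p" && echo "$p is in D state"; done`.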
I have not seen the kill -9 issue, but for the systemctl case, do you have this patch?
https://github.com/ceph/ceph-iscsi-config/commit/0532325e71abeb4808baac3a7aa40c93b0f39def
Without that patch, rbd-target-gw will wait forever for the kernel/tcmu-runner to shut down. If tcmu-runner is dead, or we hit issues talking to the ceph cluster while cleaning up and closing images during shutdown, then we will wait forever, or for so long that it seems like forever.
With that patch we will still wait for 600 seconds, so it could be stuck for a while.
Besides the strace output, could you provide /var/log/messages and /var/log/rbd-target-api/rbd-target-api.log so we can see where it hung?
Oh, I don't have this patch yet. Thanks, I will give it a try.
I ran: sudo strace /usr/bin/rbd-target-gw > 1.log 2>&1 & and found it was stuck reading "/sys/kernel/config/target/iscsi/iqn.2018-07.com.redhat.iscsi-gw:iscsi-igw/tpgt_2/acls". When I try to ls that path, it also gets stuck.
I have never seen that before.
Can you attach the /var/log/messages, /var/log/rbd-target-api/rbd-target-gw.log (or /var/log/rbd-target-gw.log if that is what your version used) and the strace output?
tcmu-runner.log
messages-20190127.zip
Here are my tcmu-runner.log and /var/log/messages.
We restarted tcmu-runner at 2019-01-23 21:27:56.
We then found that we could not log in to the iSCSI gateway 10.142.90.34, and the log file shows this error:
Jan 23 21:30:00 kernel: iSCSI Login timeout on Network Portal 10.142.90.34:3260
Jan 23 21:31:44 NM-ITC-NF8460M3-BSS-034 kernel: INFO: task iscsi_trx:1531888 blocked for more than 120 seconds.
Jan 23 21:31:44 NM-ITC-NF8460M3-BSS-034 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 23 21:31:44 NM-ITC-NF8460M3-BSS-034 kernel: iscsi_trx D ffff9eff09390fd0 0 1531888 2 0x00000080
Then we tried to restart rbd-target-gw, and it hung.
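For anyone checking their own logs for the same symptom, the hung-task reports shown above can be pulled out of a syslog-style file with something like the following. This is only a sketch: hung_tasks is a hypothetical helper name, and the pattern assumes the "INFO: task NAME:PID blocked for more than N seconds" wording seen in the excerpt above.

```shell
# Hypothetical helper: list "task:pid seconds" for every hung-task
# report found in a syslog-style file, one unique entry per line.
# Matches lines such as:
#   ... kernel: INFO: task iscsi_trx:1531888 blocked for more than 120 seconds.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than \([0-9]*\) seconds.*/\1 \2/p' "$1" | sort -u
}
```

For example, `hung_tasks /var/log/messages` run against the log excerpt above would list iscsi_trx:1531888 with its 120 second threshold.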
Wait, you never said you restarted runner. What kernel are you using? If you are using a kernel without these patches, then this is expected: IO will get stuck, the iscsi target will then wait forever, and you will get these hung task messages.
Thanks! My OS kernel is 3.10.0-862.6.3.el7.x86_64, which does not have those patches.
Hello. When I try to restart rbd-target-gw with sudo systemctl restart rbd-target-gw, it does not stop the old service; it just starts a new process. The old service process cannot be stopped with kill -9 either. How can I stop the service? Thanks.