ceph / ceph-iscsi-config

NOTICE: moved to https://github.com/ceph/ceph-iscsi
GNU General Public License v3.0

rbd-target-gw can't stop #101

Closed: hakaka closed this issue 5 years ago

hakaka commented 5 years ago

Hello. When I try to restart rbd-target-gw with sudo systemctl restart rbd-target-gw, I find that it can't stop the old service; it just starts a new process. The service process can't be stopped with kill -9 either. How can I stop the service? Thanks.

dillaman commented 5 years ago

This is the first I've heard of this. Is the process stuck in the D state or something similar? Can you run strace against it to see where it is stuck?
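One way to answer the D-state question without extra tools is to read the process's entries under /proc. This is a minimal sketch assuming a Linux host; the helper names (parse_state, process_state, kernel_wait_channel) are illustrative and not part of ceph-iscsi. It reads the "State:" line of /proc/<pid>/status, where "D" means uninterruptible sleep, and /proc/<pid>/wchan, which names the kernel function the task is waiting in (its exact content depends on kernel configuration).

```python
import sys


def parse_state(status_text):
    """Return the one-letter state from the text of /proc/<pid>/status.

    The relevant line looks like "State:\tD (disk sleep)".
    """
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # "State:\tD (disk sleep)" -> ["State:", "D", "(disk", "sleep)"]
            return line.split()[1]
    raise ValueError("no State: line found")


def process_state(pid):
    """Read /proc/<pid>/status and return the process state letter (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_state(f.read())


def kernel_wait_channel(pid):
    """Return the contents of /proc/<pid>/wchan (kernel function the task sleeps in).

    May be "0" for a running task, or restricted depending on kernel config.
    """
    with open(f"/proc/{pid}/wchan") as f:
        return f.read().strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])
    state = process_state(pid)
    print(f"pid {pid}: state {state}, wchan {kernel_wait_channel(pid)!r}")
    if state == "D":
        # In uninterruptible sleep a SIGKILL usually stays pending until the task wakes.
        print("process is in uninterruptible sleep (D)")
```

The same information is available from ps -o pid,stat,wchan -p <pid>; the script form is only useful when you want to check it repeatedly or from another tool.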

mikechristie commented 5 years ago

I have not seen the kill -9 issue, but for the systemctl case, do you have this patch?

https://github.com/ceph/ceph-iscsi-config/commit/0532325e71abeb4808baac3a7aa40c93b0f39def

Without that patch, rbd-target-gw will wait forever for the kernel/tcmu-runner to shut down. If tcmu-runner is dead, or we hit some issue talking to the ceph cluster while cleaning up and closing images during shutdown, then we will wait forever, or for so long that it seems like forever.

With that patch we will still wait for 600 seconds, so it could be stuck for a while.
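The contents of patch 0532325 are not reproduced here. As a generic illustration only, bounding a shutdown wait in Python could look like the sketch below; stop_with_timeout and SHUTDOWN_TIMEOUT_SECS are hypothetical names, and the 600-second value simply mirrors the number quoted above rather than anything taken from the patch.

```python
import logging
import threading
import time

# Hypothetical constant: mirrors the 600 s mentioned above, not read from the patch.
SHUTDOWN_TIMEOUT_SECS = 600


def stop_with_timeout(cleanup, timeout=SHUTDOWN_TIMEOUT_SECS):
    """Run cleanup() in a daemon thread and wait at most `timeout` seconds.

    Returns True if cleanup finished in time, False if we gave up waiting.
    A hung cleanup thread is abandoned; as a daemon thread it does not
    keep the Python interpreter alive at exit.
    """
    worker = threading.Thread(target=cleanup, name="gw-cleanup", daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        logging.error("cleanup still running after %s s, giving up", timeout)
        return False
    return True


if __name__ == "__main__":
    # Hypothetical cleanup that never finishes, e.g. waiting on a dead tcmu-runner.
    def hung_cleanup():
        time.sleep(3600)

    print(stop_with_timeout(hung_cleanup, timeout=1))  # False after about 1 s
```

The trade-off is the one described above: a timeout lets the service stop, but whatever the cleanup was waiting on may still be left in an unknown state.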

Besides the strace output, could you provide /var/log/messages and /var/log/rbd-target-api/rbd-target-api.log so we can see where it hung?

hakaka commented 5 years ago

Oh, I don't have this patch yet. Thanks, I will give it a try.

hakaka commented 5 years ago

I ran sudo strace /usr/bin/rbd-target-gw > 1.log 2>&1 & and found it was stuck reading "/sys/kernel/config/target/iscsi/iqn.2018-07.com.redhat.iscsi-gw:iscsi-igw/tpgt_2/acls". When I tried to ls that path, ls got stuck as well.
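Because any read of that configfs directory blocks, even diagnostic commands can hang the shell that runs them. A hedged sketch of probing the path from Python without blocking the caller: the listing runs in a daemon thread and the caller stops waiting after a timeout. probe_listdir is a hypothetical helper; a thread stuck in the kernel cannot be cancelled from user space, and the calling process may not exit cleanly until that thread wakes up.

```python
import os
import threading


def probe_listdir(path, timeout=5.0):
    """Try os.listdir(path) in a daemon thread, waiting at most `timeout` seconds.

    Returns ("ok", entries), ("error", exc) or ("hung", None).
    In the "hung" case the worker thread is simply abandoned.
    """
    result = {}

    def worker():
        try:
            result["entries"] = os.listdir(path)
        except OSError as exc:
            result["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return ("hung", None)
    if "error" in result:
        return ("error", result["error"])
    return ("ok", result["entries"])


if __name__ == "__main__":
    # Path from the report above; the IQN and TPG number will differ per setup.
    acls = "/sys/kernel/config/target/iscsi/iqn.2018-07.com.redhat.iscsi-gw:iscsi-igw/tpgt_2/acls"
    print(probe_listdir(acls))
```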

mikechristie commented 5 years ago

I have never seen that before.

Can you attach /var/log/messages, /var/log/rbd-target-api/rbd-target-gw.log (or /var/log/rbd-target-gw.log if that is what your version uses), and the strace output?

hakaka commented 5 years ago

Here are my tcmu-runner.log and /var/log/messages (attached as tcmu-runner.log and messages-20190127.zip).
We restarted tcmu-runner at 2019-01-23 21:27:56. After that, logins to the iSCSI gateway at 10.142.90.34 failed, and the log shows these errors:

Jan 23 21:30:00 kernel: iSCSI Login timeout on Network Portal 10.142.90.34:3260
Jan 23 21:31:44 NM-ITC-NF8460M3-BSS-034 kernel: INFO: task iscsi_trx:1531888 blocked for more than 120 seconds.
Jan 23 21:31:44 NM-ITC-NF8460M3-BSS-034 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 23 21:31:44 NM-ITC-NF8460M3-BSS-034 kernel: iscsi_trx D ffff9eff09390fd0 0 1531888 2 0x00000080
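The hung-task warnings above follow a fixed pattern, and the 120 seconds matches the usual default of hung_task_timeout_secs. A small sketch for pulling those warnings out of a log such as /var/log/messages; find_hung_tasks is a hypothetical helper, and the regex assumes the message format shown in the excerpt above.

```python
import re
import sys

# Matches the kernel hung-task warning, e.g.
# "INFO: task iscsi_trx:1531888 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Yield (command, pid, seconds) for each hung-task warning in the log lines."""
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            yield m.group("comm"), int(m.group("pid")), int(m.group("secs"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_hung_tasks.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for comm, pid, secs in find_hung_tasks(log):
            print(f"{comm} (pid {pid}) blocked for more than {secs} s")
```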

Then we tried to restart rbd-target-gw, and it hung.

mikechristie commented 5 years ago

Wait, you never said you restarted runner. What kernel are you using? If you are using a kernel without these patches

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/target/target_core_user.c?id=892782caf19a97ccc95df51b3bb659ecacff986a

then this is expected: IO will get stuck, the iSCSI target will wait forever, and you will get these hung task messages.

hakaka commented 5 years ago

Thanks! My OS kernel is 3.10.0-862.6.3.el7.x86_64, which does not have those patches.