Open Jazz9 opened 6 years ago
A little bit more info... Running - tcmu-runner-1.3.0-89.g7d0023e.el7.x86_64
When the lockup occurs you can't run systemctl restart tcmu-runner as it never returns. I believe that is what is preventing reboot also.
Also noted that I am running the Centos 7.5 standard install of targetcli. Its version is slightly lower then the requirements on the ceph master docs. Do you think this is the issue and do you know where a pre compiled newer version may be found?
targetcli.noarch 2.1.fb46-4.el7_5
You are using the RHEL/Centos 7.5 kernel right?
It looks like you are hitting a bug in that kernel. It is missing this patch:
We are working on getting a fix released. It is merged in our internal repos and being QA'd right now. It should be released in about a month.
I will update this issue with the updated kernel version.
Ok, that sounds good. If you want me to test it - I can. It seems that both iscsi nodes crash every few days.
and yes - Centos 7.5 with latest updates.
Did this get merged?
It is in kernel-3.10.0-862.5 which should be released soon.
I do not know how long it takes to go from a RHEL z stream to Centos kernel though.
Seems the latest Centos patches helped. No more crashes.
Hello,
We have the same error with a 4.18.9-1 kernel with a single gateway (without multipathing) and ceph 13.2.0 . Has this error been identified? Is there a patch?
Sep 25 07:19:00 src-ceph-gwiscsi1 tcmu-runner: 2018-09-25 07:19:00.359 1204 [DEBUG] alua_implicit_transition:568 rbd/rbd.VMW_CEPH_DS08: lock state 1 Sep 25 07:19:09 src-ceph-gwiscsi1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 71303168 Sep 25 07:20:01 src-ceph-gwiscsi1 systemd: Started Session 190 of user root. Sep 25 07:20:01 src-ceph-gwiscsi1 systemd: Starting Session 190 of user root. Sep 25 07:20:27 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:20:27 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:20:43 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:20:43 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:20:58 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:20:58 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:21:14 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:21:14 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:21:29 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:21:29 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:21:44 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:21:44 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: INFO: task kworker/u256:2:95199 blocked for more than 120 seconds. Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: Not tainted 4.18.9-1.el7.elrepo.x86_64 #1 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: kworker/u256:2 D 0 95199 2 0x80000080 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: Workqueue: tmr-user target_tmr_work [target_core_mod] Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: Call Trace: Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: schedule+0x2ab/0x880 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: ? kvasprintf+0x7f/0xd0 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: schedule+0x36/0x80 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: schedule_timeout+0x161/0x300 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: ? __next_timer_interrupt+0xe0/0xe0 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: wait_for_completion_timeout+0x130/0x1a0 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: ? wake_up_q+0x80/0x80 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: transport_wait_for_tasks+0xdd/0x140 [target_core_mod] Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: transport_wait_for_tasks+0x53/0x90 [target_core_mod] Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: core_tmr_abort_task+0xfb/0x180 [target_core_mod] Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: target_tmr_work+0x128/0x140 [target_core_mod] Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: process_one_work+0x179/0x390 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: worker_thread+0x4f/0x3e0 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: kthread+0x105/0x140 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: ? max_active_store+0x80/0x80 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: ? kthread_bind+0x20/0x20 Sep 25 07:21:50 src-ceph-gwiscsi1 kernel: ret_from_fork+0x35/0x40 Sep 25 07:22:00 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:22:00 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:22:15 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:22:15 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:22:30 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:22:30 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:22:46 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:22:46 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:23:01 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:23:01 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:23:16 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:23:16 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:23:32 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:23:32 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:23:47 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Sep 25 07:23:47 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Sep 25 07:23:53 src-ceph-gwiscsi1 kernel: INFO: task kworker/0:0:111368 blocked for more than 120 seconds.
@VRAseb
You are hitting a similar issue in that a command is taking a long time which leads to the initiator timing it out and sending an abort.
You not hitting the same issue that was being discussed in this GH issue though. The other user was hitting a issue that only occurred in the RHEL kernel because it was missing a patch. Because of this even when the command finally unjammed it self and completed the kernel hit a bug and would never unjam.
In your case from the logs above it looks like the command just never completed. You would want to look in the ceph logs to check if something happened to the cluster or if tcmu-runner crashed so it could not complete the command.
Hello, the incident we encounter blocks the service rbd-target-gw. We can not restart the rbd-target-gw service. We can not restart OS, the kernel is blocked. We have no alerts in CEPH journals (HEALTH_OK)
We tried with the tcmu-runner 1.4.0. We have the same incident.
Can you help us?
The rbd-target-gw service and kernel are blocked on the stuck command.
Is tcmu-runner running (just do a 'ps -u root | grep runner')?
Hello, tcmu-runner does not run after crash.
tcmu-runner does not run after crash.
Do you mean when you see the kernel message about the hung_task_timeout_sec, tcmu-runner is not running still? Has it crashed? If so, could you get me the core dump if there is one or get the /var/log/messages around the time tcmu-runner crashes so we can see if there is a stack dump from that? Could you also get me the /var/log/tcmu-runner.log for around the same time?
Hello,
messages : Oct 4 05:45:01 src-ceph-gwiscsi1 systemd: Started Session 283 of user root. Oct 4 05:45:01 src-ceph-gwiscsi1 systemd: Starting Session 283 of user root. Oct 4 05:49:25 src-ceph-gwiscsi1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 97406060 Oct 4 05:49:35 src-ceph-gwiscsi1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 97406062 Oct 4 05:50:01 src-ceph-gwiscsi1 systemd: Started Session 284 of user root. Oct 4 05:50:01 src-ceph-gwiscsi1 systemd: Starting Session 284 of user root. Oct 4 05:51:02 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:51:02 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:51:02 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Oct 4 05:51:17 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:51:17 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:51:17 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Oct 4 05:51:33 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:51:33 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:51:33 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Oct 4 05:51:48 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:51:48 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:51:48 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Oct 4 05:52:03 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:52:03 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:52:03 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Oct 4 05:52:19 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:52:19 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:52:19 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Oct 4 05:52:34 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:52:34 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:52:34 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed. Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iSCSI Login timeout on Network Portal 10.61.0.45:3260 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: INFO: task iscsi_trx:2513 blocked for more than 120 seconds. Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: Not tainted 4.18.9-1.el7.elrepo.x86_64 #1 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsi_trx D 0 2513 2 0x80000080 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: Call Trace: Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: schedule+0x2ab/0x880 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? kvasprintf+0x7f/0xd0 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: schedule+0x36/0x80 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: schedule_timeout+0x161/0x300 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? __next_timer_interrupt+0xe0/0xe0 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: wait_for_completion_timeout+0x130/0x1a0 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? wake_up_q+0x80/0x80 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: transport_wait_for_tasks+0xdd/0x140 [target_core_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: target_wait_free_cmd+0x56/0x90 [target_core_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: transport_generic_free_cmd+0x103/0x140 [target_core_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsit_free_cmd+0x43/0x90 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsit_close_connection+0x586/0x8f0 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsit_take_action_for_connection_exit+0x83/0x110 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsi_target_rx_thread+0xe2/0xf0 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: kthread+0x105/0x140 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? iscsi_target_tx_thread+0x1f0/0x1f0 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? kthread_bind+0x20/0x20 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ret_from_fork+0x35/0x40 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: INFO: task kworker/0:0:164935 blocked for more than 120 seconds. Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: Not tainted 4.18.9-1.el7.elrepo.x86_64 #1 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: kworker/0:0 D 0 164935 2 0x80000080 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: Call Trace: Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: __schedule+0x2ab/0x880 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: schedule+0x36/0x80 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: schedule_timeout+0x1dc/0x300 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: wait_for_completion+0x121/0x180 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? wake_up_q+0x80/0x80 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsit_stop_session+0x1b1/0x1c0 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsi_check_for_session_reinstatement+0x1ed/0x280 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsi_target_check_for_existing_instances+0x30/0x40 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsi_target_do_login+0x31d/0x5f0 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iscsi_target_do_login_rx+0x26f/0x340 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? iscsi_target_check_login_request+0x1a0/0x1a0 [iscsi_target_mod] Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: process_one_work+0x179/0x390 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: worker_thread+0x4f/0x3e0 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: kthread+0x105/0x140 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? max_active_store+0x80/0x80 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ? kthread_bind+0x20/0x20 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: ret_from_fork+0x35/0x40 Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: tx_data returned -32, expecting 48. Oct 4 05:52:50 src-ceph-gwiscsi1 kernel: iSCSI Login negotiation failed.
The tcmu log stops at the time of the crash.
I think we are talking about different things. The log snippet you posted above is not a crash. It is a warning indicating that a command is stuck.
I am asking if when you see this is tcmu-runner running? If you run
systemctl status tcmu-runner
does it show it is running?
If it is then the command is could be stuck in the ceph cluster somewhere. You then should check the /var/log/ceph logs on the OSDs to see if there are any errors in there.
Also what tools have you used to set this up? Did you use the ceph-iscsi tools or did you run targetlci manually?
Oh yeah, if you are setting it up yourself with targetcli set the osd_op_timeout to something like 15 seconds. This will force most commands to fail if they have not completed within that timeout. We can then verify that they are getting sent to the OSD and getting stuck there.
Hi @mikechristie, I investigated this case and this is what I can tell :
From what I see, I agree with your first idea that tcmu crashed and never complete the command.
Could you give instruction to get the core dump of tcmu when the problem happen ? What can we provide to help ?
Hello,
Centos 7.5 VMware 6.0 U3 Ceph 12.2.5 with all SSD bluestore pool. ( only 20 OSD in SSD pool ) Active/standby iscsi
I have started using ceph-iscsi-cli in a small - non important production environment and last night both iscsi nodes crashed. A reboot fixed them ( they wouldn't reboot without a hard reset. The OS seemed fine but a hung task is my guess )
I am not 100% sure where to look for the logs that relate to the issue. I will put more monitoring on the system.
May 25 15:54:51 SVR-AUBUN-SWD-ISCSI2 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 45 May 25 15:55:11 SVR-AUBUN-SWD-ISCSI2 tcmu-runner: 2018-05-25 15:55:11.794 856 [ERROR] tcmu_notify_conn_lost:187 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Handler connection lost (lock state 0) May 25 15:55:11 SVR-AUBUN-SWD-ISCSI2 tcmu-runner: tcmu_notify_conn_lost:187 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Handler connection lost (lock state 0) May 25 15:55:11 SVR-AUBUN-SWD-ISCSI2 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 45 May 25 15:55:11 SVR-AUBUN-SWD-ISCSI2 kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 45 May 25 15:55:11 SVR-AUBUN-SWD-ISCSI2 tcmu-runner: 2018-05-25 15:55:11.798 856 [INFO] tgt_port_grp_recovery_thread_fn:253: Disabled iscsi/iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/tpgt_2. May 25 15:55:11 SVR-AUBUN-SWD-ISCSI2 tcmu-runner: tgt_port_grp_recovery_thread_fn:253: Disabled iscsi/iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/tpgt_2. May 25 15:55:14 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:14 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:14 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:14 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:15 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:15 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:17 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:17 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:17 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:17 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:18 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:18 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:20 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:20 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:20 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:20 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:21 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:21 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:23 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:23 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:23 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:23 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:24 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:24 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:26 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:26 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:26 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:26 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:27 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 15:55:27 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 15:55:29 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
May 25 16:27:47 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 16:27:47 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 16:27:49 SVR-AUBUN-SWD-ISCSI2 kernel: Unable to locate Target Portal Group on iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw May 25 16:27:49 SVR-AUBUN-SWD-ISCSI2 kernel: iSCSI Login negotiation failed. May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: INFO: task fn-radosclient:1555 blocked for more than 120 seconds. May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: fn-radosclient D ffff9df8b2218fd0 0 1555 1 0x00000080 May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: Call Trace: May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] schedule+0x29/0x70
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] schedule_timeout+0x239/0x2c0
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? __send_signal+0x18e/0x490
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] wait_for_completion+0xfd/0x140
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? wake_up_state+0x20/0x20
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsit_cause_connection_reinstatement+0x9e/0x100 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsit_free_session+0x109/0x180 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsit_release_sessions_for_tpg+0x123/0x1e0 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsit_tpg_disable_portal_group+0xcf/0x1e0 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] lio_target_tpg_enable_store+0x6e/0xf0 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] configfs_write_file+0x107/0x140
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] vfs_write+0xc0/0x1f0
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? system_call_after_swapgs+0xc8/0x160
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] SyS_write+0x7f/0xf0
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? system_call_after_swapgs+0xc8/0x160
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] system_call_fastpath+0x1c/0x21
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? system_call_after_swapgs+0xc8/0x160
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: INFO: task iscsi_ttx:1528 blocked for more than 120 seconds.
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: iscsi_ttx D ffff9df7b194dee0 0 1528 2 0x00000084
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: Call Trace:
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] schedule+0x29/0x70
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] schedule_timeout+0x239/0x2c0
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? list_del+0xd/0x30
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] wait_for_completion+0xfd/0x140
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? wake_up_state+0x20/0x20
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] transport_generic_free_cmd+0xa2/0x150 [target_core_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsit_free_cmd+0x82/0x140 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsit_close_connection+0x4c6/0x8a0 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsit_take_action_for_connection_exit+0x8b/0x120 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] iscsi_target_tx_thread+0x1f2/0x240 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? wake_up_atomic_t+0x30/0x30
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? iscsit_thread_get_cpumask+0xf0/0xf0 [iscsi_target_mod]
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] kthread+0xd1/0xe0
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? insert_kthread_work+0x40/0x40
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ret_from_fork_nospec_begin+0x21/0x21
May 25 16:27:50 SVR-AUBUN-SWD-ISCSI2 kernel: [] ? insert_kthread_work+0x40/0x40
I think these may be unrelated but interesting.
May 25 14:47:15 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 14:47:15.539 852 [WARN] tcmu_block_device:395 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Kernel does not support the block_dev action. May 25 14:47:15 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmu_block_device:395 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Kernel does not support the block_dev action. May 25 14:47:15 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 14:47:15.555 852 [WARN] tcmu_rbd_lock:735 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Acquired exclusive lock. May 25 14:47:15 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmu_rbd_lock:735 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Acquired exclusive lock. May 25 14:47:15 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmu_unblock_device:418 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Kernel does not support the block_dev action. May 25 14:47:15 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 14:47:15.563 852 [WARN] tcmu_unblock_device:418 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Kernel does not support the block_dev action.
May 25 17:06:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 17:06:22.619 852 [WARN] tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:06:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:16:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 17:16:22.546 852 [WARN] tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:16:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:30:57 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 17:30:57.074 852 [WARN] tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:30:57 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:36:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 17:36:22.638 852 [WARN] tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:36:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:46:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 17:46:22.545 852 [WARN] tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 17:46:22 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 18:00:57 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: 2018-05-25 18:00:57.089 852 [WARN] tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported May 25 18:00:57 SVR-AUBUN-SWD-ISCSI1 tcmu-runner: tcmur_cmdproc_thread:635 rbd/AUBUN-VMW-CLUSTER01-SSD.AUBUN-VMW-Cluster01: Command 0x85 not supported