ceph / ceph-nvmeof

Service to provide Ceph storage over NVMe-oF/TCP protocol

GW server logs notice on keep alive timeout and disconnects host from subsystem #161

Closed: rahullepakshi closed this issue 9 months ago

rahullepakshi commented 1 year ago

Not sure whether this is a configuration miss or something else, but does anyone have an idea about the following GW server log messages?

[2023-07-18 04:42:36.216478] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:42:37.335758] ctrlr.c: 639:nvmf_ctrlr_add_io_qpair: *ERROR*: Inactive admin qpair (state 3, group (nil))
[2023-07-18 04:44:28.451211] tcp.c: 969:_tcp_write_pdu: *ERROR*: Could not write IC_RESP to socket: rc=-1, errno=32

This is seen around 60 minutes after discovery and connection from the Linux initiator/host. In that span I added 115 namespaces to this subsystem, ran some I/O from the host, and then left it idle for about 40 minutes. After that I could not list NVMe volumes on the initiator, presumably because the host was disconnected as shown in the logs above. Eventually the GW crashed, but I am not sure whether this is the reason for it.
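For reference, one way to check whether the disconnects simply track the host-side keep-alive value is to reconnect with a larger keep-alive timeout. A rough sketch only; the address, port and subsystem NQN are placeholders taken from the logs in this thread, and the timeout value is just an example:

```
# Hypothetical diagnostic: reconnect with a longer keep-alive timeout (in seconds)
# and watch whether the gateway still reports keep alive timeouts.
nvme connect -t tcp -a 10.0.210.100 -s 5003 -n nqn.2016-06.io.spdk:cnode1 --keep-alive-tmo=60
```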

Complete logs -

 INFO:control.grpc:Received request to create bdev bdev115 from rbd/image115 with block size 4096
[2023-07-18 04:00:29.920613] bdev_rbd.c:1248:bdev_rbd_create: *NOTICE*: Add bdev115 rbd disk to lun
INFO:control.grpc:create_bdev: bdev115
DEBUG:control.state:omap_key generated: bdev_bdev115
INFO:control.grpc:Received request to add bdev115 to nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:add_namespace: 115
DEBUG:control.state:omap_key generated: namespace_nqn.2016-06.io.spdk:cnode1_115
[2023-07-18 04:41:03.744638] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:41:15.561995] ctrlr.c: 708:_nvmf_ctrlr_add_io_qpair: *ERROR*: Inactive admin qpair (state 3, group (nil))
[2023-07-18 04:41:25.955404] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:41:52.087383] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:42:04.703479] tcp.c: 969:_tcp_write_pdu: *ERROR*: Could not write IC_RESP to socket: rc=-1, errno=32
[2023-07-18 04:42:09.687838] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:42:17.073381] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:42:36.216478] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:42:37.335758] ctrlr.c: 639:nvmf_ctrlr_add_io_qpair: *ERROR*: Inactive admin qpair (state 3, group (nil))
[2023-07-18 04:44:28.451211] tcp.c: 969:_tcp_write_pdu: *ERROR*: Could not write IC_RESP to socket: rc=-1, errno=32
[2023-07-18 04:44:47.443666] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:45:26.051144] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-07-18 04:45:45.349413] ctrlr.c: 221:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:157f1d62-0e94-4c55-a4ee-bab80b0e8ca8 from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
ERROR:control.server:spdk_get_version failed with: 
 [Errno 32] Broken pipe
INFO:control.server:Terminating SPDK...
INFO:control.server:Stopping the server...
INFO:control.server:Exiting the gateway process.

A few references: https://github.com/spdk/spdk/issues/1089, https://github.com/linux-nvme/nvme-cli/issues/624

Observed it again. Test suite: https://github.com/rahullepakshi/cephci/blob/openstack_nvmf_scale/suites/quincy/nvmeof/ceph_nvmeof_openstack_scale.yaml

(I had to copy-paste the logs to pad.ceph.com, as I did not find another place to attach logs that everyone can view.) Test failure log: https://pad.ceph.com/p/test_failure_log (corresponding to https://github.com/rahullepakshi/cephci/blob/openstack_nvmf_scale/suites/quincy/nvmeof/ceph_nvmeof_openstack_scale.yaml#L148-L180). GW crash log: https://pad.ceph.com/p/KATO_ceph_NVMeTCP_issue

leonidc commented 1 year ago

From the logs we have, I suspect the issue is something like a socket hang in the gateway. Is it reproducible? Do you think it depends on the number of connected hosts/namespaces, Rahul?

We observe the following error in the log:

"ERROR:control.server:spdk_get_version failed with: [Errno 104] Connection reset by peer INFO:control.server:Terminating SPDK..."

Eventually the SPDK process is killed because it stops responding to the RPC keep-alive. That, too, may indicate the same issue: a hung socket.

Does SPDK create a core dump upon termination? It would be great to see the back-trace of the SPDK process. It is hard to figure out the bug from the logs alone. Is this a known issue? (I am new to this project.)

rahullepakshi commented 1 year ago

@leonidc Yes, it is reproducible; I have seen it three times. Disconnection is issued against all 3 hosts that were connected to 3 individual subsystems on this GW. This is an incremental single-GW component scale test, and the FIO run that was in progress on a single device from host 2 failed because nvme list is empty once the host is disconnected.

2023-07-19 16:12:22,785 (cephci.test_ceph_nvmeof_gateway_scale) [INFO] - cephci.ceph.ceph.py:1503 - long running command on 10.0.209.209 -- fio  --ioengine libaio --filename /dev/nvme0n31 --runtime 30 --time_based --name test-1 --numjobs 1 --rw read --iodepth 8 --fsync 32 --group_reporting with None seconds
2023-07-19 16:12:24,789 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'test-1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8'
2023-07-19 16:12:24,790 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'fio-3.27'
2023-07-19 16:12:24,791 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'Starting 1 process'
2023-07-19 16:12:53,912 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b''
2023-07-19 16:12:53,913 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'test-1: (groupid=0, jobs=1): err= 0: pid=4732: Wed Jul 19 12:12:53 2023'

................(successful transaction).....................

2023-07-19 16:12:53,935 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'  nvme0n31: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%'
2023-07-19 16:12:53,938 (cephci.test_ceph_nvmeof_gateway_scale) [INFO] - cephci.ceph.ceph.py:1527 - Command completed on 2023-07-19 16:12:53.937420
2023-07-19 16:12:53,938 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.utility.utils.py:1959 - Config Received for fio: {'device_name': '/dev/nvme0n30', 'client_node': <ceph.ceph.CephNode object at 0x7fc864160668>, 'run_time': '30', 'long_running': True, 'io_type': 'read', 'cmd_timeout': 'notimeout'}
2023-07-19 16:12:53,939 (cephci.test_ceph_nvmeof_gateway_scale) [INFO] - cephci.ceph.ceph.py:1503 - long running command on 10.0.209.209 -- fio  --ioengine libaio --filename /dev/nvme0n30 --runtime 30 --time_based --name test-1 --numjobs 1 --rw read --iodepth 8 --fsync 32 --group_reporting with None seconds
2023-07-19 16:12:55,943 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'test-1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8'
2023-07-19 16:12:55,944 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'fio-3.27'
2023-07-19 16:12:55,944 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'Starting 1 process'
2023-07-19 16:24:58,052 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'fio: pid=4749, err=5/file:io_u.c:1841, func=io_u error, error=Input/output error'
2023-07-19 16:24:58,053 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'fio: io_u error on file /dev/nvme0n30: Input/output error: read offset=0, buflen=4096'
2023-07-19 16:24:58,078 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b''
2023-07-19 16:24:58,079 (cephci.test_ceph_nvmeof_gateway_scale) [DEBUG] - cephci.ceph.ceph.py:1523 - b'test-1: (groupid=0, jobs=1): err= 5 (file:io_u.c:1841, func=io_u error, error=Input/output error): pid=4749: Wed Jul 19 12:24:58 2023'

Tagging @sdpeters @trociny @PepperJo @idryomov for more insights

sdpeters commented 1 year ago

> Does SPDK create a core dump upon termination? It would be great to see the back-trace of the SPDK process. It is hard to figure out the bug from the logs alone. Is this a known issue? (I am new to this project.)

Depends how the gateway terminates it.

If the problem can be detected (failing host connections or IOs) before the SPDK process is killed, you could run gcore on the SPDK process to get a core dump (or do it with gdb).

I don't have any theories about what the problem is, but with a dump you could at least look for threads waiting for locks, etc.
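As a concrete sketch of that approach (the target binary name is the nvmf_tgt started by the gateway; output paths are placeholders):

```
# Hypothetical sketch: dump a core of the running SPDK target before the gateway restarts it
gcore -o /tmp/nvmf_tgt_core $(pidof nvmf_tgt)

# or attach with gdb and collect all thread backtraces without killing the process
gdb -p $(pidof nvmf_tgt) -ex "thread apply all bt" -ex detach -ex quit
```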

Is nvmf_tcp logging already enabled (I haven't looked at these logs yet, so I don't know)? You can do that at some convenient time (before things start failing) with the log_set_flag JSON RPC command.
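If it is not enabled yet, a minimal sketch of doing that through SPDK's RPC script; the socket path and SPDK location are the ones from the gateway logs, and the exact log flag name is an assumption:

```
# Hypothetical sketch: turn on nvmf/tcp debug logging on the running target
/tmp/ceph-nvmeof/spdk/scripts/rpc.py -s /var/tmp/spdk.sock log_set_print_level DEBUG
/tmp/ceph-nvmeof/spdk/scripts/rpc.py -s /var/tmp/spdk.sock log_set_flag nvmf_tcp
```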

sdpeters commented 1 year ago

> This is seen around 60 minutes after discovery and connection from the Linux initiator/host.

I see the discovery controller we're talking about here is the one in the SPDK gateway process.

Though the SPDK discovery KATO issue linked above seems to have been fixed, it's worth asking whether this problem occurs when the host doesn't connect to that discovery controller (i.e., the subsystem ports are configured manually on the host instead).
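Something along these lines on the host, connecting straight to the subsystem listener instead of going through nvme discover / connect-all; address, port and NQN are placeholders based on the logs in this thread:

```
# Hypothetical sketch: manual connection that bypasses the SPDK discovery controller
nvme connect -t tcp -a 10.0.210.100 -s 5003 -n nqn.2016-06.io.spdk:cnode1
```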

I say this because the POR (plan of record) is for hosts to use the separate cluster-wide discovery controller we'll include with the gateway, rather than the discovery controllers in each gateway process. The SPDK discovery controllers are only aware of the subsystems in that same SPDK process, while hosts need to discover all the subsystems and ports on all the Ceph NVMe-oF gateway nodes (see #190).

I see these log messages from the SPDK process related to the discovery controller:

[2023-07-19 11:52:18.495378] subsystem.c:1201:spdk_nvmf_subsystem_listener_allowed: *WARNING*: Allowing connection to discovery subsystem on TCP/10.0.210.100/5003, even though this listener was not added to the discovery subsystem. This behavior is deprecated and will be removed in a future release.

So if we needed to use this discovery controller, it seems this listener should be added to it. Given its limitations, we should probably disable it instead.

leonidc commented 1 year ago

I modified server.py so that it does not kill SPDK when it becomes unresponsive. When the issue was reproduced, the following stack was seen in gdb:

```
Thread 1 (Thread 0x7efe8dcf6700 (LWP 181668) "reactor_0"):
#0  0x000000000048e6a5 in _sock_flush (sock=sock@entry=0x2ab40c0) at posix.c:1209
#1  0x000000000048f0cc in posix_sock_group_impl_poll (_group=0x1dedfa0, max_events=32, socks=0x7ffff60ba3b0) at posix.c:1881
#2  0x00000000004f8522 in sock_group_impl_poll_coumax_eventsnt (max_events=32, group=0x1e009b0, group_impl=0x1dedfa0) at sock.c:683
#3  spdk_sock_group_poll_count (group=0x1e009b0, max_events=@entry=32) at sock.c:717
#4  0x00000000004f85da in spdk_sock_group_poll (group=) at sock.c:668
#5  0x00000000004ab8da in nvmf_tcp_poll_group_poll (group=0x1de6130) at tcp.c:3208
#6  0x000000000049dd6b in nvmf_poll_group_poll (ctx=) at nvmf.c:71
#7  0x00000000004fb90e in thread_execute_poller (poller=0x1e078d0, thread=0x1deb6a0) at thread.c:932
#8  thread_poll (thread=thread@entry=0x1deb6a0, max_msgs=max_msgs@entry=0, now=now@entry=3848417530563) at thread.c:1058
#9  0x00000000004fc64f in spdk_thread_poll (thread=thread@entry=0x1deb6a0, max_msgs=max_msgs@entry=0, now=3848417530563) at thread.c:1109
#10 0x00000000004d9380 in _reactor_run (reactor=0x1e0ae00) at reactor.c:903
#11 reactor_run (arg=0x1e0ae00) at reactor.c:941
#12 0x00000000004d9833 in spdk_reactors_start () at reactor.c:1053
#13 0x00000000004d6797 in spdk_app_start (opts_user=opts_user@entry=0x7ffff60babc0, start_fn=start_fn@entry=0x423ce0 , arg1=arg1@entry=0x0) at app.c:754
#14 0x000000000041ea2e in main (argc=4, argv=0x7ffff60badc8) at nvmf_main.c:47
```

Usually the thread waits in epoll_wait (most of the time), but we need more dumps to analyze.

sdpeters commented 1 year ago

> #2 0x00000000004f8522 in sock_group_impl_poll_coumax_eventsnt (max_events=32, group=0x1e009b0, group_impl=0x1dedfa0) at sock.c:683

There's something funny about that stack trace line. There's no symbol named sock_group_impl_poll_coumax_eventsnt, but sock_group_impl_poll_count and max_events do exist. Did gdb display it that way, or was this a copy-and-paste issue between your terminal and GitHub?

sdpeters commented 1 year ago

> Usually the thread waits in epoll_wait (most of the time), but we need more dumps to analyze.

I suspect that thread was not blocked in _sock_flush(), but you just caught that poller at that point when you stopped the process. Everything in _sock_flush() that might block (SSL_writev() or sendmsg()) would leave another stack frame.

If you stop this with gdb again, it might be worth stepping through that thread for a while to see if it's looping endlessly over a (corrupted) TAILQ (unlikely), or just busily polling and not making any progress. If it ever returns, it's the latter.
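Roughly like this, assuming the reactor thread is still thread 1 as in the backtrace above (a sketch, not an exact recipe):

```
# Hypothetical gdb session: check whether _sock_flush ever returns or the poller just spins
gdb -p $(pidof nvmf_tgt)
(gdb) thread 1
(gdb) finish          # completes quickly if the poller is merely busy-polling
(gdb) next            # otherwise step a few iterations to see what it loops over
(gdb) bt
```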

Actually it probably makes more sense to first try this without using the SPDK discovery controller. If the problem still occurs then it's likely a customer will experience it even when using the gateway's cluster discovery service. If not, then we can work around the issue for this test.

leonidc commented 1 year ago

@sdpeters, regarding sock_group_impl_poll_coumax_eventsnt: I checked now, and that is how it looks in our IBM internal box, where it was pasted by @rahullepakshi. He used the gdb command "thread apply all bt". It was the last thread stack shown before gdb was disconnected from the process.

caroav commented 1 year ago

The issue seems to be fixed after bumping the SPDK version to 23.01.1 LTS. Closing.

rahullepakshi commented 1 year ago

Re-opening this issue, as it is still seen with SPDK v23.01.1. On a VM, the test creates 256 namespaces in a single subsystem on the gateway node and runs IO in parallel for each bdev/namespace created. The failure is seen after 183 namespaces are created.

2023-08-18 17:14:29,123 (cephci.test_ceph_nvmeof_ns_limit) [DEBUG] - cephci.Regression.nvmeotcp.1.cephci.ceph.nvmeof.gateway.py:54 - INFO:control.server:Starting gateway ceph-regression-vgifrt-hbcopx-node5
INFO:control.server:SPDK PATH: /tmp/ceph-nvmeof
INFO:control.server:Starting /tmp/ceph-nvmeof/spdk/build/bin/nvmf_tgt -u -r /var/tmp/spdk.sock
INFO:control.server:{'Attempting to initialize SPDK: rpc_socket: /var/tmp/spdk.sock,', ' conn_retries: 10, timeout: 60.0'}
INFO: Setting log level to ERROR
INFO:JSONRPCClient(/var/tmp/spdk.sock):Setting log level to ERROR
[2023-08-18 06:05:07.184624] Starting ****SPDK v23.01.1**** git sha1 186986cf1 / DPDK 22.11.1 initialization...
[2023-08-18 06:05:07.185225] [ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --no-pci --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid185357 ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2023-08-18 06:05:07.297796] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 1
[2023-08-18 06:05:07.343268] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
[2023-08-18 06:05:07.379599] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
DEBUG:control.server:create_transport: tcp options: 
[2023-08-18 06:05:07.569870] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
INFO:control.state:First gateway: created object nvmeof.state
INFO:control.grpc:Received request to create subsystem nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:create_subsystem nqn.2016-06.io.spdk:cnode1: True
DEBUG:control.state:omap_key generated: subsystem_nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:Received request to create  TCP listener for nqn.2016-06.io.spdk:cnode1 at :5001.
[2023-08-18 06:05:38.687759] tcp.c: 850:nvmf_tcp_listen: *NOTICE*: *** NVMe/TCP Target Listening on 10.0.208.241 port 5001 ***
INFO:control.grpc:create_listener: True
DEBUG:control.state:omap_key generated: listener_nqn.2016-06.io.spdk:cnode1__TCP__5001
INFO:control.grpc:Received request to allow any host to nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:add_host *: True
DEBUG:control.state:omap_key generated: host_nqn.2016-06.io.spdk:cnode1_*
[2023-08-18 06:05:52.803096] subsystem.c:1201:spdk_nvmf_subsystem_listener_allowed: *WARNING*: Allowing connection to discovery subsystem on TCP/10.0.208.241/5001, even though this listener was not added to the discovery subsystem.  This behavior is deprecated and will be removed in a future release.
INFO:control.grpc:Received request to create bdev bdev1 from rbd/image1 with block size 4096
[2023-08-18 06:06:01.775586] bdev_rbd.c:1248:bdev_rbd_create: *NOTICE*: Add bdev1 rbd disk to lun
INFO:control.grpc:create_bdev: bdev1
DEBUG:control.state:omap_key generated: bdev_bdev1
INFO:control.grpc:Received request to add bdev1 to nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:add_namespace: 1
DEBUG:control.state:omap_key generated: namespace_nqn.2016-06.io.spdk:cnode1_1
INFO:control.grpc:Received request to create bdev bdev2 from rbd/image2 with block size 4096
[2023-08-18 06:06:30.367716] bdev_rbd.c:1248:bdev_rbd_create: *NOTICE*: Add bdev2 rbd disk to lun
INFO:control.grpc:create_bdev: bdev2
DEBUG:control.state:omap_key generated: bdev_bdev2
INFO:control.grpc:Received request to add bdev2 to nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:add_namespace: 2
DEBUG:control.state:omap_key generated: namespace_nqn.2016-06.io.spdk:cnode1_2
.
.
.
[2023-08-18 07:32:43.341028] bdev_rbd.c:1248:bdev_rbd_create: *NOTICE*: Add bdev183 rbd disk to lun
INFO:control.grpc:create_bdev: bdev183
DEBUG:control.state:omap_key generated: bdev_bdev183
INFO:control.grpc:Received request to add bdev183 to nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:add_namespace: 183
DEBUG:control.state:omap_key generated: namespace_nqn.2016-06.io.spdk:cnode1_183
[2023-08-18 07:33:51.420789] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:34:03.568912] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:34:15.653846] posix.c: 223:posix_sock_getaddr: *ERROR*: getpeername() failed (errno=107)
[2023-08-18 07:34:27.861349] tcp.c:1228:nvmf_tcp_handle_connect: *ERROR*: spdk_sock_getaddr() failed of tqpair=0x33f2d60
[2023-08-18 07:34:33.274892] posix.c: 223:posix_sock_getaddr: *ERROR*: getpeername() failed (errno=107)
[2023-08-18 07:34:44.361227] tcp.c:1228:nvmf_tcp_handle_connect: *ERROR*: spdk_sock_getaddr() failed of tqpair=0x151ae410
[2023-08-18 07:34:57.221455] tcp.c: 969:_tcp_write_pdu: *ERROR*: Could not write IC_RESP to socket: rc=-1, errno=32
[2023-08-18 07:35:07.438153] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:35:20.105154] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:35:26.224252] posix.c: 223:posix_sock_getaddr: *ERROR*: getpeername() failed (errno=107)
[2023-08-18 07:35:26.234008] tcp.c:1228:nvmf_tcp_handle_connect: *ERROR*: spdk_sock_getaddr() failed of tqpair=0x33f2d60
[2023-08-18 07:35:26.243246] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:35:26.243300] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:35:41.010855] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:35:57.390126] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:36:05.050867] ctrlr.c: 712:_nvmf_ctrlr_add_io_qpair: *ERROR*: Inactive admin qpair (state 3, group (nil))
[2023-08-18 07:36:22.481862] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:36:53.635850] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:36:56.757394] ctrlr.c: 712:_nvmf_ctrlr_add_io_qpair: *ERROR*: Inactive admin qpair (state 3, group (nil))
[2023-08-18 07:37:04.213107] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:37:37.015434] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
[2023-08-18 07:37:42.059726] ctrlr.c: 643:nvmf_ctrlr_add_io_qpair: *ERROR*: Inactive admin qpair (state 3, group (nil))
[2023-08-18 07:37:52.277365] ctrlr.c: 225:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:3791cbb1-621d-49dd-a102-bd147a608eed from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
ERROR:control.server:spdk_get_version failed with: 
 Timeout while waiting for response:

INFO:control.server:Terminating SPDK...
[2023-08-18 07:43:06.430232] posix.c: 223:posix_sock_getaddr: *ERROR*: getpeername() failed (errno=107)
INFO:control.server:Stopping the server...
INFO:control.server:Exiting the gateway process.

rahullepakshi commented 1 year ago

@caroav Can you provide the logs or any other information from which you concluded that this issue might be fixed in SPDK v23.01.1, as stated in https://github.com/ceph/ceph-nvmeof/issues/161#issuecomment-1680503267? I want to cross-check it against what I see here when recreating the issue on that version.

caroav commented 1 year ago

@leonidc does this failure look the same as the original failure in this issue?

leonidc commented 1 year ago

@rahullepakshi, the errors look the same as the original failure. Can you please check /var/log/messages for messages like "heartbeat_check" from the OSDs and "oom_kill_process"?

PepperJo commented 1 year ago

@rahullepakshi I assume this is due to thousands of threads being created, because we currently don't share the Ceph client instance in SPDK. This makes SPDK really slow and it might miss timeouts, etc. See https://github.com/ceph/ceph-nvmeof/issues/28
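A quick way to check that theory on the gateway node is to count the SPDK target's threads while namespaces are being added (process name as started by the gateway; a sketch only):

```
# Hypothetical check: watch the thread count of nvmf_tgt grow as bdevs/namespaces are created
ps -o nlwp= -p $(pidof nvmf_tgt)
grep Threads /proc/$(pidof nvmf_tgt)/status
```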

caroav commented 1 year ago

In addition to @PepperJo's comments, we also asked @rahullepakshi to run with the TCP memory configuration we set in the container (transport_tcp_options = {"in_capsule_data_size" : 8192, "max_io_qpairs_per_ctrlr" : 7}). This should also relieve the memory pressure.
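For reference, a sketch of where that option goes in the gateway configuration file; the section name is my assumption, the option string is the one quoted above:

```
[spdk]
transport_tcp_options = {"in_capsule_data_size": 8192, "max_io_qpairs_per_ctrlr": 7}
```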

rahullepakshi commented 1 year ago

@leonidc I do not see any "heartbeat_check" messages from the OSDs or any "oom_kill_process" messages in this case. Acknowledged, @PepperJo.

A few observations:

1. On a VM with 16 GB RAM running the GW server with transport_tcp_options = {"in_capsule_data_size" : 8192, "max_io_qpairs_per_ctrlr" : 7}, I did not see the test fail, either while running IO or while idle.
2. On a VM with 8 GB RAM running the GW server with the same transport_tcp_options, after bringing up 256 namespaces on a single GW and running IO on roughly 125 volumes, the GW node hung for 3-5 minutes and got terminated.

# cat /tmp/ceph-nvmeof/output.log
INFO:control.server:Starting gateway ceph-nvmf1-gryb7n-node5
INFO:control.server:SPDK PATH: /tmp/ceph-nvmeof
INFO:control.server:Starting /tmp/ceph-nvmeof/spdk/build/bin/nvmf_tgt -u -r /var/tmp/spdk.sock
INFO:control.server:{'Attempting to initialize SPDK: rpc_socket: /var/tmp/spdk.sock,', ' conn_retries: 10, timeout: 60.0'}
INFO: Setting log level to ERROR
INFO:JSONRPCClient(/var/tmp/spdk.sock):Setting log level to ERROR
[2023-08-22 06:42:41.338156] Starting SPDK v23.01.1 git sha1 186986cf1 / DPDK 22.11.1 initialization...
[2023-08-22 06:42:41.339320] [ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --no-pci --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid182281 ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2023-08-22 06:42:41.453485] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 1
[2023-08-22 06:42:41.493269] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
[2023-08-22 06:42:41.528004] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
DEBUG:control.server:create_transport: tcp options: {"in_capsule_data_size" : 8192, "max_io_qpairs_per_ctrlr" : 7}
[2023-08-22 06:42:41.730084] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
INFO:control.state:First gateway: created object nvmeof.state
INFO:control.grpc:Received request to create subsystem nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:create_subsystem nqn.2016-06.io.spdk:cnode1: True
.
.
.
INFO:control.grpc:Received request to add IQC2-bdev255 to nqn.2016-06.io.spdk:cnode1
INFO:control.grpc:add_namespace: 256
DEBUG:control.state:omap_key generated: namespace_nqn.2016-06.io.spdk:cnode1_256
INFO:control.grpc:Received request to get subsystems
INFO:control.grpc:get_subsystems: [{'nqn': 'nqn.2014-08.org.nvmexpress.discovery', 'subtype': 'Discovery', 'listen_addresses': [], 'allow_any_host': True, 'hosts': []}, {'nqn': 'nqn.2016-06.io.spdk:cnode1', 'subtype': 'NVMe', 'listen_addresses': [{'transport': 'TCP', 'trtype': 'TCP', 'adrfam': 'IPv4', 'traddr': '10.0.210.235', 'trsvcid': '5001'}], 'allow_any_host': True, 'hosts': [], 'serial_number': '1', 'model_number': 'SPDK bdev Controller', 'max_namespaces': 256, 'min_cntlid': 1, 'max_cntlid': 65519, 'namespaces': [{'nsid': 1, 'bdev_name': 'IQC2-bdev0', 'name': 'IQC2-bdev0', 'nguid': '56957D71AE5140559739BCDAE1287415', 'uuid': '56957d71-ae51-4055-9739-bcdae1287415'}, {'nsid': 2, 'bdev_name': 'IQC2-bdev1', 'name': 'IQC2-bdev1', 'nguid': '2049F1C65DA34A908EA7B9635B2E03C5', 'uuid': '2049f1c6-5da3-4a90-8ea7-b9635b2e03c5'}, {'nsid': 3, 'bdev_name': 'IQC2-bdev2', . . . . . . .
.
.
.
[2023-08-22 06:45:23.353164] subsystem.c:1201:spdk_nvmf_subsystem_listener_allowed: *WARNING*: Allowing connection to discovery subsystem on TCP/10.0.210.235/5001, even though this listener was not added to the discovery subsystem.  This behavior is deprecated and will be removed in a future release.
ERROR:control.server:spdk_get_version failed with:
 Timeout while waiting for response:

INFO:control.server:Terminating SPDK...
INFO:control.server:Stopping the server...
INFO:control.server:Exiting the gateway process.

caroav commented 1 year ago

@rahullepakshi my understanding now is that with 256 namespaces it works fine if the GW has 16 GB RAM; please confirm. It fails with the 8 GB RAM configuration. That can be addressed, if needed, in a separate issue for Squid.

rahullepakshi commented 1 year ago

That's correct, @caroav.

rahullepakshi commented 1 year ago

With the upstream container build on VMs (logs open only within the Red Hat network), I am seeing the GW crash with containers as well: on a 4 GB RAM node it crashes at 135 namespaces with IO (http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-KDQSHS), whereas on a 16 GB RAM node it crashes at 391 namespaces with IO (http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-Y4GMR7/test_1k_namespace_with_1_subsystem_in_Single_GW_0.log).

caroav commented 1 year ago

@rahullepakshi can you retest with 0.0.4? We want to know whether a node with 16 GB RAM can handle 400 namespaces with IO. We believe it should behave better in 0.0.4 because of this PR: https://github.com/ceph/ceph-nvmeof/pull/230

rahullepakshi commented 1 year ago

OK, sure, @caroav. I will plan for it this week if possible.

rahullepakshi commented 1 year ago

Just tested a 4 GB RAM node with 0.0.4 and a recent Ceph build, but with no IO, because the automation test runs fail due to an "nvme discover" failure to discover subsystems from the client node. It fails at namespace 191. I will also update for the 16 GB RAM node once that issue is fixed.
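For reference, the discovery step that fails from the client node is of roughly this shape (address and port are placeholders taken from the logs above):

```
nvme discover -t tcp -a 10.0.208.241 -s 5001
```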

[root@ceph-nvmf3-00egmz-node5 cephuser]# cat /proc/meminfo
MemTotal:        3735744 kB
MemFree:          166044 kB
MemAvailable:     211432 kB

Oct 05 02:18:17 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: DEBUG:control.state:omap_key generated: namespace_nqn.2016-06.io.spdk:cnode1_191
Oct 05 02:18:20 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.grpc:Received request to create bdev BBAE-bdev192 from rbd/BBAE-image192 with block size 4096
Oct 05 02:18:20 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: [2023-10-05 06:18:20.637755] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add BBAE-bdev192 rbd disk to lun
Oct 05 02:18:20 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.grpc:create_bdev: BBAE-bdev192
Oct 05 02:18:20 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: DEBUG:control.state:omap_key generated: bdev_BBAE-bdev192
Oct 05 02:18:22 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.grpc:Received request to add BBAE-bdev192 to nqn.2016-06.io.spdk:cnode1
Oct 05 02:18:22 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.grpc:add_namespace: 192
Oct 05 02:18:37 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: DEBUG:control.state:omap_key generated: namespace_nqn.2016-06.io.spdk:cnode1_192
Oct 05 02:18:49 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.state:Failed to notify.
Oct 05 02:20:36 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.grpc:Received request to create bdev BBAE-bdev193 from rbd/BBAE-image193 with block size 4096
Oct 05 02:20:41 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.grpc:Allocating cluster name='cluster_context_24'
Oct 05 02:21:54 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: ERROR:control.grpc:create_bdev failed with:
Oct 05 02:21:54 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:  Timeout while waiting for response:
Oct 05 02:21:54 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:
Oct 05 02:21:54 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:
Oct 05 02:21:54 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:
Oct 05 02:22:04 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: ERROR:control.server:spdk_get_version failed with:
Oct 05 02:22:04 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:  Timeout while waiting for response:
Oct 05 02:22:04 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:
Oct 05 02:22:04 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:
Oct 05 02:22:04 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]:
Oct 05 02:22:07 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.server:Aborting SPDK(client.nvmeof.rbd.ceph-nvmf3-00egmz-node5.nsrbug) pid 3...
Oct 05 02:22:57 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.server:Stopping the server...
Oct 05 02:22:57 ceph-nvmf3-00egmz-node5 ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug[48764]: INFO:control.server:Exiting the gateway process.
Oct 05 02:22:57 ceph-nvmf3-00egmz-node5 podman[93053]: 2023-10-05 02:22:57.911363083 -0400 EDT m=+0.057506712 container died 49f70f93ca6f9047512ff7fa0a08cbfb4025b0223fcd390cde1cd0e9a63814f1 (image=registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:latest, name=ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug, vcs-type=git, io.k8s.display-name=Red Hat Universal Base Image 9 Minimal, com.redhat.component=ceph-nvmeof-container, url=https://access.redhat.com/containers/#/registry.access.redhat.com/ceph-nvmeof/images/0.0.4-1, maintainer=Alexander Indenbaum <aindenba@ibm.com>, io.k8s.description=Ceph NVMe over Fabrics Gateway, description=Ceph NVMe over Fabrics Gateway, architecture=x86_64, version=0.0.4, build-date=2023-09-28T18:35:50, io.buildah.version=1.29.0, io.openshift.tags=minimal rhel9, name=ceph-nvmeof, distribution-scope=public, vendor=Red Hat, Inc., release=1, summary=Service to provide block storage on top of Ceph for platforms (e.g.: VMWare) without native Ceph support (RBD), replacing existing approaches (iSCSI) with a newer and more versatile standard (NVMe-oF)., vcs-ref=1bf80844db1f8190085b23b96d894dd34ee5e7f5, io.openshift.expose-services=, com.redhat.license_terms=https://www.redhat.com/agreements)
Oct 05 02:22:57 ceph-nvmf3-00egmz-node5 podman[93053]: 2023-10-05 02:22:57.948256622 -0400 EDT m=+0.094400239 container remove 49f70f93ca6f9047512ff7fa0a08cbfb4025b0223fcd390cde1cd0e9a63814f1 (image=registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:latest, name=ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e-nvmeof-rbd-ceph-nvmf3-00egmz-node5-nsrbug, vcs-ref=1bf80844db1f8190085b23b96d894dd34ee5e7f5, maintainer=Alexander Indenbaum <aindenba@ibm.com>, io.k8s.display-name=Red Hat Universal Base Image 9 Minimal, vendor=Red Hat, Inc., com.redhat.component=ceph-nvmeof-container, com.redhat.license_terms=https://www.redhat.com/agreements, description=Ceph NVMe over Fabrics Gateway, summary=Service to provide block storage on top of Ceph for platforms (e.g.: VMWare) without native Ceph support (RBD), replacing existing approaches (iSCSI) with a newer and more versatile standard (NVMe-oF)., io.openshift.tags=minimal rhel9, name=ceph-nvmeof, url=https://access.redhat.com/containers/#/registry.access.redhat.com/ceph-nvmeof/images/0.0.4-1, version=0.0.4, build-date=2023-09-28T18:35:50, release=1, io.buildah.version=1.29.0, io.openshift.expose-services=, vcs-type=git, io.k8s.description=Ceph NVMe over Fabrics Gateway, architecture=x86_64, distribution-scope=public)
Oct 05 02:22:58 ceph-nvmf3-00egmz-node5 systemd[1]: ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e@nvmeof.rbd.ceph-nvmf3-00egmz-node5.nsrbug.service: Deactivated successfully.
Oct 05 02:22:58 ceph-nvmf3-00egmz-node5 systemd[1]: ceph-f090e6ee-633c-11ee-bb8d-fa163e81204e@nvmeof.rbd.ceph-nvmf3-00egmz-node5.nsrbug.service: Consumed 15min 15.449s CPU time.

caroav commented 9 months ago

This was fixed a long time ago. Discussed with @rahullepakshi and we agreed to close.