Open madkiss opened 5 months ago
Some additional info I forgot to put in the first mail: the target version is 1.2.9, and the Ceph version is ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable).
The setup is brand new, with no previous configuration in place and nothing extraordinarily strange configured either. Any help will be greatly appreciated. Thank you very much in advance again.
I faced similar issues with v1.2.x on 18.2.2. I had to move to v1.0.0, and it functions as expected. To make cephadm pull v1.0.0:
ceph config set mgr mgr/cephadm/container_image_nvmeof quay.io/ceph/nvmeof:1.0.0
ceph orch apply nvmeof <pool-name> --placement=
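For example, a single-host placement might look like this (the host name here is hypothetical; adjust it to your cluster):
ceph orch apply nvmeof <pool-name> --placement="ceph-node-1"   # "ceph-node-1" is a hypothetical host name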
I faced similar challenges when I upgraded to 18.2.4 and am currently testing deployments.
Right now the GW cannot work without a special build of ceph. The reason is that the GW depends on the new nvmeof paxos service, which is part of https://github.com/ceph/ceph/pull/54671. In the .env file, we update the sha to a ceph CI build that contains that PR and should work with the GW. Hopefully in the near future that PR and the nvmeof paxos service will be part of ongoing ceph builds. Please make sure to work with the latest release as updated here, and with the latest commit of the devel branch.
These are the commands I followed to deploy ceph-nvmeof v1.0.0 on 18.2.2 and 18.2.4. I'm currently looking into deploying the latest versions of ceph-nvmeof:
ceph osd pool create nvmeof_pool01
rbd pool init nvmeof_pool01
rbd -p nvmeof_pool01 create nvme_image --size 50G
ceph config set mgr mgr/cephadm/container_image_nvmeof quay.io/ceph/nvmeof:1.0.0
ceph orch apply nvmeof nvmeof_pool01
alias nvmeof-cli='docker run -it quay.io/ceph/nvmeof-cli:1.0.0 --server-address <host-ip-where-nvmeof-service-is-running> --server-port 5500'
nvmeof-cli subsystem add --subsystem nqn.2016-06.io.spdk:ceph
nvmeof-cli namespace add --subsystem nqn.2016-06.io.spdk:ceph --rbd-pool nvmeof_pool01 --rbd-image nvme_image
ceph orch ps | grep nvme # This will give you the service name
nvmeof-cli listener add --subsystem nqn.2016-06.io.spdk:ceph --gateway-name client.<service-name-from-earlier-command> --traddr <host-ip-where-nvmeof-service-is-running> --trsvcid 4420
nvmeof-cli host add --subsystem nqn.2016-06.io.spdk:ceph --host "*" # Allows connections to any host
nvmeof-cli subsystem list # lists subsystems
nvmeof-cli namespace list --subsystem nqn.2016-06.io.spdk:ceph # lists bdevs
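As a rough sketch of the client side (not part of the commands above; the discovery port 8009 and data port 4420 are assumptions based on the gateway ports shown in the ceph orch ps output later in this thread), connecting with nvme-cli would look something like:
modprobe nvme-tcp   # loads nvme-fabrics as a dependency
nvme discover -t tcp -a <host-ip-where-nvmeof-service-is-running> -s 8009   # discovery service port (assumption)
nvme connect -t tcp -a <host-ip-where-nvmeof-service-is-running> -s 4420 -n nqn.2016-06.io.spdk:ceph
nvme list   # the namespace should appear as a new /dev/nvmeXnY device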
Please use quay.io/ceph/nvmeof:1.2.16 and quay.io/ceph/nvmeof-cli:1.2.16.
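A minimal sketch of switching an existing cephadm deployment to that image, reusing the config key from the earlier comments (the redeploy step and the service name are assumptions about your particular setup):
ceph config set mgr mgr/cephadm/container_image_nvmeof quay.io/ceph/nvmeof:1.2.16
ceph orch redeploy nvmeof.<pool-name>   # hypothetical service name; check "ceph orch ls" for the real one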
The listener add command changed. I will update the documentation upstream soon, but meanwhile this is the right command:
listener add --subsystem nqn.2016-06.io.spdk:ceph --host-name HOST_NAME --traddr
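For reference, a complete invocation in the new form, based on the working example further down in this thread (the address and port values are placeholders), would be:
nvmeof-cli listener add --subsystem nqn.2016-06.io.spdk:ceph --host-name <gateway-host-name> --traddr <host-ip> --trsvcid 4420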
With 19.1.0(rc) (upgraded from 18.2.4), I have been able to deploy ceph-nvmeof v1.2.16 and add a subsystem. I wonder if the nvmeof-cli v1.2.16 usage to add a namespace has changed as well, similar to listener?
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# ceph orch ps | grep nvme
nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.ojnsii test-Standard-PC-i440FX-PIIX-1996 *:5500,4420,8009 running (2m) 2m ago 2m 44.4M - 1.2.16 c8d40f5109eb d95e2746f6d8
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# alias nvmeof-cli='docker run -it quay.io/ceph/nvmeof-cli:1.2.16 --server-address <ip> --server-port 5500'
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# nvmeof-cli subsystem add --subsystem nqn.2016-06.io.spdk:ceph
Adding subsystem nqn.2016-06.io.spdk:ceph: Successful
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# nvmeof-cli namespace add --subsystem nqn.2016-06.io.spdk:ceph --rbd-pool nvmeof_pool01 --rbd-image nvme_image
Failure adding namespace to nqn.2016-06.io.spdk:ceph:
<_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Exception calling application: Chosen ANA group is 0"
debug_error_string = "UNKNOWN:Error received from peer ipv4:<ip>:5500 {grpc_message:"Exception calling application: Chosen ANA group is 0", grpc_status:2, created_time:"2024-07-30T23:52:31.770067786+00:00"}"
>
With cephadm-deployed v18.2.4, when I tried deploying ceph-nvmeof v1.2.16, I encountered the following in the service status, which showed it being associated with a v19.0.0 monitor client:
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# systemctl status ceph-7f3df55a-4e3b-11ef-a674-f94275ab1b57@nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.wnrzxa
● ceph-7f3df55a-4e3b-11ef-a674-f94275ab1b57@nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.wnrzxa.service - Ceph nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.wnrzxa for 7f3df55a-4e3b-11ef-a674-f94275ab1b57
Loaded: loaded (/etc/systemd/system/ceph-7f3df55a-4e3b-11ef-a674-f94275ab1b57@.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2024-07-31 08:23:52 AEST; 1s ago
Main PID: 347496 (bash)
Tasks: 10 (limit: 18618)
Memory: 8.9M
CGroup: /system.slice/system-ceph\x2d7f3df55a\x2d4e3b\x2d11ef\x2da674\x2df94275ab1b57.slice/ceph-7f3df55a-4e3b-11ef-a674-f94275ab1b57@nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.wnrzxa.service
├─347496 /bin/bash /var/lib/ceph/7f3df55a-4e3b-11ef-a674-f94275ab1b57/nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.wnrzxa/unit.run
└─347514 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --init --name ceph-7f3df55a-4e3b-11ef-a674-f94275ab1b57-nvmeof-nvmeof_pool01-test-Standard-PC-i440FX-PIIX-1996-wnrzxa --p>
Jul 31 08:23:53 test-Standard-PC-i440FX-PIIX-1996 bash[347514]: [30-Jul-2024 22:23:53] INFO server.py:91 (7): Starting gateway client.nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.wnrzxa
Jul 31 08:23:53 test-Standard-PC-i440FX-PIIX-1996 bash[347514]: [30-Jul-2024 22:23:53] INFO server.py:162 (7): Starting serve, monitor client version: ceph version 19.0.0-4672-g712d9957 (712d9957d9f2a12f0c34bc0475710fa23e01d609) squid (>
Jul 31 08:23:53 test-Standard-PC-i440FX-PIIX-1996 bash[347514]: [30-Jul-2024 22:23:53] INFO state.py:378 (7): nvmeof.None.state OMAP object already exists.
Jul 31 08:23:53 test-Standard-PC-i440FX-PIIX-1996 bash[347514]: [30-Jul-2024 22:23:53] INFO server.py:244 (7): Starting /usr/bin/ceph-nvmeof-monitor-client --gateway-name client.nvmeof.nvmeof_pool01.test-Standard-PC-i440FX-PIIX-1996.wnr>
Jul 31 08:23:53 test-Standard-PC-i440FX-PIIX-1996 bash[347514]: [30-Jul-2024 22:23:53] INFO server.py:248 (7): monitor client process id: 24
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# nvmeof-cli subsystem add --subsystem nqn.2016-06.io.spdk:ceph
Failure adding subsystem nqn.2016-06.io.spdk:ceph:
<_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:<ip>:5500: Failed to connect to remote host: Connection refused"
debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:<ip>:5500: Failed to connect to remote host: Connection refused {created_time:"2024-07-30T22:25:10.214187073+00:00", grpc_status:14}"
>
The listener add and host add to an existing subsystem are working as expected in v19.1.0(rc):
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# nvmeof-cli listener add --subsystem nqn.2016-06.io.spdk:ceph --host-name test-Standard-PC-i440FX-PIIX-1996 --traddr <ip> --trsvcid 4420
Adding nqn.2016-06.io.spdk:ceph listener at <ip>:4420: Successful
root@test-Standard-PC-i440FX-PIIX-1996:/home/test# nvmeof-cli host add --subsystem nqn.2016-06.io.spdk:ceph --host "*"
Allowing open host access to nqn.2016-06.io.spdk:ceph: Successful
@Peratchi-Kannan what issues are you still having now? were you able to add ns?
@caroav I am not able to add a namespace to the subsystem. When I try to add a namespace, it throws an exception saying "Exception calling application: Chosen ANA group is 0".
Hi @Peratchi-Kannan, I met the same issue when running the latest Ceph container image quay.ceph.io/ceph-ci/ceph:main. @caroav told me to build ceph on top of PR 54671 but without the commit "nvmeof gw monitor: disable by default". You can try this image: quay.ceph.io/ceph-ci/ceph:ceph-nvmeof-mon-arm64-testin (ignore that the tag name contains arm64; it should be an x86-arch container image).
@Peratchi-Kannan see the comment above from @xin3liang. We are planning to remove the "nvmeof gw monitor: disable by default" commit from ceph permanently, but this is pending on some cosmetic changes that we were asked to do. The changes are ongoing, so I really hope that we can do that very soon. Meanwhile, you need to build as described in the last comment.
Hi @xin3liang, the nvmeof service does not start when using the quay.ceph.io/ceph-ci/ceph:ceph-nvmeof-mon-arm64-testin image.
Hi @Peratchi-Kannan, I just verified the aarch64 image, not the x86 one. FYI, here are my cephadm deployment record and steps: https://linaro.atlassian.net/browse/STOR-272
Hi @Peratchi-Kannan, you could try this Ceph image: quay.ceph.io/ceph-ci/ceph:main-nvmeof with the latest nvmeof images quay.io/barakda1/nvmeof:latest and quay.io/barakda1/nvmeof-cli:latest.
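If you want to try those on a cephadm cluster, a rough sketch would be the following (the upgrade path for the ceph image and the exact nvmeof service name are assumptions, not something confirmed in this thread):
ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:main-nvmeof   # assumption: upgrading an existing cluster to the CI build
ceph config set mgr mgr/cephadm/container_image_nvmeof quay.io/barakda1/nvmeof:latest
ceph orch redeploy nvmeof.<pool-name>   # hypothetical service name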
I see the main-nvmeof branch reverts the commit "nvmeof gw monitor: disable by default": https://github.com/ceph/ceph-ci/commits/main-nvmeof/
Hi @xin3liang,
I confirm nvmeof works as expected with the Ceph image quay.ceph.io/ceph-ci/ceph:main-nvmeof and the nvmeof images quay.io/barakda1/nvmeof:latest and quay.io/barakda1/nvmeof-cli:latest.
Thanks
Hello everyone,
I am facing the same issue. I got nvmeof working under 18.2, but somehow it broke, so I decided to upgrade to 19.2, which was just released. I am using version 1.3.2 and I am getting stuck at adding a namespace (Exception: Chosen ANA group is 0), basically the same as @alarmed-ground has reported. I would like to stay with 19.2, not build from source, and still get nvmeof working, even without the HA functionality for now. Any ideas how to do that?
Hello everyone,
I am able to replicate @RobertLukan's situation. I am using quay.io/ceph/nvmeof:1.3.2 and quay.io/ceph/nvmeof-cli:1.3.2; I upgraded the cluster from v19.1.0(rc) to v19.2.0 for my testing.
Hello everyone,
Just a quick update: I was able to add a new namespace from the WebUI instead of using the command (v1.3.2).
I had one namespace before the upgrade and created one after the upgrade.
Interestingly, after adding the new namespace from the WebUI, nvmeof-cli namespace list --subsystem nqn.2016-06.io.spdk:ceph lists both namespaces, and I am able to discover the subsystem from the client.
I am unable to connect to either namespace from the client, but I am able to discover the subsystem. My nvme-cli version on the client is 1.16.
As others have stated, things are not working with Reef 18.2.4.
I've tried using the stock nvmeof 1.0.0 version, 1.2.16, and 1.2.17 and all fail to keep the gateway up and running. Since the 1.0.0 version is deprecated and not recommended for use, I won't provide details about that, but the run log is as follows:
[04-Oct-2024 02:32:34] INFO utils.py:258 (2): Initialize gateway log level to "INFO"
[04-Oct-2024 02:32:34] INFO utils.py:271 (2): Log files will be saved in /var/log/ceph/nvmeof-client.nvmeof.mypool.myserver.aewiyx, using rotation
[04-Oct-2024 02:32:34] INFO config.py:78 (2): Using NVMeoF gateway version 1.2.17
[04-Oct-2024 02:32:34] INFO config.py:81 (2): Configured SPDK version 24.01
[04-Oct-2024 02:32:34] INFO config.py:84 (2): Using vstart cluster version based on 18.2.4
[04-Oct-2024 02:32:34] INFO config.py:87 (2): NVMeoF gateway built on: 2024-07-30 15:47:38 UTC
[04-Oct-2024 02:32:34] INFO config.py:90 (2): NVMeoF gateway Git repository: https://github.com/ceph/ceph-nvmeof
[04-Oct-2024 02:32:34] INFO config.py:93 (2): NVMeoF gateway Git branch: tags/1.2.17
[04-Oct-2024 02:32:34] INFO config.py:96 (2): NVMeoF gateway Git commit: 887c7841f275a0cbc00eddb8a038cde3935b95ba
[04-Oct-2024 02:32:34] INFO config.py:102 (2): SPDK Git repository: https://github.com/ceph/spdk.git
[04-Oct-2024 02:32:34] INFO config.py:105 (2): SPDK Git branch: undefined
[04-Oct-2024 02:32:34] INFO config.py:108 (2): SPDK Git commit: a16bb032516da05ea2b7c38fd0ad18e8a7190440
[04-Oct-2024 02:32:34] INFO config.py:59 (2): Using configuration file /src/ceph-nvmeof.conf
[04-Oct-2024 02:32:34] INFO config.py:61 (2): ====================================== Configuration file content ======================================
[04-Oct-2024 02:32:34] INFO config.py:65 (2): # This file is generated by cephadm.
[04-Oct-2024 02:32:34] INFO config.py:65 (2): [gateway]
[04-Oct-2024 02:32:34] INFO config.py:65 (2): name = client.nvmeof.mypool.myserver.aewiyx
[04-Oct-2024 02:32:34] INFO config.py:65 (2): group = None
[04-Oct-2024 02:32:34] INFO config.py:65 (2): addr = 10.20.30.40
[04-Oct-2024 02:32:34] INFO config.py:65 (2): port = 5500
[04-Oct-2024 02:32:34] INFO config.py:65 (2): enable_auth = False
[04-Oct-2024 02:32:34] INFO config.py:65 (2): state_update_notify = True
[04-Oct-2024 02:32:34] INFO config.py:65 (2): state_update_interval_sec = 5
[04-Oct-2024 02:32:34] INFO config.py:65 (2): enable_prometheus_exporter = True
[04-Oct-2024 02:32:34] INFO config.py:65 (2): prometheus_exporter_ssl = False
[04-Oct-2024 02:32:34] INFO config.py:65 (2): prometheus_port = 10008
[04-Oct-2024 02:32:34] INFO config.py:65 (2):
[04-Oct-2024 02:32:34] INFO config.py:65 (2): [ceph]
[04-Oct-2024 02:32:34] INFO config.py:65 (2): pool = mypool
[04-Oct-2024 02:32:34] INFO config.py:65 (2): config_file = /etc/ceph/ceph.conf
[04-Oct-2024 02:32:34] INFO config.py:65 (2): id = nvmeof.mypool.myserver.aewiyx
[04-Oct-2024 02:32:34] INFO config.py:65 (2):
[04-Oct-2024 02:32:34] INFO config.py:65 (2): [mtls]
[04-Oct-2024 02:32:34] INFO config.py:65 (2): server_key = ./server.key
[04-Oct-2024 02:32:34] INFO config.py:65 (2): client_key = ./client.key
[04-Oct-2024 02:32:34] INFO config.py:65 (2): server_cert = ./server.crt
[04-Oct-2024 02:32:34] INFO config.py:65 (2): client_cert = ./client.crt
[04-Oct-2024 02:32:34] INFO config.py:65 (2):
[04-Oct-2024 02:32:34] INFO config.py:65 (2): [spdk]
[04-Oct-2024 02:32:34] INFO config.py:65 (2): tgt_path = /usr/local/bin/nvmf_tgt
[04-Oct-2024 02:32:34] INFO config.py:65 (2): rpc_socket = /var/tmp/spdk.sock
[04-Oct-2024 02:32:34] INFO config.py:65 (2): timeout = 60
[04-Oct-2024 02:32:34] INFO config.py:65 (2): log_level = WARN
[04-Oct-2024 02:32:34] INFO config.py:65 (2): conn_retries = 10
[04-Oct-2024 02:32:34] INFO config.py:65 (2): transports = tcp
[04-Oct-2024 02:32:34] INFO config.py:65 (2): transport_tcp_options = {"in_capsule_data_size": 8192, "max_io_qpairs_per_ctrlr": 7}
[04-Oct-2024 02:32:34] INFO config.py:65 (2): tgt_cmd_extra_args = --cpumask=0xFF
[04-Oct-2024 02:32:34] INFO config.py:66 (2): ========================================================================================================
[04-Oct-2024 02:32:34] INFO server.py:91 (2): Starting gateway client.nvmeof.mypool.myserver.aewiyx
[04-Oct-2024 02:32:34] INFO server.py:162 (2): Starting serve, monitor client version: ceph version 19.0.0-4996-g0ec90b1e (0ec90b1e61a7489b13d6d8432156a0417f35db7f) squid (dev)
[04-Oct-2024 02:32:35] INFO state.py:387 (2): nvmeof.None.state OMAP object already exists.
[04-Oct-2024 02:32:35] INFO server.py:252 (2): Starting /usr/bin/ceph-nvmeof-monitor-client --gateway-name client.nvmeof.mypool.myserver.aewiyx --gateway-address 10.20.30.40:5500 --gateway-pool mypool --gateway-group None --monitor-group-address 10.20.30.40:5499 -c /etc/ceph/ceph.conf -n client.nvmeof.mypool.myserver.aewiyx -k /etc/ceph/keyring
[04-Oct-2024 02:32:35] INFO server.py:256 (2): monitor client process id: 19
[04-Oct-2024 02:32:35] INFO server.py:151 (2): MonitorGroup server is listening on 10.20.30.40:5499 for group id
[04-Oct-2024 02:34:17] ERROR server.py:42 (2): GatewayServer: SIGCHLD received signum=17
[04-Oct-2024 02:34:17] ERROR server.py:46 (2): PID of terminated child process is 19
[04-Oct-2024 02:34:17] ERROR server.py:111 (2): GatewayServer exception occurred:
Traceback (most recent call last):
File "/src/control/__main__.py", line 38, in <module>
gateway.serve()
File "/src/control/server.py", line 173, in serve
self._start_monitor_client()
File "/src/control/server.py", line 258, in _start_monitor_client
self._wait_for_group_id()
File "/src/control/server.py", line 152, in _wait_for_group_id
self.monitor_event.wait()
File "/usr/lib64/python3.9/threading.py", line 581, in wait
signaled = self._cond.wait(timeout)
File "/usr/lib64/python3.9/threading.py", line 312, in wait
waiter.acquire()
File "/src/control/server.py", line 55, in sigchld_handler
raise SystemExit(f"Gateway subprocess terminated {pid=} {exit_code=}")
SystemExit: Gateway subprocess terminated pid=19 exit_code=-6
[04-Oct-2024 02:34:17] INFO server.py:448 (2): Aborting (client.nvmeof.mypool.myserver.aewiyx) pid 19...
[04-Oct-2024 02:34:17] INFO state.py:545 (2): Cleanup OMAP on exit (gateway-client.nvmeof.mypool.myserver.aewiyx)
[04-Oct-2024 02:34:17] INFO server.py:137 (2): Exiting the gateway process.
Has anyone found a combination that works? I tried 1.2.17, 1.1, 1.3.1, and 1.3.2 without success.
The nvmeof is not a part of the official ceph reef and squid branches. It was approved to be merged to main long after reef and squid were created. It will be a part of the next ceph upstream release. For now, anyone who needs nvmeof to work with reef or squid can build ceph from https://github.com/ceph/ceph-ci/tree/squid-nvmeof or https://github.com/ceph/ceph-ci/tree/reef-nvmeof.
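As a rough sketch, once you have built a container image from one of those branches and pushed it to a registry you control (the registry and tag below are hypothetical), you could bootstrap a cluster with it, e.g.:
cephadm --image <your-registry>/ceph:squid-nvmeof bootstrap --mon-ip <mon-ip>   # hypothetical registry and tag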
The nvmeof is not a part of the official ceph reef and squid branches.
If this is the case, why do https://docs.ceph.com/en/reef/rbd/nvmeof-overview/ and https://docs.ceph.com/en/squid/rbd/nvmeof-overview/ exist?
The official Ceph documentation suggests that NVMe-oF has been working since version 18.
I understand that the HA feature is not yet part of Ceph reef/squid, but I wonder why non-HA is not part of it either? Notably, I managed to get it working with nvmeof version 1.0.0, but unfortunately the integration did not survive a reboot of the hosts.
There is also the ability to deploy an NVMe-oF gateway in the current Reef dashboard, so there is definitely a disconnect as to what is production ready.
There is no non-HA mode; a single GW is also managed by the ceph mon. I need to check the documentation; if it is misleading, we need to fix it. In any case, as I suggested, you can build ceph from the branches I mentioned and get it working.
The nvmeof is not a part of the official ceph reef and squid branches.
@caroav, is there any word on if this will merge with a reef/squid update? I'm not familiar enough with how Ceph does feature and patching lifecycles.
is there any word on if this will merge with a reef/squid update? I'm not familiar enough with how Ceph does feature and patching lifecycles.
I don't think it will be a part of reef and squid. @oritwas @neha-ojha @athanatos can you share your view?
@caroav Which PRs need to be backported? The process would be that you backport the relevant PRs/commits and open PRs against squid/reef.
For now, anyone who needs nvmeof to work with reef or squid can build ceph from https://github.com/ceph/ceph-ci/tree/squid-nvmeof or https://github.com/ceph/ceph-ci/tree/reef-nvmeof.
I have tried to build these and they fail at Building CXX object src/librbd/CMakeFiles/rbd_api.dir/librbd.cc.o. Where can I go to troubleshoot this build issue? I can provide more details if needed, but I don't want to use this thread for that if I should be using some other resource. Here is one line of the error:
/root/rpmbuild/BUILD/ceph-19.1.0-1427-g73d4dbc2f9d/src/librbd/librbd.cc: In member function 'int librbd::RBD::open(librados::v14_2_0::IoCtx&, librbd::Image&, const char*, const char*)':
/root/rpmbuild/BUILD/ceph-19.1.0-1427-g73d4dbc2f9d/src/librbd/librbd.cc:525:5: error: expected primary-expression before ',' token
525 | tracepoint(librbd, open_image_enter, ictx, ictx->name.c_str(), ictx->id.c_str(), ictx->snap_name.c_str(), ictx->read_only);
| ^~~~~~
I am trying to set up ceph-nvmeof 1.2.9 on Reef. This is a fresh cluster installed a few hours ago with cephadm, deployed as per the documentation. nvmeof fails to come up; the logging messages I see are:
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw void NVMeofGwMonitorClient::tick()
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw bool get_gw_state(const char*, const std::map<std::pair<std::__cxx11::basic_string, std::__cxx11::basic_string >, std::map<std::__cxx11::basic_string, NvmeGwState> >&, const NvmeGroupKey&, const NvmeGwId&, NvmeGwState&) can not find group (nvme,None) old map map: {}
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw void NVMeofGwMonitorClient::send_beacon() sending beacon as gid 24694 availability 0 osdmap_epoch 0 gwmap_epoch 0
May 23 12:55:14 ceph2 bash[76745]: debug 2024-05-23T12:55:14.333+0000 785f205e5700 0 can't decode unknown message type 2049 MSG_AUTH=17
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f609868640 0 client.0 ms_handle_reset on v2:10.4.3.11:3300/0
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f609868640 0 client.0 ms_handle_reset on v2:10.4.3.11:3300/0
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.337+0000 70f609868640 0 nvmeofgw virtual bool NVMeofGwMonitorClient::ms_dispatch2(ceph::ref_t&) got map type 4
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.337+0000 70f609868640 0 ms_deliver_dispatch: unhandled message 0x5e584cc24820 mon_map magic: 0 from mon.1 v2:10.4.3.11:3300/0
Another message is
May 23 12:57:26 ceph1 bash[146371]: 1: [v2:10.4.3.11:3300/0,v1:10.4.3.11:6789/0] mon.ceph2
May 23 12:57:26 ceph1 bash[146371]: 2: [v2:10.4.3.12:3300/0,v1:10.4.3.12:6789/0] mon.ceph3
May 23 12:57:26 ceph1 bash[146371]: -12> 2024-05-23T12:57:24.746+0000 73e68f1de640 0 nvmeofgw virtual bool NVMeofGwMonitorClient::ms_dispatch2(ceph::ref_t&) got map type 4
May 23 12:57:26 ceph1 bash[146371]: -11> 2024-05-23T12:57:24.746+0000 73e68f1de640 0 ms_deliver_dispatch: unhandled message 0x5757d2e9d380 mon_map magic: 0 from mon.0 v2:10.4.3.10:3300/0
May 23 12:57:26 ceph1 bash[146371]: -10> 2024-05-23T12:57:24.746+0000 73e68f1de640 10 monclient: handle_config config(2 keys)
May 23 12:57:26 ceph1 bash[146371]: -9> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 set_mon_vals callback ignored cluster_network
May 23 12:57:26 ceph1 bash[146371]: -8> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 set_mon_vals callback ignored container_image
May 23 12:57:26 ceph1 bash[146371]: -7> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 nvmeofgw NVMeofGwMonitorClient::init()::<lambda()> nvmeof monc config notify callback
May 23 12:57:26 ceph1 bash[146371]: -6> 2024-05-23T12:57:25.654+0000 73e68d1da640 10 monclient: tick
May 23 12:57:26 ceph1 bash[146371]: -5> 2024-05-23T12:57:25.654+0000 73e68d1da640 10 monclient: _check_auth_tickets
May 23 12:57:26 ceph1 bash[146371]: -4> 2024-05-23T12:57:26.654+0000 73e68d1da640 10 monclient: tick
May 23 12:57:26 ceph1 bash[146371]: -3> 2024-05-23T12:57:26.654+0000 73e68d1da640 10 monclient: _check_auth_tickets
May 23 12:57:26 ceph1 bash[146371]: -2> 2024-05-23T12:57:26.742+0000 73e68b1d6640 0 nvmeofgw void NVMeofGwMonitorClient::tick()
May 23 12:57:26 ceph1 bash[146371]: -1> 2024-05-23T12:57:26.742+0000 73e68b1d6640 4 nvmeofgw void NVMeofGwMonitorClient::disconnect_panic() Triggering a panic upon disconnection from the monitor, elapsed 102, configured disconnect panic duration 100
May 23 12:57:26 ceph1 bash[146371]: 0> 2024-05-23T12:57:26.746+0000 73e68b1d6640 -1 *** Caught signal (Aborted) **
May 23 12:57:26 ceph1 bash[146371]: in thread 73e68b1d6640 thread_name:safe_timer
The cluster has a cluster network configured, and I saw some messages about that option not being changeable at runtime. I did add it to ceph.conf for the target, though, so that should be fine. Any help will be greatly appreciated. Thank you in advance.