Open sunilkumarn417 opened 6 months ago
```
quay.io/barakda1/ceph:47ea673ae9ebf51b2ebc505093bd7272422045e4
quay.io/barakda1/nvmeof:8677ba3
quay.io/barakda1/nvmeof-cli:8677ba3
```
@sunilkumarn417 can you please add the output of `host list -n`, for both subsystems.
```
[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 host list -n nqn.2016-06.io.spdk:sub2
Hosts allowed to access nqn.2016-06.io.spdk:sub2:
╒════════════╕
│ Host NQN   │
╞════════════╡
│ Any host   │
╘════════════╛
[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 host list -n nqn.2016-06.io.spdk:sub1
Hosts allowed to access nqn.2016-06.io.spdk:sub1:
╒════════════╕
│ Host NQN   │
╞════════════╡
│ Any host   │
╘════════════╛
```
This happens because, for some reason, 10.0.208.213 is optimized only on grp 3, and 10.0.209.68 is inaccessible on all groups. At least that's what I see when I log into the systems now. The 2 namespaces that belong to sub1 are on grp 1, so they're currently not optimized on any listener, which is why we cannot see them. We need to understand how it got to the situation where grp 1 is not optimized on any gw. I suspect it might be related to a known issue we have, that we don't reassign the same grp id to the same gw after removing a gw. Not sure. @sunilkumarn417, can you describe the steps you did to cause failover? Did you also run any cephadm command, or any other command, to remove/add a gw? Also, can you enable logging the mon logs to file on this setup?
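The reasoning above can be checked mechanically: a namespace is reachable only if its ANA (load balancing) group is `optimized` on at least one listener. A minimal sketch that scans `nvmf_subsystem_get_listeners`-style JSON; the sample data mirrors the state described above (grp 3 optimized on 10.0.208.213, everything inaccessible on 10.0.209.68), and the helper is illustrative, not part of any Ceph/SPDK tooling:

```python
import json

# Trimmed sample of `rpc.py nvmf_subsystem_get_listeners` output, mirroring
# the states reported in this thread (values copied from the discussion;
# only groups 1 and 3 are shown for brevity).
LISTENERS = json.loads("""
[
  {"address": {"traddr": "10.0.208.213", "trsvcid": "4420"},
   "ana_states": [{"ana_group": 1, "ana_state": "inaccessible"},
                  {"ana_group": 3, "ana_state": "optimized"}]},
  {"address": {"traddr": "10.0.209.68", "trsvcid": "4420"},
   "ana_states": [{"ana_group": 1, "ana_state": "inaccessible"},
                  {"ana_group": 3, "ana_state": "inaccessible"}]}
]
""")

def optimized_groups(listeners):
    """Return the ANA group ids that are optimized on at least one listener."""
    return {state["ana_group"]
            for listener in listeners
            for state in listener["ana_states"]
            if state["ana_state"] == "optimized"}

print(optimized_groups(LISTENERS))       # {3}
print(1 in optimized_groups(LISTENERS))  # False: sub1 (grp 1) is unreachable
```

Running this against real dumps from both gateways would show immediately that no listener is optimized for grp 1, matching the missing sub1 namespaces.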
@caroav These are the steps I followed:
Ceph Nodes Inventory

```
10.0.210.141  ceph-1sunilkumar-z1afhw-node1-installer  - MON, MGR
10.0.211.144  ceph-1sunilkumar-z1afhw-node2            - MON, MGR
10.0.208.216  ceph-1sunilkumar-z1afhw-node3            - MON, OSD Node
10.0.211.89   ceph-1sunilkumar-z1afhw-node4            - OSD Node
10.0.211.212  ceph-1sunilkumar-z1afhw-node5            - OSD Node
10.0.208.213  ceph-1sunilkumar-z1afhw-node6            - NVMeoF GW
10.0.209.68   ceph-1sunilkumar-z1afhw-node7            - NVMeoF GW
10.0.210.4    ceph-1sunilkumar-z1afhw-node8            - Client
10.0.208.67   ceph-1sunilkumar-z1afhw-node9
```
1. NVMeoF gateways deployed on node6 and node7.
2. Triggered failover with `ceph orch daemon rm nvmeofgw.node6` (daemon removal).
3. Subsystems `nqn.2016-06.io.spdk:sub1` and `nqn.2016-06.io.spdk:sub2` are configured with `host *` (any host allowed).
4. `sub1_image1` and `sub1_image2` are attached with load balancing group Id 1 under Subsystem1; `sub2_image1` and `sub2_image2` are attached with load balancing group Id 3 under Subsystem2.
5. Ran `nvme connect-all` on the client and noticed only images from sub2 are visible.
6. Able to hit the issue again.
I was able to reproduce this issue in another cluster that I created as well.
GW1
```
[root@ceph-rbd1-mytest-rxmvqg-node4 ~]# podman run quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.210.179 --server-port 5500 gw info
CLI's version: 1.0.0
Gateway's version: 1.0.0
Gateway's name: client.nvmeof.nvmeof.ceph-rbd1-mytest-rxmvqg-node4.bxkxze
Gateway's load balancing group: 2
Gateway's address: 10.0.210.179
Gateway's port: 5500
SPDK version: 23.01.1
```
```
[root@ceph-rbd1-mytest-rxmvqg-node4 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode1 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.210.179",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"
```
```
[root@ceph-rbd1-mytest-rxmvqg-node4 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode2 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.210.179",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"
```
GW2
```
[root@ceph-rbd1-mytest-rxmvqg-node5 ~]# podman run quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.28 --server-port 5500 gw info
CLI's version: 1.0.0
Gateway's version: 1.0.0
Gateway's name: client.nvmeof.nvmeof.ceph-rbd1-mytest-rxmvqg-node5.yovvcu
Gateway's load balancing group: 1
Gateway's address: 10.0.208.28
Gateway's port: 5500
SPDK version: 23.01.1
```
```
[root@ceph-rbd1-mytest-rxmvqg-node5 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode2 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.208.28",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "optimized"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"
```
```
[root@ceph-rbd1-mytest-rxmvqg-node5 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode1 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.208.28",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "optimized"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"
```
On client:
```
[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list-subsys
nvme-subsys3 - NQN=nqn.2016-06.io.spdk:cnode2
\
 +- nvme3 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live
 +- nvme4 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live
nvme-subsys1 - NQN=nqn.2016-06.io.spdk:cnode1
\
 +- nvme2 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live
 +- nvme1 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live
[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list-subsys /dev/nvme3n1
nvme-subsys3 - NQN=nqn.2016-06.io.spdk:cnode2
\
 +- nvme3 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live inaccessible
 +- nvme4 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live optimized
[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list-subsys /dev/nvme1n1
nvme-subsys1 - NQN=nqn.2016-06.io.spdk:cnode1
\
 +- nvme2 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live
 +- nvme1 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live
```
```
[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list
Node          Generic     SN  Model                 Namespace  Usage                  Format       FW Rev
/dev/nvme3n5  /dev/ng3n5  2   Ceph bdev Controller  0x5        536.87 GB / 536.87 GB  512 B + 0 B  23.01.1
/dev/nvme3n4  /dev/ng3n4  2   Ceph bdev Controller  0x4        536.87 GB / 536.87 GB  512 B + 0 B  23.01.1
/dev/nvme3n3  /dev/ng3n3  2   Ceph bdev Controller  0x3        536.87 GB / 536.87 GB  512 B + 0 B  23.01.1
/dev/nvme3n2  /dev/ng3n2  2   Ceph bdev Controller  0x2        536.87 GB / 536.87 GB  512 B + 0 B  23.01.1
/dev/nvme3n1  /dev/ng3n1  2   Ceph bdev Controller  0x1        536.87 GB / 536.87 GB  512 B + 0 B  23.01.1
```
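Reading the two gateways' listener dumps together: a namespace can appear in `nvme list` only when its load balancing group is optimized on at least one path. A small sketch of that cross-check (states transcribed from the `rpc.py` outputs above, where both cnode1 and cnode2 report identical states per gateway; the helper name is my own):

```python
# ANA state per gateway and group, transcribed from the two rpc.py dumps:
# GW1 (10.0.210.179) reports every group inaccessible, GW2 (10.0.208.28)
# reports only group 1 optimized.
gw_ana_states = {
    "10.0.210.179": {1: "inaccessible", 2: "inaccessible",
                     3: "inaccessible", 4: "inaccessible"},
    "10.0.208.28":  {1: "optimized",    2: "inaccessible",
                     3: "inaccessible", 4: "inaccessible"},
}

def reachable_groups(states):
    """An ANA group is reachable iff some gateway reports it optimized."""
    return {group
            for per_gw in states.values()
            for group, state in per_gw.items()
            if state == "optimized"}

# Only group 1 has an optimized path anywhere; namespaces pinned to any
# other group cannot show up on the client.
print(reachable_groups(gw_ana_states))  # {1}
```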
The issue is that the nvmeof monitor DB has zombie gws. It is known and is being taken care of. For now, the only way to avoid this issue is:
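The zombie-gw state can be illustrated with a toy model (illustrative only, not the actual monitor logic; names invented): if a removed gateway's group id is never reassigned, a re-added gateway gets a fresh id and the old group is left with no owner, so no listener ever reports it optimized:

```python
# Toy model only: group ids are handed out monotonically and never reused,
# so removing a gateway leaves an orphaned ("zombie") group entry behind.
class ToyMonitorDB:
    def __init__(self):
        self.next_id = 1
        self.group_owner = {}            # group id -> gateway name or None

    def add_gw(self, name):
        gid = self.next_id
        self.next_id += 1
        self.group_owner[gid] = name     # always a fresh id, old ids not recycled
        return gid

    def rm_gw(self, name):
        for gid, owner in self.group_owner.items():
            if owner == name:
                self.group_owner[gid] = None   # entry lingers with no owner

db = ToyMonitorDB()
db.add_gw("gw-node6")                    # gets group 1
db.add_gw("gw-node7")                    # gets group 2
db.rm_gw("gw-node6")
db.add_gw("gw-node6")                    # re-added: gets group 3, not group 1
zombies = [g for g, owner in db.group_owner.items() if owner is None]
print(zombies)   # [1] -> namespaces pinned to grp 1 are optimized nowhere
```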
Unable to list namespaces hosted from one of the subsystems which is directly associated to a Gateway (say GW1) via load-balancing-group id (say 1).
At Client Side
As we can notice below, the namespaces from subsystem1 are not connected.
ANA States from both Gateways