ceph / ceph-nvmeof

Service to provide Ceph storage over NVMe-oF/TCP protocol
GNU Lesser General Public License v3.0

Make 'ana_state' of all ana_group other than Gateway's load balancing group as "inaccessible" #501

Open rahullepakshi opened 8 months ago

rahullepakshi commented 8 months ago

This issue is open for discussion if needed.

I assume that ana_group and --load-balancing-group are the same thing. I am observing an unexpected ana_state of "optimized" for multiple ana_groups on a Gateway node, as shown below. In this case, ana_group 2 and 4 also have ana_state "optimized", whereas only ana_group 7 is expected to be optimized, since this Gateway's load balancing group is 7.

[root@ceph-nvmeof-ha-rqn1w7-node4 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.210.209 --server-port 5500 gw info
CLI's version: 1.0.0
Gateway's version: 1.0.0
Gateway's name: client.nvmeof.nvmeof.ceph-nvmeof-ha-rqn1w7-node4.bpvcer
**Gateway's load balancing group: 7**
Gateway's address: 10.0.210.209
Gateway's port: 5500
SPDK version: 23.01.1

[root@ceph-nvmeof-ha-rqn1w7-node4 src]#  /usr/libexec/spdk/scripts/rpc.py  nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode1
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.210.209",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "optimized"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "optimized"
      },
      {
        "ana_group": 5,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 6,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 7,
        **"ana_state": "optimized"**

IMO, this should not be the case: all ana_states other than that of ana_group 7 should always be "inaccessible", because a GW should only have one optimized path at any point in time. For example, if we add a new gateway and it is assigned load balancing group 2 or 4, this will cause an issue, as there will always be 2 optimized paths to the same namespace. I am also not sure how this is handled during failover.
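The expectation above can be checked mechanically. The following is a minimal sketch, not part of ceph-nvmeof: it assumes the JSON printed by `nvmf_subsystem_get_listeners` has already been parsed into Python objects, and the helper name `unexpected_optimized_groups` is hypothetical. The sample data reproduces only ana_groups 1 to 7 from the output above.

```python
def unexpected_optimized_groups(listeners, own_group):
    """Return (traddr, ana_group) pairs that report "optimized"
    for a group other than the gateway's own load balancing group."""
    unexpected = []
    for listener in listeners:
        traddr = listener["address"]["traddr"]
        for entry in listener.get("ana_states", []):
            if entry["ana_state"] == "optimized" and entry["ana_group"] != own_group:
                unexpected.append((traddr, entry["ana_group"]))
    return unexpected


# ANA states for groups 1-7, copied from the listener output in this issue.
states = {
    1: "inaccessible", 2: "optimized", 3: "inaccessible", 4: "optimized",
    5: "inaccessible", 6: "inaccessible", 7: "optimized",
}
SAMPLE_LISTENERS = [{
    "address": {"trtype": "TCP", "adrfam": "IPv4",
                "traddr": "10.0.210.209", "trsvcid": "4420"},
    "ana_states": [{"ana_group": g, "ana_state": s} for g, s in states.items()],
}]

# The gateway's load balancing group is 7, so groups 2 and 4 are flagged.
print(unexpected_optimized_groups(SAMPLE_LISTENERS, own_group=7))
# [('10.0.210.209', 2), ('10.0.210.209', 4)]
```

An empty result would correspond to the behavior requested in this issue: only the gateway's own group is optimized.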

Please let me know your thoughts.

caroav commented 8 months ago

@baum can you refer to this issue? cc - @leonidc

caroav commented 8 months ago

@rahullepakshi please provide more details:

  1. Did you remove daemons or uninstall gws on this cluster?
  2. How many nvmeof gws are running there?

Most probably this is once again the same rm issue.

rahullepakshi commented 8 months ago

@caroav replies inline

@rahullepakshi please provide more details:

  1. Did you remove daemons or uninstall gws on this cluster?

    Yes, I removed daemons and uninstalled gws too, and for this fresh installation the GW load balancing group is set to 7. But my question is about this new installation: even though the group is set to 7, why should the ana_state of ana_group 2 and 4 be optimized? On a fresh installation, they should be made inaccessible, as they are not this GW's preferred ana_group.

  2. How many nvmeof gws are running there?

    2 nvmeof GWs

Most probably this is once again the same rm issue.

This may be related, but it is not the same issue.
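With 2 gateways running, the concern about two optimized paths for the same ana_group can be illustrated across gateways. This is only a sketch under stated assumptions: the gateway names `gw-a` and `gw-b` and the function `groups_with_multiple_optimized` are hypothetical, the states for `gw-a` are the ones reported in this issue, and the states for `gw-b` are invented to show a second gateway whose load balancing group is 2.

```python
from collections import defaultdict


def groups_with_multiple_optimized(states_by_gateway):
    """states_by_gateway maps a gateway name to {ana_group: ana_state}.
    Return {ana_group: [gateways]} for groups optimized on more than one gateway."""
    owners = defaultdict(list)
    for gateway, states in states_by_gateway.items():
        for group, state in states.items():
            if state == "optimized":
                owners[group].append(gateway)
    return {group: sorted(gws) for group, gws in owners.items() if len(gws) > 1}


STATES = {
    # Reported in this issue (load balancing group 7).
    "gw-a": {1: "inaccessible", 2: "optimized", 3: "inaccessible",
             4: "optimized", 5: "inaccessible", 6: "inaccessible",
             7: "optimized"},
    # Hypothetical second gateway with load balancing group 2.
    "gw-b": {1: "inaccessible", 2: "optimized", 3: "inaccessible",
             4: "inaccessible", 5: "inaccessible", 6: "inaccessible",
             7: "inaccessible"},
}

print(groups_with_multiple_optimized(STATES))
# {2: ['gw-a', 'gw-b']}
```

In this hypothetical setup, ana_group 2 would be optimized on both gateways at once, which is the situation the issue asks to avoid.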