Open RobertLukan opened 2 days ago
Hello Robert,
Indeed, the trsvcid is dynamic in this implementation. The reason is that we have to "play nice" with existing system services, so using a fixed, non-standard port is problematic. The upstream NVMe-oF gateway uses 4420, but as far as I understand this is only suggested, not enforced, and enforcing a single fixed port poses problems as well (you can't mix TCPv4 and TCPv6 simultaneously for the same endpoint, for example).
What we can do, however, is allow users to specify the full endpoint address (not just the subnet) and make it fixed once the endpoint has been created, so that it stays stable across reboots. Note that if you aren't using multiple units to back the connection, you'll most likely lose the connection on reboot, unless you have a retry mechanism in place (or unless the reboot happens very fast, I don't know).
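For reference, something along these lines on the initiator side should ride out an endpoint restart once the traddr/trsvcid are fixed (just a sketch with nvme-cli; substitute your own address, port and subsystem NQN):

nvme connect -t tcp -a 192.168.164.174 -s <fixed-port> \
    -n nqn.2014-08.org.nvmexpress:uuid:a961efa5-4308-48bd-b509-7f1e2d35b281 \
    --ctrl-loss-tmo=-1 --reconnect-delay=10

--ctrl-loss-tmo=-1 should keep the controller around and retrying indefinitely, and --reconnect-delay sets the seconds between attempts, so the connection comes back on its own once the endpoint is listening again at the same address and port.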
As for discovery, I'll go ahead and add a listener at a user-defined port. The way it works right now is that the endpoints themselves serve as discovery services, but if their addresses aren't stable, that's no good.
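Once that listener exists, discovery from the initiator should be the usual nvme-cli flow (a sketch; <discovery-port> stands for whatever port you configure on the charm side):

nvme discover -t tcp -a 192.168.164.174 -s <discovery-port>

and you could follow it with nvme connect-all against the same address to attach whatever it advertises.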
As long as it is stable across reboots, that's fine, so that an initiator can reconnect to it. I will test it once you have the changes implemented. Thank you for the very quick response.
I have deployed the ceph-nvme charm by packing it first and then deploying it with Juju. I followed the README and everything looks fine. However, after I reboot any endpoint, the trsvcid changes and the NVMe initiator loses the connection.
Maybe I am missing something, but I don't see how to maintain a connection from the initiator's point of view.
I would expect the endpoint pair traddr/trsvcid to be maintained and saved in the config file, but that does not appear to be the case. Alternatively, the standard port 4420 could be used, as ceph-nvmeof does (https://github.com/ceph/ceph-nvmeof).
Another "strange" behaviour is the discovery process, it cannot be done with the standard trsvcid of 8009 but it only works with a dynamic high port. In this case a discovery process really does not work, as an initiator cannot really figure out in dynamic way endpoints.
Simple example:
root@juju1:~# juju run ceph-nvme/7 list-endpoints
Running operation 229 with 1 task
Waiting for task 230...
endpoints: '[{''nqn'': ''nqn.2014-08.org.nvmexpress:uuid:a961efa5-4308-48bd-b509-7f1e2d35b281'', ''addr'': ''192.168.164.174'', ''port'': ''39851'', ''hosts'': [], ''allow_any_host'': True, ''pool'': ''nvmeof'', ''image'': ''nvmeimage'', ''cluster'': ''ceph.61'', ''type'': ''rbd''}]'
root@juju1:~# juju ssh ceph-nvme/7 sudo reboot
Broadcast message from root@VM166 on pts/1 (Sun 2024-11-24 08:02:33 UTC):
The system will reboot now!
Connection to 192.168.164.174 closed.
root@juju1:~# juju run ceph-nvme/7 list-endpoints
Running operation 233 with 1 task
Waiting for task 234...
endpoints: '[{''nqn'': ''nqn.2014-08.org.nvmexpress:uuid:a961efa5-4308-48bd-b509-7f1e2d35b281'', ''addr'': ''192.168.164.174'', ''port'': ''39944'', ''hosts'': [], ''allow_any_host'': True, ''pool'': ''nvmeof'', ''image'': ''nvmeimage'', ''cluster'': ''ceph.61'', ''type'': ''rbd''}]'
root@juju1:~#
Here is the juju status output:
root@juju1:~# juju status
Model  Controller     Cloud/Region     Version  SLA          Timestamp
ceph   my-controller  my-maas/default  3.5.4    unsupported  08:07:08Z
App        Version  Status   Scale  Charm      Channel        Rev  Exposed  Message
ceph-mon   19.2.0   active   3      ceph-mon   latest/edge    242  no       Unit is ready and clustered
ceph-nvme           active   2      ceph-nvme                 9    no       ready
ceph-osd   19.2.0   active   3      ceph-osd   latest/edge    614  no       Unit is ready (1 OSD)
myubuntu   22.04    active   1      ubuntu     latest/stable  25   no
ubuntu     22.04    active   1      ubuntu     latest/stable  25   no
vault      1.8.8    blocked  1      vault      latest/edge    363  no       Unit is sealed
Unit         Workload  Agent  Machine   Public address   Ports     Message
ceph-mon/55  active    idle   43/lxd/0  192.168.164.160            Unit is ready and clustered
ceph-mon/56  active    idle   44/lxd/0  192.168.164.161            Unit is ready and clustered
ceph-mon/57  active    idle   45/lxd/0  192.168.164.162            Unit is ready and clustered
ceph-nvme/7  active    idle   52        192.168.164.174            ready
ceph-nvme/8  active    idle   53        192.168.164.175            ready
ceph-osd/15  active    idle   43        192.168.164.157            Unit is ready (1 OSD)
ceph-osd/16  active    idle   44        192.168.164.158            Unit is ready (1 OSD)
ceph-osd/17  active    idle   45        192.168.164.159            Unit is ready (1 OSD)
myubuntu/0   active    idle   43/lxd/3  192.168.164.167
ubuntu/4     active    idle   43/lxd/2  192.168.164.166
vault/2      blocked   idle   47        192.168.164.164  8200/tcp  Unit is sealed
Machine   State    Address          Inst id               Base          AZ       Message
43        started  192.168.164.157  VM163                 ubuntu@24.04  default  Deployed
43/lxd/0  started  192.168.164.160  juju-9b40c1-43-lxd-0  ubuntu@24.04  default  Container started
43/lxd/2  started  192.168.164.166  juju-9b40c1-43-lxd-2  ubuntu@22.04  default  Container started
43/lxd/3  started  192.168.164.167  juju-9b40c1-43-lxd-3  ubuntu@22.04  default  Container started
44        started  192.168.164.158  fair-llama            ubuntu@24.04  default  Deployed
44/lxd/0  started  192.168.164.161  juju-9b40c1-44-lxd-0  ubuntu@24.04  default  Container started
44/lxd/1  started  192.168.164.163  juju-9b40c1-44-lxd-1  ubuntu@22.04  default  Container started
45        started  192.168.164.159  VM161                 ubuntu@24.04  default  Deployed
45/lxd/0  started  192.168.164.162  juju-9b40c1-45-lxd-0  ubuntu@24.04  default  Container started
47        started  192.168.164.164  VM164                 ubuntu@24.04  default  Deployed
52        started  192.168.164.174  VM166                 ubuntu@24.04  default  Deployed
53        started  192.168.164.175  VM165                 ubuntu@24.04  default  Deployed
root@juju1:~# juju ssh ceph-mon/leader sudo rbd status --pool nvmeof --image nvmeimage
Watchers:
        watcher=192.168.164.175:0/671622659 client.68293 cookie=132959436803440
        watcher=192.168.164.174:0/1203118954 client.68363 cookie=133328736921136
root@juju1:~# juju ssh ceph-mon/leader sudo rbd info nvmeimage --pool nvmeof
rbd image 'nvmeimage':
        size 5 GiB in 1280 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 107a0813ae815
        block_name_prefix: rbd_data.107a0813ae815
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Sat Nov 23 19:18:42 2024
        access_timestamp: Sun Nov 24 08:02:51 2024
        modify_timestamp: Sat Nov 23 19:18:42 2024