hydro-b opened 2 years ago
I have tried with a Linux kernel NVMe-OF target, but got the same result:
# server
modprobe nvmet
modprobe nvmet-tcp
mkdir /sys/kernel/config/nvmet/subsystems/nvmet-test
cd /sys/kernel/config/nvmet/subsystems/nvmet-test
echo 1 > /sys/kernel/config/nvmet/subsystems/nvmet-test/attr_allow_any_host
mkdir namespaces/1
cd namespaces/1/
rbd create knvmet01 --size=10G
rbd map knvmet01
echo -n /dev/rbd0 | tee -a device_path > /dev/null
echo 1| tee -a enable > /dev/null
mkdir /sys/kernel/config/nvmet/ports/1
cd /sys/kernel/config/nvmet/ports/1
echo 2001:7b8:3000:999::11 |tee -a addr_traddr > /dev/null
echo tcp|tee -a addr_trtype > /dev/null
echo 4420|tee -a addr_trsvcid > /dev/null
echo ipv6 |tee -a addr_adrfam > /dev/null
ln -s /sys/kernel/config/nvmet/subsystems/nvmet-test/ /sys/kernel/config/nvmet/ports/1/subsystems/nvmet-t
# client
nvme discover -t tcp -a 2001:7b8:3000:999::11 -s 4420
Discovery Log Number of Records 1, Generation counter 2
=====Discovery Log Entry 0======
trtype: tcp
adrfam: ipv6
subtype: nvme subsystem
treq: not specified, sq flow control disable supported
portid: 1
trsvcid: 4420
subnqn: nvmet-test
traddr: 2001:7b8:3000:999::11
sectype: none
nvme connect -t tcp --traddr 2001:7b8:3000:999::11 s 4420 -n nvmet-test
Failed to write to /dev/nvme-fabrics: Connection refused
I have successfully set up kernel NVMe-OF / SPDK targets (with NVMe disks) in the past, but I'm not sure what is happening now.
A dash got lost in front of the "s":
~/ceph-nvmeof# nvme connect -t tcp --traddr 2001:7b8:3000:999::11 s 5001 -n nqn.2016-06.io.spdk:cnode1
Failed to write to /dev/nvme-fabrics: Connection refused
So I guess the default port gets used, which no target is listening on. After adding the dash I tried to connect again:
nvme connect -t tcp --traddr 2001:7b8:3000:999::11 -s 5001 -n nqn.2016-06.io.spdk:cnode1
Failed to write to /dev/nvme-fabrics: Input/output error
[2022-10-07 16:30:48.296084] ctrlr.c: 680:nvmf_qpair_access_allowed: *ERROR*: Subsystem 'nqn.2016-06.io.spdk:cnode1' does not allow host 'nqn.2014-08.org.nvmexpress:uuid:07ac0178-d457-415d-8ba8-bef911e64b88' to connect at this address.
That is odd, as the host NQN has been added to allow access:
~/ceph-nvmeof# python3 -m control.cli get_subsystems
INFO:__main__:Get subsystems:
[
    {
        "nqn": "nqn.2014-08.org.nvmexpress.discovery",
        "subtype": "Discovery",
        "listen_addresses": [],
        "allow_any_host": true,
        "hosts": []
    },
    {
        "nqn": "nqn.2016-06.io.spdk:cnode1",
        "subtype": "NVMe",
        "listen_addresses": [
            {
                "transport": "TCP",
                "trtype": "TCP",
                "adrfam": "IPv6",
                "traddr": "[2001:7b8:3000:999::11]",
                "trsvcid": "5001"
            }
        ],
        "allow_any_host": true,
        "hosts": [
            {
                "nqn": "nqn.2014-08.org.nvmexpress:uuid:2c1d0ce8-4711-4551-8369-4dbc0d874c87"
            },
            {
                "nqn": "nqn.2014-08.org.nvmexpress:uuid:07ac0178-d457-415d-8ba8-bef911e64b88"
            }
        ],
        "serial_number": "SPDK00000000000001",
        "model_number": "SPDK bdev Controller",
        "max_namespaces": 32,
        "min_cntlid": 1,
        "max_cntlid": 65519,
        "namespaces": [
            {
                "nsid": 1,
                "bdev_name": "Ceph0",
                "name": "Ceph0",
                "nguid": "A2A33970D20A4EA18A1ADDBD44062E4F",
                "uuid": "a2a33970-d20a-4ea1-8a1a-ddbd44062e4f"
            }
        ]
    }
]
Besides that, "allow_any_host" is enabled. I also removed all hosts from the ACL, so any host should be allowed, but access is still denied:
[2022-10-07 16:41:47.574131] ctrlr.c: 680:nvmf_qpair_access_allowed: *ERROR*: Subsystem 'nqn.2016-06.io.spdk:cnode1' does not allow host 'nqn.2014-08.org.nvmexpress:uuid:07ac0178-d457-415d-8ba8-bef911e64b88' to connect at this address.
~/ceph-nvmeof# python3 -m control.cli get_subsystems
INFO:__main__:Get subsystems:
[
    {
        "nqn": "nqn.2014-08.org.nvmexpress.discovery",
        "subtype": "Discovery",
        "listen_addresses": [],
        "allow_any_host": true,
        "hosts": []
    },
    {
        "nqn": "nqn.2016-06.io.spdk:cnode1",
        "subtype": "NVMe",
        "listen_addresses": [
            {
                "transport": "TCP",
                "trtype": "TCP",
                "adrfam": "IPv6",
                "traddr": "[2001:7b8:3000:999::11]",
                "trsvcid": "5001"
            }
        ],
        "allow_any_host": true,
        "hosts": [],
        "serial_number": "SPDK00000000000001",
        "model_number": "SPDK bdev Controller",
        "max_namespaces": 32,
        "min_cntlid": 1,
        "max_cntlid": 65519,
        "namespaces": [
            {
                "nsid": 1,
                "bdev_name": "Ceph0",
                "name": "Ceph0",
                "nguid": "A2A33970D20A4EA18A1ADDBD44062E4F",
                "uuid": "a2a33970-d20a-4ea1-8a1a-ddbd44062e4f"
            }
        ]
    }
]
I have not run into this problem, but I have been using IPv4 and don't have a chance to try IPv6 right now.
Just glancing through the SPDK code, the error is returned because spdk_nvmf_subsystem_listener_allowed failed, and it failed because spdk_nvme_transport_id_compare did not return success for any listener. I see that spdk_nvme_transport_id_compare compares traddr strings. Could it be that it cannot find the listener because traddr contains []? What if you try adding the listener without []?
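A minimal sketch of that hypothesis (assumed behavior, not the actual SPDK source): if traddr is compared as a plain string, the bracketed form stored by the gateway can never match the unbracketed form the nvme client sends.

```python
listener_traddr = "[2001:7b8:3000:999::11]"  # as shown by get_subsystems
connect_traddr = "2001:7b8:3000:999::11"     # as sent by `nvme connect`

# An exact string comparison fails even though both denote the same address:
print(listener_traddr == connect_traddr)  # False

def normalize(traddr: str) -> str:
    # Strip surrounding brackets so both forms compare equal.
    return traddr.strip("[]")

print(normalize(listener_traddr) == normalize(connect_traddr))  # True
```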
The reason I added the [ ] in the ceph-nvmeof.conf file was this:
E1007 19:20:32.562363631 1444846 chttp2_server.cc:1053] UNKNOWN:No address added out of total 1 resolved for '2001:7b8:3000:999::11:5500' {created_time:"2022-10-07T19:20:32.56100231+02:00", children:[UNKNOWN:Unable to configure socket {fd:19, created_time:"2022-10-07T19:20:32.560903576+02:00", children:[UNKNOWN:Cannot assign requested address {created_time:"2022-10-07T19:20:32.560827797+02:00", errno:99, os_error:"Cannot assign requested address", syscall:"bind"}]}]}
INFO:control.server:Terminating SPDK...
INFO:control.server:Stopping the server...
INFO:control.server:Exiting the gateway process.
It interprets the :5500 as part of the address. This is common; software often requires the [ ] brackets to disambiguate an IPv6 address from a port. But as you noted, the brackets are exactly what matters in this case. So what did I do to resolve it? I configured the gateway to listen on 127.0.0.1, then followed the example with everything I had learned until now and added a listener for IPv6, but without the brackets. And then it worked. See below:
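The host:port ambiguity can be reproduced with Python's standard URL parsing (a sketch of the general problem, not of gRPC's actual parser): without brackets, the trailing :5500 cannot be distinguished from another address group.

```python
from urllib.parse import urlsplit

# Bracketed (RFC 3986) form: host and port are unambiguous.
bracketed = urlsplit("//[2001:7b8:3000:999::11]:5500")
print(bracketed.hostname, bracketed.port)  # 2001:7b8:3000:999::11 5500

# Bare form: the parser cuts at the first colon and misreads the address.
ambiguous = urlsplit("//2001:7b8:3000:999::11:5500")
print(ambiguous.hostname)  # 2001
```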
python3 -m control.cli create_bdev -i mytestdevimage -p rbd -b Ceph0
INFO:__main__:Created bdev Ceph0: True
python3 -m control.cli create_subsystem -n nqn.2016-06.io.spdk:cnode1 -s SPDK00000000000001
INFO:__main__:Created subsystem nqn.2016-06.io.spdk:cnode1: True
python3 -m control.cli add_namespace -n nqn.2016-06.io.spdk:cnode1 -b Ceph0
INFO:__main__:Added namespace 1 to nqn.2016-06.io.spdk:cnode1: True
python3 -m control.cli add_host -n nqn.2016-06.io.spdk:cnode1 -t "*"
INFO:__main__:Allowed open host access to nqn.2016-06.io.spdk:cnode1: True
python3 -m control.cli create_listener -n nqn.2016-06.io.spdk:cnode1 -s 5001 -f ipv6 -a 2001:7b8:3000:999::11 -g stefantest
INFO:__main__:Created nqn.2016-06.io.spdk:cnode1 listener: True
Discovering and connecting to the target from the client:
nvme discover -t tcp -a 2001:7b8:3000:999::11 -s 5001
Discovery Log Number of Records 1, Generation counter 1
=====Discovery Log Entry 0======
trtype: tcp
adrfam: ipv6
subtype: nvme subsystem
treq: not required
portid: 0
trsvcid: 5001
subnqn: nqn.2016-06.io.spdk:cnode1
traddr: 2001:7b8:3000:999::11
sectype: none
nvme connect -t tcp -a 2001:7b8:3000:999::11 -n nqn.2016-06.io.spdk:cnode1 -s 5001
nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 SPDK00000000000001 SPDK bdev Controller 1 10.74 GB / 10.74 GB 4 KiB + 0 B 22.01
So I think we can conclude that the gateway should be able to parse the IPv6 address without the brackets, or SPDK should be improved so it accepts IPv6 addresses with or without brackets.
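Such a normalization could look like this hypothetical helper (illustrative only, not part of the real ceph-nvmeof code base): strip any brackets, then validate and canonicalize the address before handing it to SPDK.

```python
import ipaddress

def canonical_traddr(traddr: str) -> str:
    # Accept IPv6 addresses with or without brackets and return one
    # canonical unbracketed form; raises ValueError on invalid input.
    return str(ipaddress.ip_address(traddr.strip("[]")))

print(canonical_traddr("[2001:7b8:3000:999::11]"))  # 2001:7b8:3000:999::11
print(canonical_traddr("2001:7b8:3000:999::11"))    # 2001:7b8:3000:999::11
```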
Thanks for taking the time to carefully read the error messages and the code!
I followed the example in the README: expose a single Ceph RBD image, Ceph0, as an NVMe-OF target. For the sake of completeness, see the output of the gateway below:
Ask controller for state:
On client server discover the NVMe-OF target:
Make sure hostnqn matches access restriction:
And try to connect to it:
This fails:
I have tested kernels 5.4.0-126-generic and 5.15.0-48-generic. Let's check whether SPDK identify works:
What stands out for me from the report above is Max Number of Namespaces: 0. Can it be that for some reason the namespace is not exposed? Let's ask SPDK through rpc.py:

There definitely is a namespace exposed. I'm at a loss at this point. Is there something obvious I'm overlooking? Note that I have also tried "127.0.0.1" as the listener address and tried to connect with the kernel client on the host running the gateway, but this results in the same error.