NetApp / trident

Storage orchestrator for containers
Apache License 2.0

NVMe/tcp doesn't work with long k8s node names #906

Open magicite opened 2 weeks ago

magicite commented 2 weeks ago

Describe the bug

When using NVMe/TCP in Kubernetes environments with long node names, pods cannot attach to their storage.

Event from the pod that won't initialize:

   Warning  FailedAttachVolume  16m                 attachdetach-controller  AttachVolume.Attach failed for volume "pvc-48bc8cb8-941c-4b39-a1e7-5082c4d74474" : rpc error: code = Unknown desc = [GET /protocols/nvme/subsystems][400] nvme_subsystem_collection_get default  &{Error:0xc001b405d0}

The corresponding entry from the security audit log shows:

Tue Jun 18 17:03:16 2024  nas3502-04   [kern_audit:info:3702] 8503eb0000008fb4 :: nas3502-A800:http :: 172.16.251.224:60978 :: site-que1:vsadmin :: GET /api/protocols/nvme/subsystems?fields=%2A%2A&name=site-que1-wrk-62e68963-v6vx4-6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0&svm.uuid=faa0ae84-2cee-11ef-ac41-d039ea9b7294 :: Error: "site-que1-wrk-62e68963-v6vx4-6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0" is an invalid value for field "name" (<text (size 1..64)>)
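To make the constraint concrete, here is a minimal Go sketch (Go because Trident itself is written in Go) that checks the rejected name from the audit log against the `<text (size 1..64)>` rule ONTAP reports. `validSubsystemName` is a hypothetical helper written for illustration; it is not part of Trident or the ONTAP REST API. It counts bytes, which equals characters only for ASCII names like this one.

```go
package main

import "fmt"

// maxSubsystemNameLen mirrors the "<text (size 1..64)>" constraint quoted
// in the ONTAP audit log entry above.
const maxSubsystemNameLen = 64

// validSubsystemName is a hypothetical check: it reports whether name has
// between 1 and 64 bytes (equal to characters for ASCII names).
func validSubsystemName(name string) bool {
	return len(name) >= 1 && len(name) <= maxSubsystemNameLen
}

func main() {
	// The subsystem name ONTAP rejected, copied from the audit log.
	name := "site-que1-wrk-62e68963-v6vx4-6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0"
	fmt.Println(len(name), validSubsystemName(name)) // 65 false
}
```

The name is 65 characters long, exactly one over the limit, which is why the `GET /protocols/nvme/subsystems` lookup fails with a 400.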

Environment

To Reproduce

Given the entry in the audit log, I think you need a Kubernetes node whose name is long enough that the resulting NVMe subsystem name exceeds 64 characters.

Expected behavior

The volume should attach.

Extra info

I happened to have an older test environment, originally set up with an older version of Astra Trident and ONTAP, that also uses long node names. Things worked in that environment and have held steady since then. I just created a new pod there without updating Astra Trident, but with the AFF A800 now running ONTAP 9.14.1, and it fails identically to the above. I would guess, then, that this is a regression introduced in or around ONTAP 9.14.1.

YvosOnTheHub commented 2 weeks ago

Hi @magicite,

When Trident implemented its NVMe driver, ONTAP allowed NVMe subsystem names of up to 96 characters, and Trident was built around that limit. In 9.14.1, however, ONTAP reduced the maximum NVMe subsystem name length from 96 to 64 characters. This ONTAP change breaks backward compatibility.

For filesystem volumes, the NVMe subsystem name is a combination of the host node name and the Trident UUID. This issue has already been identified and should be fixed in Trident 24.06.
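The arithmetic behind this can be sketched in Go. Assuming the subsystem name is the node name, a hyphen, and a 36-character UUID (the layout the audit-log entry suggests, not something confirmed from Trident's source), the longest node name that fits drops from 59 characters under the old 96-character limit to 27 under the new 64-character one. `subsystemName` and `maxNodeNameLen` are hypothetical helpers for illustration only.

```go
package main

import "fmt"

// uuidLen is the length of a canonical textual UUID such as
// 6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0 (32 hex digits plus 4 hyphens).
const uuidLen = 36

// subsystemName is a hypothetical reconstruction of the naming scheme.
// ASSUMPTION: node name, then "-", then the Trident UUID, inferred from
// the audit log rather than taken from Trident's code.
func subsystemName(nodeName, tridentUUID string) string {
	return nodeName + "-" + tridentUUID
}

// maxNodeNameLen returns the longest node name whose subsystem name still
// fits within limit, given one hyphen and a 36-character UUID.
func maxNodeNameLen(limit int) int {
	return limit - 1 - uuidLen
}

func main() {
	name := subsystemName("site-que1-wrk-62e68963-v6vx4", "6cf6b4ac-ea2a-4675-9eb1-740bc8f6ecf0")
	fmt.Println(len(name))          // 65
	fmt.Println(maxNodeNameLen(96)) // 59: limit before ONTAP 9.14.1
	fmt.Println(maxNodeNameLen(64)) // 27: limit from ONTAP 9.14.1 on
}
```

With the 28-character node name from this issue, the subsystem name comes to 65 characters: comfortably under the old 96-character limit, which is consistent with the older environment having worked, but one over the new 64-character limit.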

In the meantime, there are 3 options: