Abnormal workflow termination can orphan NVMe namespaces

Workflows that terminate abnormally can fail to clean up their NVMe namespaces.

This issue documents the symptom. The root cause (or causes) has not yet been determined.

Cleanup method

Once all of your workflows have completed, you can check a particular Rabbit for orphaned NVMe namespaces by running:

~/tools/nvme.sh list

Any namespaces listed are orphaned, because no workflow remains that could own them.

The easy way to delete these namespaces is to perform the following steps in order:

Delete the nnfnodeecdata resource for the Rabbit in question.
Delete the nnf-node-manager pod for the Rabbit in question.

The nnf-node-manager pod will restart automatically. Because its nnfnodeecdata resource has been removed, it will clean up all existing namespaces during initialization.
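
The two deletions described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the cluster is managed with kubectl, that the resource type can be addressed as nnfnodeecdata, and every resource name, pod name, and Kubernetes namespace shown is a hypothetical placeholder to replace with the values for the Rabbit in question. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: clean up orphaned NVMe namespaces on one Rabbit.
# All default values below are HYPOTHETICAL placeholders, not real names.
# Look up the actual names in your cluster before running with DRY_RUN=0.
ECDATA_NAME="${ECDATA_NAME:-example-ecdata}"
ECDATA_NAMESPACE="${ECDATA_NAMESPACE:-example-rabbit}"
POD_NAME="${POD_NAME:-example-nnf-node-manager-pod}"
POD_NAMESPACE="${POD_NAMESPACE:-example-system}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cleanup_rabbit() {
    # Step 1: delete the nnfnodeecdata resource first, so it is already
    # gone when the pod restarts and initializes.
    run kubectl delete nnfnodeecdata "$ECDATA_NAME" --namespace "$ECDATA_NAMESPACE" || return 1
    # Step 2: delete the nnf-node-manager pod; it restarts automatically.
    run kubectl delete pod "$POD_NAME" --namespace "$POD_NAMESPACE"
}

cleanup_rabbit
```

Review the printed commands first, then rerun with DRY_RUN=0 once the placeholder names have been replaced. Afterwards, rerunning ~/tools/nvme.sh list on the Rabbit should show whether the namespaces are gone.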
