NearNodeFlash / NearNodeFlash.github.io

View this document https://nearnodeflash.github.io/
Apache License 2.0

Failure to create XFS filesystem #128

Closed behlendorf closed 4 months ago

behlendorf commented 4 months ago

As of v0.0.7 and later, workflows that allocate an XFS filesystem get hung up retrying LVM commands. Attached is the nnf-node-manager log for a workflow that attempts to allocate a single XFS filesystem.

nnf-node-manager.log

A couple of observations. 1) vgcreate runs much too early, before any of the PVs are created, when it has no chance of success. It does retry. 2) wipefs fails because the device doesn't exist in the /dev/mapper/ directory. If I manually run vgchange --lock-start and lvchange --activate y, then /dev/mapper/ is correctly populated. It does take a moment, though, so you may just need to wait on udev.
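
The wait described above could be sketched as a simple poll on the device path. This is only an illustration: `wait_for_path`, its default timeout, and its polling interval are hypothetical and are not taken from nnf-node-manager. On systems running udev, `udevadm settle` is another way to wait for the event queue to drain.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the path appeared, False if the timeout expired first.
    The defaults are illustrative assumptions, not tuned values.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run the activation command first and then call `wait_for_path` on the expected /dev/mapper entry before invoking wipefs, treating a `False` result as a retryable error.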

behlendorf commented 4 months ago

vgcreate is running much too early before any of the PVs are created when it has no chance of success. It does retry.

This is unrelated to the xfs failure, but I see similar behavior with Lustre filesystems. Specifically, zpool create is invoked incorrectly a few times, presumably because the kernel hasn't created the devices for the NVMe namespaces yet; once they exist, it's run correctly. One potential concern here is that zpool create will still succeed even if only some of the devices are passed, resulting in reduced capacity, although I don't know if that's possible in practice. For example:

> zpool create -O canmount=off -o cachefile=none zf500d3d-mdt-0
stderr: missing vdev specification
> zpool create -O canmount=off -o cachefile=none zf500d3d-mdt-0 /dev/nvme3n8 <only a few additional vdevs>
success
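
One way to guard against a short device list is sketched below in Python, under the assumption that the caller already knows which devices it expects. `missing_devices` and `zpool_create_argv` are hypothetical helpers, not part of nnf-node-manager; the command options are copied from the example above. The idea is to refuse to build the command until every expected device node exists.

```python
import os


def missing_devices(expected, exists=os.path.exists):
    """Return the expected device paths that are not present yet."""
    return [dev for dev in expected if not exists(dev)]


def zpool_create_argv(pool, expected, exists=os.path.exists):
    """Build the zpool create argument list only when all devices exist.

    Raises RuntimeError if the list is empty or any device is missing, so
    the caller can retry later instead of creating a reduced-capacity pool.
    """
    if not expected:
        raise RuntimeError("no vdevs specified")
    missing = missing_devices(expected, exists)
    if missing:
        raise RuntimeError("devices not present yet: " + ", ".join(missing))
    return ["zpool", "create", "-O", "canmount=off",
            "-o", "cachefile=none", pool, *expected]
```

The `exists` parameter is only there so the check can be exercised without real NVMe namespaces.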

ajfloeder commented 4 months ago

@behlendorf your analysis is on the mark. I see a problem in the code that allows the vgcreate to start before anything is ready, but I'm not exactly sure yet how to fix that. I see the same issue when I run in our test environment, so I hope to have a fix soon.

ajfloeder commented 4 months ago

My change avoids the sequencing problem where the pv/vg/lv operations start before the namespaces have been recognized, but there is still a problem with wipefs. [nnf-node-manager.log](https://github.com/NearNodeFlash/NearNodeFlash.github.io/files/14385698/nnf-node-manager.log)

ajfloeder commented 4 months ago

I believe the question here is whether the /dev/mapper name still uses the hyphen '-' as a separator (if it doesn't, we are generating the wrong name) or whether, as Brian suggested above, we are not waiting long enough for the device to appear.

ajfloeder commented 4 months ago

We've confirmed that the hyphen is the separator. The issue is that we need to wait longer for the /dev/mapper device to appear.
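
For reference, a minimal sketch of how the /dev/mapper name is derived from the VG and LV names: the two are joined with a single hyphen, and any hyphen inside either name is doubled. `dm_mapper_path` is a hypothetical helper and the names in the usage lines are illustrative, not taken from the log.

```python
def _escape(name):
    """Double any hyphen so it is not mistaken for the VG/LV separator."""
    return name.replace("-", "--")


def dm_mapper_path(vg, lv):
    """Return the /dev/mapper path for logical volume `lv` in group `vg`."""
    return "/dev/mapper/{}-{}".format(_escape(vg), _escape(lv))


print(dm_mapper_path("vg0", "lv0"))    # /dev/mapper/vg0-lv0
print(dm_mapper_path("my-vg", "lv0"))  # /dev/mapper/my--vg-lv0
```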

ajfloeder commented 4 months ago

Re-opening issue. Every vgactivate command requires a wait for the device to appear in /dev/mapper.