Failure to create XFS filesystem

NearNodeFlash / NearNodeFlash.github.io

Failure to create XFS filesystem #128

Closed behlendorf closed 4 months ago

behlendorf commented 4 months ago

In v0.0.7 and later, workflows that allocate an xfs filesystem get hung up retrying lvm commands. Attached is the nnf-node-manager log for a workflow that attempts to allocate a single xfs filesystem.

nnf-node-manager.log

A couple of observations. 1) vgcreate is running much too early, before any of the PVs are created, when it has no chance of success. It does retry. 2) wipefs is failing because the device doesn't exist in the /dev/mapper/ directory. If I manually run vgchange --lock-start and lvchange --activate y, then /dev/mapper/ is correctly populated, but it does take a moment. You may just need to wait on udev.
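To illustrate the "wait on udev" idea, here is a minimal sketch of polling for a device node with a timeout before running a command like wipefs against it. This is not how nnf-node-manager does it; the function name `wait_for_path`, its parameters, and the example device path are all hypothetical.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1, exists=os.path.exists):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the path showed up in time, False otherwise.
    `exists` is injectable so the check can be exercised without real devices.
    """
    deadline = time.monotonic() + timeout
    while True:
        if exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage; the device name below is a placeholder, not taken from the log:
#   if not wait_for_path("/dev/mapper/example_vg-example_lv", timeout=30):
#       raise RuntimeError("device node never appeared; not running wipefs")
```

An alternative to polling would be running `udevadm settle` after the activation step, which blocks until the udev event queue is empty. Whether that is enough here is an open question, since it only helps if the relevant events are already queued when it is called.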

behlendorf commented 4 months ago

vgcreate is running much too early, before any of the PVs are created, when it has no chance of success. It does retry.

This is unrelated to the xfs failure, but I see similar behavior with Lustre filesystems. Specifically, zpool create is invoked incorrectly a few times, presumably because the kernel hasn't created the devices for the NVMe namespaces yet; once they exist, it's run correctly. One potential concern here is that zpool create will still succeed even if only some of the devices are passed, resulting in reduced capacity, although I don't know if that's possible in practice. For example:

> zpool create -O canmount=off -o cachefile=none zf500d3d-mdt-0
stderr: missing vdev specification
> zpool create -O canmount=off -o cachefile=none zf500d3d-mdt-0 /dev/nvme3n8 <only a few additional vdevs>
success
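One way to guard against the partial-device case would be to refuse to build the zpool create command until every expected vdev is present. The sketch below assumes the caller already knows the full expected device list. The helper name `build_zpool_create` and the second device name in the usage note are hypothetical; the flags are the ones from the transcript above.

```python
import os


def build_zpool_create(pool, vdevs, exists=os.path.exists):
    """Return the zpool create argv, or raise if the vdev list is empty or incomplete.

    `vdevs` must be the full list of devices the pool is expected to use.
    `exists` is injectable so the check can be tested without real devices.
    """
    if not vdevs:
        # Matches the first failure above: zpool create with no vdevs at all.
        raise ValueError("no vdevs given")
    missing = [dev for dev in vdevs if not exists(dev)]
    if missing:
        # Some namespaces have not appeared yet; do not create a smaller pool.
        raise FileNotFoundError(f"devices not present yet: {missing}")
    return ["zpool", "create", "-O", "canmount=off", "-o", "cachefile=none", pool, *vdevs]
```

For example, calling it with `["/dev/nvme3n8", "/dev/nvme3n9"]` while only `/dev/nvme3n8` exists raises instead of returning a command, so the caller can retry later. Note that this only catches devices that are named but missing; it cannot help if the expected list itself is built from whatever devices happen to exist at the time.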