dracutdevs / dracut

dracut the event driven initramfs infrastructure
https://github.com/dracutdevs/dracut/wiki
GNU General Public License v2.0
597 stars 396 forks source link

fix(nvmf): move /etc/nvme/host{nqn,id} requirement to hostonly #2523

Open tbzatek opened 11 months ago

tbzatek commented 11 months ago

This pull request changes the 95nvmf module /etc/nvme/host{nqn,id} files inclusion.

Changes

When creating initramfs for universal boot image such as an installer, we can't include any machine-specific IDs. Let's move the check for /etc/nvme/hostnqn and /etc/nvme/hostid files presence to the hostonly section to avoid unsatisfied requirements.

Checklist

tbzatek commented 11 months ago

Cc: @mwilck @johnmeneghini @lnykryn

This is my attempt to fix the universal-image scenario. We're facing this issue specifically in the installer, may apply to any kind of pre-build images (e.g. cloud) or layered images with immutable base.

For obvious reasons we can't include any machine-specific IDs and that's where we're facing unsatisfied requirements, leading to the nvmf module not being included.

For the NBFT case where HostNQN and HostID is configured, no files are needed. For the rest of the cases we'll need to rely on nvme-cli/libnvme auto-generated IDs from e.g. DMI records. An alternative would be to generate new IDs before making first connection attempt, subject to discussion.

pvalena commented 11 months ago

LGTM.

tbzatek commented 11 months ago

Then both files should be marked as hostonly, i.e.,inst_simple -H <file> on the following lines:

https://github.com/dracutdevs/dracut/blob/6acfecae572fb457115b276b5b64d9424ad5187b/modules.d/95nvmf/module-setup.sh#L133-L134

Done.

I haven't tested this change yet, we'll be doing extensive testing in coming weeks with different combinations of Fabrics tech and HostNQN/HostID (non)presence. For now I just wanted to have more eyes to evaluate the changes.

mwilck commented 11 months ago

This causes both files to be written to the internal /usr/lib/dracut/hostonly-files when the initrd is built in hostonly mode, so these files (and all files that are only needed in a hostonly initrd) can be removed at boot by passing the rd.hostonly=0 option in the kernel command line (see man dracut.cmdline.7).

That doesn't make much sense for NVMe hostnqn and hostid. Without a hostnqn/hostid, no NVMe-oF connections are possible. If the user wants to set different values, she can do so today by using rd.nvmf.hostnqn= and rd.nvmf.hostid= parameters.

johnmeneghini commented 11 months ago

On 9/22/23 09:33, Martin Wilck wrote:

That doesn't make much sense for NVMe hostnqn and hostid. Without a hostnqn/hostid, no NVMe connections are possible. If the user wants to set different values, she can do so today by using |rd.nvmf.hostnqn=| and |rd.nvmf.hostid=| parameters.

The upstream nvme-cli code today will create the hostnqn/hostid if none is provided. I think this is ideally what we want.

https://github.com/linux-nvme/nvme-cli/blob/a2587cbd2eea1d0506d052356abe94cb27cb9884/fabrics.c#L733

So as long as the runtime environment provides a dmi/product_uuid, you don't need a /etc/nvme/hostnqn\id file.

sudo cat /sys/class/dmi/id/product_uuid 66b848cc-25ac-11b2-a85c-ff377e9ee3de

Ideally we want all supported systems to be complaint with TP-4126.

From TP-4126 NVMe-oF Boot HostNQN and HostID: this is requirement for the host.

The pre-OS boot environment and OS environment should use a fixed platform UUID to create a HostNQN and HostID. The implementation should use the System UUID found in the SMBIOS table. The System Management BIOS (SMBIOS) Reference Specification is described in DSP0134. The SMBIOS table is typically available to pre-OS firmware and Expansion ROM Firmware in the pre-OS boot environment as well as to the OS environment.

I think rd.nvmf.hostnqn/hostid should be a manual override.

mwilck commented 11 months ago

Right. I didn't think about the product UUID automatism. With that, it would actually be possible to clone a system and use rd.hostonly=0 for booting (assuming that the target subsystem is configured to serve namespaces to the new host's NQN, of course).

tbzatek commented 11 months ago

Great, I was hoping to have this kind of discussion!

More things to consider:

mwilck commented 11 months ago

NBFT without HostNQN set

I'm pretty certain that won't work, at least not with the current firmware.

johnmeneghini commented 11 months ago

On 9/22/23 11:06, Tomáš Bžatek wrote:

I still want to have this tested three scenarios: NVMe/FC, NBFT with HostNQN set in the boot attempt and NBFT without HostNQN set. This will take some time.

I agree that we want to test at least these three scenarios. Not sure if NBFT boot w/out HostNQN is will work with the existing code... but in theory it should be possible with v6.5...

Let's make sure we don't merge this until all of this has been tested.

tbzatek commented 9 months ago

This has been thoroughly tested on various system deployments for the past two months. Ready for review.

pvalena commented 8 months ago

LGTM.

tbzatek commented 6 months ago

Rebased. Anything else missing for review?

stale[bot] commented 4 months ago

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.