My customer is trying to reimage their DGX A100 cluster to DGX OS 5.4. After pushing the image to the DGX A100 and rebooting, the system loads to a GRUB prompt. Manually loading the kernel works, but I can't find the documentation for pointing the Packer image at the right /efi partition to load the kernel for GRUB. I've seen a similar issue mentioned where a user was able to install DGX OS 5.0.5 and then update from there. My customer tried this, but reports that 5.0.5 failed as well, adding:
I see this in the dgs README, "TODO Next: * kernel parameters in MAAS (w/ tags)".
I can't find any documentation on this "TODO" for MAAS kernel parameters. Could this be the issue? Do you know of any documentation covering it?
The original text is below:
I was able to build the image and use it with MAAS. After installation, the Packer image seems to want to use /efi/boot/grub64.efi, but this doesn't exist (see attached image), so I get dropped into the GRUB command shell.
It looks like /efi is on a separate partition.
I can load it manually using:
set root=(md/0)
set prefix=(md/0)/boot/grub
insmod normal
normal
This then lets me select DGX OS and boot, which finishes a "successful" MAAS deployment.
I'm sure I can figure out a way to "jimmy rig" this to work, but I thought I would throw it y'all's way to see if you have a quick and easy solution before I customize your already-custom Packer image.
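For reference, the kind of workaround I have in mind is a stub grub.cfg that just repeats the manual commands above. This is an untested sketch, not a confirmed fix: it assumes the array always shows up as (md/0), that the installed GRUB config sits under /boot/grub on it, and that the GRUB EFI binary actually reads a stub grub.cfg from the EFI partition (I have not confirmed where that would need to live in this image).

```
# Hypothetical stub grub.cfg on the EFI partition (location is an assumption)
# Same root and prefix I set by hand at the GRUB prompt
set root=(md/0)
set prefix=(md/0)/boot/grub
# Load normal mode; with no argument, "normal" reads $prefix/grub.cfg
insmod normal
normal
```

If the device name differs between nodes, hard-coding (md/0) would break, so this would only be a stopgap.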