coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Dell PowerEdge Servers incompatible with CoreOS installation media. #707

Closed bnordgren closed 2 years ago

bnordgren commented 3 years ago

I was asked to create a new issue with more information. Really, aside from "You can't install to certain Dell PowerEdge servers using an ISO9660 image" I don't know what to add. You either provide what works or expect it to fail.

I do not know the extent of this limitation on Dell's part. It fails on my T310, and since it's basically a firmware thing, I bet it has wider applicability, but I do not have the ability to explore further. Regardless, the original comment follows the link to Dell's instructions on how to make compatible boot media.

https://www.dell.com/support/kbdoc/en-us/000141551/how-to-create-a-bootable-usb-device-with-rufus-to-update-dell-servers

I think this should be re-opened. Yes, I'm sure "dd" mode works because the failure is that the mount unit in systemd is looking for an ISO9660 image. When you DD it, you get an ISO image. Ironically, using that Rufus tool, "ISO" mode actually means "format the USB stick normally and copy all the files over", so the result is a vfat formatted stick, not an ISO9660.

Prior to this weekend I knew nothing about Rufus. I found out about it because my Dell PowerEdge T310 would not boot a USB stick created with the "dd" command line. Or with the Fedora Media Writer, which does the same thing, as far as I can tell. However, a vfat formatted USB which is bootable will be recognized. (Note: The same boot media--using the dd method--works fine in a normal PC, just not on the one I'm trying to install.) Note that this Rufus tool (in ISO mode) is on the Dell website as the tool to use to make bootable USBs for OS installation on their PowerEdge servers.

So, while "dd" works with the CoreOS image, it is not always possible to boot using that method; ergo the CoreOS image needs to support mounting /run/media/iso as "whatever it is", not locking it down to ISO9660. Or write off certain server class machines from certain manufacturers as "not supported".

I'm super annoyed with Dell for making me use Windows to install Linux. Fedora Media Writer should probably offer functionality similar to this "Rufus" tool as well, but that's a different ticket for a different project.

Originally posted by @bnordgren in https://github.com/coreos/fedora-coreos-tracker/issues/554#issuecomment-735413225

jlebon commented 3 years ago

Dell not supporting booting from ISO9660 is odd. As a sanity-check, did you verify whether there was a BIOS update available which enables this?

Another approach is iDRAC -- if you have iDRAC, are you able to virtual mount the ISO and boot from it that way instead?

bgilbert commented 3 years ago

Thanks for the report!

> So, while "dd" works with the CoreOS image, it is not always possible to boot using that method; ergo the CoreOS image needs to support mounting /run/media/iso as "whatever it is", not locking it down to ISO9660.

We probably shouldn't start trying to support modified boot media generated by arbitrary tools. I'm wondering whether we can adjust something about the way we generate our ISO image so a dd'ed USB stick will be accepted by the firmware.

Other things to try: does an ISO of Fedora Workstation or Fedora Server or Ubuntu work when dd'ed to a USB stick? Does a Fedora CoreOS ISO work if burned to an actual DVD?

lucab commented 3 years ago

Possibly relevant: https://superuser.com/questions/1063220/windows-10-iso-on-usb-refuses-to-boot

bnordgren commented 3 years ago

So this is quite a ways back in my rear view mirror at this point, and I'm not going to do anything which risks re-provisioning since my system is now set up--and I've already restored 5TB of data to the raid that the ignition file created. As I recall, I tried all the combinations I could think of, but I didn't remote into iDRAC's web interface from a different machine. I do believe the firmware is up to date, and this really isn't a "bug", the Dell is functioning the way the designers intended. Even if it's stupid. :)

There are two problems here.

  1. Dell won't boot off of an ISO image.
  2. CoreOS, once transformed into something that is recognized as boot media à la that Rufus tool, still fails to boot because it has "-t iso9660" on the mount command for the root filesystem.

I believe that all that is necessary to fix number two is to remove the filesystem type specification when you mount the root image. All I needed to do was repeat the mount command without "-t iso9660" and exit out of the emergency shell. It continued merrily on its way.
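The mount unit itself is not shown in this thread, so the following is only a sketch of the idea in Python: guess the filesystem type from its magic bytes instead of assuming ISO9660, and build a mount command that passes "-t" only when a type is explicitly requested. The function names and the example device path are hypothetical, and the FAT check is a rough heuristic rather than what mount(8) actually does internally.

```python
ISO9660_MAGIC_OFFSET = 16 * 2048 + 1  # "CD001" starts one byte into sector 16


def sniff_fstype(path):
    """Guess 'iso9660' or 'vfat' from magic bytes; return None if unsure."""
    with open(path, "rb") as f:
        head = f.read(ISO9660_MAGIC_OFFSET + 5)
    if head[ISO9660_MAGIC_OFFSET:ISO9660_MAGIC_OFFSET + 5] == b"CD001":
        return "iso9660"
    # FAT boot sectors end in 0x55 0xAA and usually carry a "FAT..." type
    # string at offset 54 (FAT12/16) or 82 (FAT32). That string is only
    # informational, so this is a heuristic, not a guarantee.
    if head[510:512] == b"\x55\xaa" and (
        head[54:57] == b"FAT" or head[82:87] == b"FAT32"
    ):
        return "vfat"
    return None


def mount_args(device, mountpoint, fstype=None):
    """Build a mount(8) argv; leaving out -t lets mount autodetect the type."""
    args = ["mount", "-o", "ro"]
    if fstype is not None:
        args += ["-t", fstype]
    return args + [device, mountpoint]
```

For example, `mount_args("/dev/sdb1", "/run/media/iso")` yields a command with no "-t" option, which matches the manual workaround described above (the device name here is made up for illustration).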

That Rufus tool fixes number one by taking an ISO image and making it into something recognizable as boot media. Perhaps this is a feature which should be integrated into Fedora Media Writer.

Number two may be the only aspect of this issue which is within your sphere of influence.

bnordgren commented 3 years ago

Oh, and as I recall, the standard Fedora 33 Silverblue iso had problem number 1 but not problem number 2.

bgilbert commented 3 years ago

The fact that it's an ISO9660 image isn't supposed to be relevant, since we use isohybrid to add partition tables and set up bootloaders so that the dd'ed image looks like a regular disk. The firmware shouldn't have to understand ISO9660 at all.
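To make the "looks like a regular disk" point concrete, here is a minimal Python sketch that reports which signatures an image file carries. It assumes 512-byte sectors for the GPT header location, and the function name is hypothetical. It only checks for the presence of signatures and does not validate the partition tables themselves.

```python
def describe_image(path, sector_size=512):
    """Report which boot-relevant signatures are present in a disk image."""
    pvd = 16 * 2048  # ISO9660 volume descriptors start at sector 16
    with open(path, "rb") as f:
        data = f.read(pvd + 6)
    return {
        # MBR: last two bytes of the first 512-byte sector
        "mbr_boot_signature": data[510:512] == b"\x55\xaa",
        # GPT: header at LBA 1 begins with "EFI PART"
        "gpt_header": data[sector_size:sector_size + 8] == b"EFI PART",
        # ISO9660: "CD001" follows the descriptor type byte
        "iso9660_descriptor": data[pvd + 1:pvd + 6] == b"CD001",
    }
```

A plain ISO9660 image would be expected to report only the last entry as true, while a hybrid image should also show an MBR signature and, if a GPT was written, the GPT header.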

For UEFI it turns out that we weren't doing this correctly; https://github.com/coreos/coreos-assembler/pull/1990 should fix that. However, the Silverblue ISO does do it correctly, and apparently won't boot on your machine. Here's a developer build of the FCOS ISO image with the GPT set up properly; you could try dd'ing that to a USB stick and see if it boots on your system. As long as you don't pass any coreos.inst kernel arguments, merely booting the live system shouldn't affect the installed OS.

There's some evidence on the Internet that some Dell machines won't correctly boot isohybridized media dd'ed to a USB stick. That could be an isohybrid and/or firmware bug, and thus the fix may lie with one of those tools. Again, I don't think we should necessarily try to support ISO images that have been restructured by some third-party tool.

henryleduc commented 3 years ago

I have been battling with this for a few days on both of my R710s, and I finally got it to work on one! If you want a workaround for the stable release until the fix mentioned above is released, you can use BIOS boot mode.

The server has BIOS version 6.4.0.

I burned the ISO to my USB stick using sudo dd if=/path/to/fedora-coreos-33.20201214.3.1-live.x86_64.iso of=/dev/sdb.

Then, while booting the server, after the UEFI boot failed to find any bootable drives, you can go to System Settings by hitting F2 and change the boot mode to BIOS. After that change, it will allow you to select the USB stick, and it booted straight up from there for me.
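For anyone who wants to rule out a bad write before blaming the firmware, here is a rough Python equivalent of the dd step with a read-back check. The function names are hypothetical, writing to a real device needs root, and the read-back may be served from the page cache rather than the stick itself, so treat a match as a sanity check only.

```python
import hashlib
import os


def write_image(src, dst, block_size=4 * 1024 * 1024):
    """Copy src to dst block by block, sync it, and return bytes written."""
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            fout.write(block)
            written += len(block)
        fout.flush()
        os.fsync(fout.fileno())  # push data out before the stick is removed
    return written


def sha256_prefix(path, length, block_size=1024 * 1024):
    """Hash the first `length` bytes of a file or device."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(block_size, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def verify_image(src, dst, length):
    """Return True if the first `length` bytes of src and dst match."""
    return sha256_prefix(src, length) == sha256_prefix(dst, length)
```

Usage would look like `n = write_image("/path/to/image.iso", "/dev/sdb")` followed by `verify_image("/path/to/image.iso", "/dev/sdb", n)`. Double-check the target device first, since this overwrites it just as dd does.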

If it's not working and you have an older BIOS version, I'd recommend quickly installing another distro to flash the BIOS and then trying again.

This is a great tutorial for flashing the BIOS: https://major.io/2016/01/18/updating-dell-poweredge-bios-from-linux/

bgilbert commented 3 years ago

@henryleduc Great, thanks for the report! It makes sense that boot would work correctly in BIOS mode. Are you able to try the test image linked from https://github.com/coreos/fedora-coreos-tracker/issues/707#issuecomment-755056011 (or, at this point, a nightly testing-devel build) and report whether it works in UEFI mode?

bnordgren commented 3 years ago

Trying to boot in BIOS mode was the first thing I tried. No go. Error message is "isolinux.bin is missing or corrupt." (on the F33 Silverblue media.) Then it moves along to trying to PXE boot.

My BIOS version is 1.8.2 (2011) vs most current available of 1.14.0 (2018). The tutorial above doesn't address my machine, as there's no "BIN" variant of the download, or support for RHEL 7. It looks like I need to make a dos bootable USB stick in order to update.

bnordgren commented 3 years ago

Upgraded to latest BIOS for the PowerEdge T310 (1.14.0) and downloaded the current "live iso" from the testing stream linked above (fedora-coreos-33.20210117.20.0-live.x86_64.iso).

dd of the above to a USB stick, switching to BIOS mode and booting works as expected. Looks like this may be resolved for testing stream. I changed two things at once, so I can't say whether it was the BIOS upgrade or the changes in the ISO which resolved the issue.

bnordgren commented 3 years ago

PS: Where are my manners? I forgot to say thanks!

lessfoobar commented 3 years ago

I was asked to create a new issue with more information. Really, aside from "You can't install to certain Dell PowerEdge servers using an ISO9660 image" I don't know what to add. You either provide what works or expect it to fail.

I do not know the extent of this limitation on Dell's part. It fails on my R815, and since it's basically a firmware thing, I bet it has wider applicability but do not have the ability to explore further.

The problem is that I can't boot when the firmware is in UEFI mode. As soon as I change to BIOS mode, it boots without a problem. I was told it's a known issue, and I guess this version of FCOS hasn't fixed it.

FCOS version: fedora-coreos-33.20210117.3.2-live.x86_64.iso
Hardware: Dell R815
Hardware tag: 3svgsw1
BIOS version: 1.10 3.4.1

Hope this helps resolve the issue.

bgilbert commented 2 years ago

We've subsequently applied https://github.com/coreos/coreos-assembler/pull/2404 to fix UEFI systems that prefer or require booting from the ESP of a hard disk instead of El Torito, and https://github.com/coreos/coreos-assembler/pull/2435 to stop the ISO from adding itself to the UEFI boot variables. The first change in particular is likely to have fixed this issue, so I'll close. If you're still seeing this, please file a new issue with details of your hardware and the behavior you're seeing.
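The linked pull requests are not reproduced here, so the following Python sketch does not show what they changed. It only illustrates one way to check whether the first sector of a dd'ed image advertises an EFI System Partition in its MBR partition table (type 0xEF), which is the kind of entry a hard-disk-style boot path would look for. The function names are hypothetical. On GPT media the MBR normally holds a protective 0xEE entry instead, and the ESP is identified by its GPT type GUID, which this sketch does not parse.

```python
import struct

MBR_TABLE_OFFSET = 446  # four 16-byte entries precede the 0x55AA signature
ENTRY_SIZE = 16
ESP_TYPE = 0xEF             # EFI System Partition
GPT_PROTECTIVE_TYPE = 0xEE  # protective entry covering a GPT disk


def mbr_partitions(sector0):
    """Parse the non-empty MBR partition entries from the first sector."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    entries = []
    for index in range(4):
        start = MBR_TABLE_OFFSET + index * ENTRY_SIZE
        # status, 3 CHS bytes, type, 3 CHS bytes, start LBA, sector count
        status, ptype, start_lba, sectors = struct.unpack(
            "<B3xB3xII", sector0[start:start + ENTRY_SIZE]
        )
        if ptype == 0x00:
            continue  # unused slot
        entries.append({
            "index": index,
            "bootable": status == 0x80,
            "type": ptype,
            "start_lba": start_lba,
            "sectors": sectors,
        })
    return entries


def has_esp(sector0):
    """Return True if any MBR entry is typed as an EFI System Partition."""
    return any(p["type"] == ESP_TYPE for p in mbr_partitions(sector0))
```

Reading the first 512 bytes of the ISO file (or of the USB stick after dd) and passing them to `mbr_partitions` gives a quick view of what the firmware would see in the legacy partition table.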