coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

grub appears to have a limit of disks #2124

Closed kbrwn closed 6 years ago

kbrwn commented 6 years ago

Issue Report

Bug

The number of disks that grub can recognize is limited.

Container Linux Version

Stable 1465.6.0

Beta 1492.5.0

Alpha 1506.0.0

Environment

bare metal

Expected Behavior

PXE-boot a CoreOS node, use the coreos-install script to install to the 19th attached disk, reboot the node, and select usr-a. Boot continues as normal.

Actual Behavior

PXE-boot a CoreOS node, use the coreos-install script to install to the 19th attached disk, reboot the node, and select usr-a. Boot fails with the classic Linux boot failure: GRUB is unable to load the kernel and initial RAM disk.

Other Info

From pxe node:

# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0 953.9G  0 disk
sdb       8:16   0 953.9G  0 disk
sdc       8:32   0   7.3T  0 disk
sdd       8:48   0   7.3T  0 disk
sde       8:64   0   7.3T  0 disk
sdf       8:80   0   7.3T  0 disk
sdg       8:96   0   7.3T  0 disk
sdh       8:112  0   7.3T  0 disk
sdi       8:128  0   7.3T  0 disk
sdj       8:144  0   7.3T  0 disk
sdk       8:160  0   7.3T  0 disk
sdl       8:176  0   7.3T  0 disk
sdm       8:192  0   7.3T  0 disk
sdn       8:208  0   7.3T  0 disk
sdo       8:224  0   7.3T  0 disk
sdp       8:240  0   7.3T  0 disk
sdq      65:0    0   7.3T  0 disk
sdr      65:16   0   7.3T  0 disk
sds      65:32   0   477G  0 disk
|-sds1   65:33   0   128M  0 part
|-sds2   65:34   0     2M  0 part
|-sds3   65:35   0     1G  0 part
|-sds4   65:36   0     1G  0 part
|-sds6   65:38   0   128M  0 part
|-sds7   65:39   0    64M  0 part
`-sds9   65:41   0   2.1G  0 part
loop0     7:0    0   222M  0 loop /usr

grub showing only 16 disks:

[Screenshot: grub-16]
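To make the "19th attached disk" arithmetic explicit, here is a minimal sketch that converts a Linux `sdX` name into its 1-based position in the naming sequence and compares it with the 16 disks GRUB listed in this report. Both function names are hypothetical helpers, not part of coreos-install or GRUB. The default limit of 16 is only the number observed here, and the check assumes the firmware enumerates disks in the same order Linux names them, which is not guaranteed.

```shell
#!/bin/sh
# Hypothetical helper: map an sd device name (sda, sds, sdaa, ...) to its
# 1-based position in the sd naming sequence. Accepts "sds" or "/dev/sds".
disk_ordinal() {
    name=${1#/dev/}
    suffix=${name#sd}
    # Reject anything that is not "sd" followed by lowercase letters only
    # (partition names such as sds1 are rejected as well).
    case $name in
        sd*) ;;
        *) echo "not an sd disk name: $1" >&2; return 1 ;;
    esac
    case $suffix in
        ''|*[!a-z]*) echo "not an sd disk name: $1" >&2; return 1 ;;
    esac
    ordinal=0
    while [ -n "$suffix" ]; do
        letter=${suffix%"${suffix#?}"}      # first remaining letter
        suffix=${suffix#?}                  # drop it from the suffix
        code=$(printf '%d' "'$letter")      # ASCII code of the letter
        ordinal=$(( ordinal * 26 + code - 96 ))   # a=1 ... z=26, then aa=27
    done
    echo "$ordinal"
}

# Hypothetical check: succeeds when the disk's position exceeds the limit.
# The default of 16 is the count observed in this report (an assumption for
# any other machine); pass a second argument to override it.
beyond_observed_limit() {
    limit=${2:-16}
    position=$(disk_ordinal "$1") || return 2
    [ "$position" -gt "$limit" ]
}
```

For example, `disk_ordinal sds` prints 19, so `beyond_observed_limit sds` succeeds, while `beyond_observed_limit sdp` (position 16) does not.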

bgilbert commented 6 years ago

Is this in BIOS mode? If so, does switching to EFI boot make a difference?

am04 commented 6 years ago

@bgilbert How do I know which mode this boot is in? Thanks

bgilbert commented 6 years ago

At the GRUB prompt, type:

echo $grub_platform

It will say pc or efi.
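From a running Linux system, such as the PXE-booted node, the same distinction can usually be made without reaching the GRUB prompt: the kernel exposes `/sys/firmware/efi` only when it was booted through EFI. The sketch below relies on that. The function name `boot_mode` and its optional sysfs-root argument are hypothetical, and printing `pc` or `efi` is just a choice that mirrors the `$grub_platform` values. On non-x86 hardware the `pc` answer would be misleading.

```shell
#!/bin/sh
# Hypothetical helper: print "efi" if the running kernel was booted via EFI,
# otherwise "pc" (assumed here to mean legacy BIOS on x86 hardware).
# The optional argument overrides the sysfs root and exists only for testing.
boot_mode() {
    sysfs_root=${1:-/sys}
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo efi
    else
        echo pc
    fi
}
```

Calling `boot_mode` with no arguments inspects the real `/sys` of the machine it runs on.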

am04 commented 6 years ago

pc

bgilbert commented 6 years ago

What's the hardware model?

If the firmware config menus have an option to boot via EFI instead of BIOS, you could give that a try.

am04 commented 6 years ago

Radisys TOCP-SSLED-CFG3

There's an EFI shell, but that is a manual process. Let me dig into it more.

am04 commented 6 years ago

It works with EFI boot. All CoreOS partitions can be located by GRUB. However, the node cannot be PXE-booted anymore. It falls back to booting from disk.

bgilbert commented 6 years ago

It's possible that the problem is firmware-specific. For example, on QEMU in BIOS mode, GRUB only sees the first 6 disks. It's interesting, though, that the firmware is able to load GRUB but GRUB can't read the disk it was loaded from.

Does changing the boot order in the firmware avoid the issue for you?

Network boot should work with EFI, but it's a bit awkward right now (https://github.com/coreos/bugs/issues/2151). The easiest way to do it is to boot an iPXE EFI image and then follow the iPXE instructions.

bgilbert commented 6 years ago

Thank you for reporting this issue. Unfortunately, we don't think we'll end up addressing it in Container Linux.

As we recently announced, we're working on a successor to Container Linux, and we expect most major development to occur there instead. Meanwhile, Container Linux won't see many new features, but will still be fully maintained into 2020. Stay tuned for more details about that.