coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

parted vs. cgpt - boot issues on bare metal #159

Closed tomgillett closed 8 years ago

tomgillett commented 10 years ago

Background information

I've got a bizarre issue running CoreOS on bare metal; the problem is apparent on 4 machines.

After a clean install on dedicated hardware I have to run parted over the disk to ensure that the backup GPT is at the end of the disk. Without doing this, the machine hangs on reboot.
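The backup-GPT placement described above can be checked by hand. The following is a minimal sketch, not the logic parted or cgpt actually use: it reads the backup LBA field from a primary GPT header (the sector at LBA 1) and compares it with the last sector of the disk. It assumes 512-byte logical sectors and does not verify the header CRC32. `make_example_header` is a hypothetical helper that only fabricates a header for demonstration.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def parse_gpt_header(sector: bytes) -> dict:
    """Extract the LBA fields from a primary GPT header (the sector at LBA 1)."""
    if sector[:8] != b"EFI PART":
        raise ValueError("no GPT signature")
    # Bytes 24..55 hold four little-endian 64-bit values:
    # current LBA, backup LBA, first usable LBA, last usable LBA.
    current, backup, first_usable, last_usable = struct.unpack_from("<4Q", sector, 24)
    return {
        "current_lba": current,
        "backup_lba": backup,
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
    }


def backup_is_at_end(header: dict, disk_bytes: int, sector_size: int = SECTOR_SIZE) -> bool:
    """True when the header points its backup at the last sector of the disk."""
    return header["backup_lba"] == disk_bytes // sector_size - 1


def make_example_header(total_sectors: int) -> bytes:
    """Hypothetical helper: a header as an image of `total_sectors` would carry it.

    The CRC fields are left at zero, so this is not a valid on-disk header.
    """
    backup = total_sectors - 1
    fields = struct.pack(
        "<8sIIII4Q",
        b"EFI PART", 0x00010000, 92, 0, 0,  # signature, revision, size, CRC, reserved
        1, backup, 34, backup - 33,         # current, backup, first/last usable LBA
    )
    return fields.ljust(SECTOR_SIZE, b"\x00")


if __name__ == "__main__":
    image_sectors = 1_000_000  # made-up size of the image that was written
    disk_sectors = 4_000_000   # made-up size of the target disk
    header = parse_gpt_header(make_example_header(image_sectors))
    print("fits image:", backup_is_at_end(header, image_sectors * SECTOR_SIZE))
    print("fits disk: ", backup_is_at_end(header, disk_sectors * SECTOR_SIZE))
```

On a real machine the 512 bytes at offset 512 of the disk device would be passed to `parse_gpt_header` instead of the fabricated header. That requires root, and the sector size should be confirmed first.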

If I run parted, CoreOS will boot once - and only once. Any reboot after that stage fails to come back up. I have to contact my provider to find out why and apparently the message displayed on screen is "no boot devices, sda ok, hardware ok".

Debugging

After some debugging with @marineam on IRC:

Conclusion

marineam: mr-spoon: ok, for the time being you aren't going to be able to do normal upgrades. I will review the differences between what parted and what cgpt spits out and see if I can figure out what may be going on.

tomgillett commented 10 years ago

Unfortunately, I'm still seeing the same issue (on 490.0.0); as requested by @marineam, some more information on the hardware in question. The following output is from the OVH Rescue 'netboot':

root@rescue:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               1612.000
BogoMIPS:              6185.48
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3

root@rescue:~# lshw
rescue.ovh.net
    description: Desktop Computer
    product: ()
    vendor: DH67FC
    width: 32 bits
    capabilities: smbios-2.7 dmi-2.7 smp-1.4 smp vsyscall32
    configuration: boot=normal chassis=desktop cpus=4 uuid=C99843C0-106E-11E2-B579-4C72B99902F7
  *-core
       description: Motherboard
       product: DH61AG
       vendor: Intel Corporation
       physical id: 2
       version: AAG23736-505
       serial: BTAG24000BR8
       slot: To be filled by O.E.M.
     *-firmware
          description: BIOS
          vendor: Intel Corp.
          physical id: 0
          version: AGH6110H.86A.0045.2013.0121.1756
          date: 01/21/2013
          size: 64KiB
          capacity: 960KiB
          capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
     *-cpu:0
          description: CPU
          product: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
          vendor: Intel Corp.
          physical id: 4
          bus info: cpu@0
          version: 6.10.7
          serial: 0002-06A7-0000-0000-0000-0000
          slot: LGA1155 CPU 1
          size: 3100MHz
          capacity: 4GHz
          width: 64 bits
          clock: 25MHz
          capabilities: x86-64 boot fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid cpufreq
          configuration: cores=4 enabledcores=1 id=0
        *-cache:0
             description: L1 cache
             physical id: 5
             slot: L1-Cache
             size: 32KiB
             capacity: 32KiB
             capabilities: internal write-back unified
        *-cache:1
             description: L2 cache
             physical id: 6
             slot: L2-Cache
             size: 1MiB
             capacity: 1MiB
             capabilities: internal varies unified
        *-cache:2
             description: L3 cache
             physical id: 7
             slot: L3-Cache
             size: 6MiB
             capacity: 6MiB
             capabilities: internal unified
        *-logicalcpu:0
             description: Logical CPU
             physical id: 0.1
             width: 64 bits
             capabilities: logical
        *-logicalcpu:1
             description: Logical CPU
             physical id: 0.2
             width: 64 bits
             capabilities: logical
        *-logicalcpu:2
             description: Logical CPU
             physical id: 0.3
             width: 64 bits
             capabilities: logical
        *-logicalcpu:3
             description: Logical CPU
             physical id: 0.4
             width: 64 bits
             capabilities: logical
        *-logicalcpu:4
             description: Logical CPU
             physical id: 0.5
             width: 64 bits
             capabilities: logical
        *-logicalcpu:5
             description: Logical CPU
             physical id: 0.6
             width: 64 bits
             capabilities: logical
        *-logicalcpu:6
             description: Logical CPU
             physical id: 0.7
             width: 64 bits
             capabilities: logical
        *-logicalcpu:7
             description: Logical CPU
             physical id: 0.8
             width: 64 bits
             capabilities: logical
        *-logicalcpu:8
             description: Logical CPU
             physical id: 0.9
             width: 64 bits
             capabilities: logical
        *-logicalcpu:9
             description: Logical CPU
             physical id: 0.a
             width: 64 bits
             capabilities: logical
        *-logicalcpu:10
             description: Logical CPU
             physical id: 0.b
             width: 64 bits
             capabilities: logical
        *-logicalcpu:11
             description: Logical CPU
             physical id: 0.c
             width: 64 bits
             capabilities: logical
        *-logicalcpu:12
             description: Logical CPU
             physical id: 0.d
             width: 64 bits
             capabilities: logical
        *-logicalcpu:13
             description: Logical CPU
             physical id: 0.e
             width: 64 bits
             capabilities: logical
        *-logicalcpu:14
             description: Logical CPU
             physical id: 0.f
             width: 64 bits
             capabilities: logical
        *-logicalcpu:15
             description: Logical CPU
             physical id: 0.10
             width: 64 bits
             capabilities: logical
     *-memory
          description: System Memory
          physical id: 26
          slot: System board or motherboard
          size: 16GiB
        *-bank:0
             description: DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
             product: 9905428-093.A00LF
             vendor: Kingston
             physical id: 0
             serial: 563D3409
             slot: SODIMM1
             size: 8GiB
             width: 64 bits
             clock: 1333MHz (0.8ns)
        *-bank:1
             description: DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
             product: 9905428-093.A00LF
             vendor: Kingston
             physical id: 1
             serial: 623D813E
             slot: SODIMM2
             size: 8GiB
             width: 64 bits
             clock: 1333MHz (0.8ns)
     *-cpu:1
          physical id: 1
          bus info: cpu@1
          version: 6.10.7
          serial: 0002-06A7-0000-0000-0000-0000
          size: 3100MHz
          capabilities: vmx ht cpufreq
          configuration: id=2
        *-logicalcpu:0
             description: Logical CPU
             physical id: 2.1
             capabilities: logical
        *-logicalcpu:1
             description: Logical CPU
             physical id: 2.2
             capabilities: logical
        *-logicalcpu:2
             description: Logical CPU
             physical id: 2.3
             capabilities: logical
        *-logicalcpu:3
             description: Logical CPU
             physical id: 2.4
             capabilities: logical
        *-logicalcpu:4
             description: Logical CPU
             physical id: 2.5
             capabilities: logical
        *-logicalcpu:5
             description: Logical CPU
             physical id: 2.6
             capabilities: logical
        *-logicalcpu:6
             description: Logical CPU
             physical id: 2.7
             capabilities: logical
        *-logicalcpu:7
             description: Logical CPU
             physical id: 2.8
             capabilities: logical
        *-logicalcpu:8
             description: Logical CPU
             physical id: 2.9
             capabilities: logical
        *-logicalcpu:9
             description: Logical CPU
             physical id: 2.a
             capabilities: logical
        *-logicalcpu:10
             description: Logical CPU
             physical id: 2.b
             capabilities: logical
        *-logicalcpu:11
             description: Logical CPU
             physical id: 2.c
             capabilities: logical
        *-logicalcpu:12
             description: Logical CPU
             physical id: 2.d
             capabilities: logical
        *-logicalcpu:13
             description: Logical CPU
             physical id: 2.e
             capabilities: logical
        *-logicalcpu:14
             description: Logical CPU
             physical id: 2.f
             capabilities: logical
        *-logicalcpu:15
             description: Logical CPU
             physical id: 2.10
             capabilities: logical
     *-cpu:2
          physical id: 2
          bus info: cpu@2
          version: 6.10.7
          serial: 0002-06A7-0000-0000-0000-0000
          size: 3100MHz
          capabilities: vmx ht cpufreq
          configuration: id=4
        *-logicalcpu:0
             description: Logical CPU
             physical id: 4.1
             capabilities: logical
        *-logicalcpu:1
             description: Logical CPU
             physical id: 4.2
             capabilities: logical
        *-logicalcpu:2
             description: Logical CPU
             physical id: 4.3
             capabilities: logical
        *-logicalcpu:3
             description: Logical CPU
             physical id: 4.4
             capabilities: logical
        *-logicalcpu:4
             description: Logical CPU
             physical id: 4.5
             capabilities: logical
        *-logicalcpu:5
             description: Logical CPU
             physical id: 4.6
             capabilities: logical
        *-logicalcpu:6
             description: Logical CPU
             physical id: 4.7
             capabilities: logical
        *-logicalcpu:7
             description: Logical CPU
             physical id: 4.8
             capabilities: logical
        *-logicalcpu:8
             description: Logical CPU
             physical id: 4.9
             capabilities: logical
        *-logicalcpu:9
             description: Logical CPU
             physical id: 4.a
             capabilities: logical
        *-logicalcpu:10
             description: Logical CPU
             physical id: 4.b
             capabilities: logical
        *-logicalcpu:11
             description: Logical CPU
             physical id: 4.c
             capabilities: logical
        *-logicalcpu:12
             description: Logical CPU
             physical id: 4.d
             capabilities: logical
        *-logicalcpu:13
             description: Logical CPU
             physical id: 4.e
             capabilities: logical
        *-logicalcpu:14
             description: Logical CPU
             physical id: 4.f
             capabilities: logical
        *-logicalcpu:15
             description: Logical CPU
             physical id: 4.10
             capabilities: logical
     *-cpu:3
          physical id: 3
          bus info: cpu@3
          version: 6.10.7
          serial: 0002-06A7-0000-0000-0000-0000
          size: 3100MHz
          capabilities: vmx ht cpufreq
          configuration: id=6
        *-logicalcpu:0
             description: Logical CPU
             physical id: 6.1
             capabilities: logical
        *-logicalcpu:1
             description: Logical CPU
             physical id: 6.2
             capabilities: logical
        *-logicalcpu:2
             description: Logical CPU
             physical id: 6.3
             capabilities: logical
        *-logicalcpu:3
             description: Logical CPU
             physical id: 6.4
             capabilities: logical
        *-logicalcpu:4
             description: Logical CPU
             physical id: 6.5
             capabilities: logical
        *-logicalcpu:5
             description: Logical CPU
             physical id: 6.6
             capabilities: logical
        *-logicalcpu:6
             description: Logical CPU
             physical id: 6.7
             capabilities: logical
        *-logicalcpu:7
             description: Logical CPU
             physical id: 6.8
             capabilities: logical
        *-logicalcpu:8
             description: Logical CPU
             physical id: 6.9
             capabilities: logical
        *-logicalcpu:9
             description: Logical CPU
             physical id: 6.a
             capabilities: logical
        *-logicalcpu:10
             description: Logical CPU
             physical id: 6.b
             capabilities: logical
        *-logicalcpu:11
             description: Logical CPU
             physical id: 6.c
             capabilities: logical
        *-logicalcpu:12
             description: Logical CPU
             physical id: 6.d
             capabilities: logical
        *-logicalcpu:13
             description: Logical CPU
             physical id: 6.e
             capabilities: logical
        *-logicalcpu:14
             description: Logical CPU
             physical id: 6.f
             capabilities: logical
        *-logicalcpu:15
             description: Logical CPU
             physical id: 6.10
             capabilities: logical
     *-pci
          description: Host bridge
          product: 2nd Generation Core Processor Family DRAM Controller
          vendor: Intel Corporation
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 09
          width: 32 bits
          clock: 33MHz
        *-display UNCLAIMED
             description: VGA compatible controller
             product: 2nd Generation Core Processor Family Integrated Graphics Controller
             vendor: Intel Corporation
             physical id: 2
             bus info: pci@0000:00:02.0
             version: 09
             width: 64 bits
             clock: 33MHz
             capabilities: msi pm vga_controller bus_master cap_list
             configuration: latency=0
             resources: memory:fe000000-fe3fffff memory:e0000000-efffffff ioport:f000(size=64)
        *-communication UNCLAIMED
             description: Communication controller
             product: 6 Series/C200 Series Chipset Family MEI Controller #1
             vendor: Intel Corporation
             physical id: 16
             bus info: pci@0000:00:16.0
             version: 04
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list
             configuration: latency=0
             resources: memory:fe525000-fe52500f
        *-network
             description: Ethernet interface
             product: 82579V Gigabit Network Connection
             vendor: Intel Corporation
             physical id: 19
             bus info: pci@0000:00:19.0
             logical name: eth0
             version: 05
             serial: 4c:72:b9:99:02:f7
             size: 100Mbit/s
             capacity: 1Gbit/s
             width: 32 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
             configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=2.3.2-k duplex=full firmware=0.13-4 ip=176.31.182.89 latency=0 link=yes multicast=yes port=twisted pair speed=100Mbit/s
             resources: irq:43 memory:fe500000-fe51ffff memory:fe524000-fe524fff ioport:f080(size=32)
        *-usb:0
             description: USB controller
             product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2
             vendor: Intel Corporation
             physical id: 1a
             bus info: pci@0000:00:1a.0
             version: 05
             width: 32 bits
             clock: 33MHz
             capabilities: pm debug ehci bus_master cap_list
             configuration: driver=ehci-pci latency=0
             resources: irq:16 memory:fe523000-fe5233ff
        *-pci:0
             description: PCI bridge
             product: 6 Series/C200 Series Chipset Family PCI Express Root Port 1
             vendor: Intel Corporation
             physical id: 1c
             bus info: pci@0000:00:1c.0
             version: b5
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:40
        *-pci:1
             description: PCI bridge
             product: 6 Series/C200 Series Chipset Family PCI Express Root Port 2
             vendor: Intel Corporation
             physical id: 1c.1
             bus info: pci@0000:00:1c.1
             version: b5
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:41 memory:fe400000-fe4fffff
           *-usb
                description: USB controller
                product: uPD720200 USB 3.0 Host Controller
                vendor: NEC Corporation
                physical id: 0
                bus info: pci@0000:02:00.0
                version: 03
                width: 64 bits
                clock: 33MHz
                capabilities: pm msi msix pciexpress xhci bus_master cap_list
                configuration: driver=xhci_hcd latency=0
                resources: irq:17 memory:fe400000-fe401fff
        *-usb:1
             description: USB controller
             product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1
             vendor: Intel Corporation
             physical id: 1d
             bus info: pci@0000:00:1d.0
             version: 05
             width: 32 bits
             clock: 33MHz
             capabilities: pm debug ehci bus_master cap_list
             configuration: driver=ehci-pci latency=0
             resources: irq:23 memory:fe522000-fe5223ff
        *-isa
             description: ISA bridge
             product: H61 Express Chipset Family LPC Controller
             vendor: Intel Corporation
             physical id: 1f
             bus info: pci@0000:00:1f.0
             version: 05
             width: 32 bits
             clock: 33MHz
             capabilities: isa bus_master cap_list
             configuration: latency=0
        *-storage
             description: SATA controller
             product: 6 Series/C200 Series Chipset Family SATA AHCI Controller
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             version: 05
             width: 32 bits
             clock: 66MHz
             capabilities: storage msi pm ahci_1.0 bus_master cap_list
             configuration: driver=ahci latency=0
             resources: irq:42 ioport:f0d0(size=8) ioport:f0c0(size=4) ioport:f0b0(size=8) ioport:f0a0(size=4) ioport:f060(size=32) memory:fe521000-fe5217ff
        *-serial UNCLAIMED
             description: SMBus
             product: 6 Series/C200 Series Chipset Family SMBus Controller
             vendor: Intel Corporation
             physical id: 1f.3
             bus info: pci@0000:00:1f.3
             version: 05
             width: 64 bits
             clock: 33MHz
             configuration: latency=0
             resources: memory:fe520000-fe5200ff ioport:f040(size=32)
     *-scsi
          physical id: 5
          logical name: scsi1
          capabilities: emulated
        *-disk
             description: ATA Disk
             product: TOSHIBA DT01ACA1
             vendor: Toshiba
             physical id: 0.0.0
             bus info: scsi@1:0.0.0
             logical name: /dev/sda
             version: MS2O
             serial: 6389XTVPS
             size: 931GiB (1TB)
             capabilities: partitioned partitioned:dos
             configuration: ansiversion=5 sectorsize=4096
           *-volume:0
                description: Windows FAT volume
                vendor: mkdosfs
                physical id: 1
                bus info: scsi@1:0.0.0,1
                logical name: /dev/sda1
                version: FAT16
                serial: 53f1-c861
                size: 127MiB
                capacity: 128MiB
                capabilities: primary bootable fat initialized
                configuration: FATs=2 filesystem=fat label=EFI-SYSTEM
           *-volume:1 UNCLAIMED
                description: EFI GPT partition
                physical id: 2
                bus info: scsi@1:0.0.0,2
                capacity: 2047KiB
                capabilities: primary nofs
  *-ide:0
       description: IDE Channel 0
       physical id: 0
       bus info: ide@0
       logical name: ide0
  *-ide:1
       description: IDE Channel 0
       physical id: 1
       bus info: ide@1
       logical name: ide1
  *-network:0 DISABLED
       description: Ethernet interface
       physical id: 3
       logical name: dummy0
       serial: ba:81:0b:9a:32:ba
       capabilities: ethernet physical
       configuration: broadcast=yes
  *-network:1 DISABLED
       description: Ethernet interface
       physical id: 4
       logical name: bond0
       serial: 36:b7:98:2e:b7:dc
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 firmware=2 link=no master=yes multicast=yes

marineam commented 10 years ago

Do you have control over the firmware version, or at least its configuration? I've seen odd things with the Intel firmware on another board: the system would switch between booting and not booting CoreOS depending on the order in which I toggled assorted boot ordering options and whether 'UEFI' or 'Legacy BIOS' booting was enabled. My first suggestion is simply to make sure both UEFI and Legacy are enabled; for me the system failed to find the boot disk properly if either was disabled. After toggling either option, reboot into the BIOS config so that the boot device list is correct. Secondly, see if you can play with the boot ordering a little. At least one entry in the list should boot the disk in UEFI mode and another in legacy BIOS mode (although they won't be clearly labeled as such).

If you cannot find a configuration that works reliably, try upgrading the firmware to the latest version. I see in the release notes that there are a number of fixes naming things like "Fixed issue where operating system fails to install." and "Fixed issue where Boot Device Priority cannot be changed."

tomgillett commented 10 years ago

Unfortunately, the servers are budget rented / dedicated boxes from OVH. The support on this range is minimal and so I have no control over firmware / BIOS settings / etc.

It may well be that the problem is unique to this hardware, firmware version etc. and so I do understand if there is nothing more that can be done.

marineam commented 10 years ago

I do find it interesting that lshw is reporting the hybrid MBR instead of the GPT, so there may still be a bug in cgpt with disks 1TB in size that leaves the GPT legitimately invalid and ignored by the system. I'll investigate that further to make sure my testing didn't miss something there.
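To tell a hybrid MBR apart from a purely protective one, the four partition type bytes in sector 0 are enough. This is a minimal sketch under the usual MBR layout (partition table at offset 446, 16-byte entries, type byte at offset 4 of each entry, type 0xEE marking GPT). The labels it returns and the `make_example_mbr` helper are illustrative assumptions, not something lshw or cgpt expose.

```python
PARTITION_TABLE_OFFSET = 446  # start of the four MBR partition entries
ENTRY_SIZE = 16               # bytes per partition entry
GPT_PROTECTIVE_TYPE = 0xEE    # partition type that marks a GPT disk


def mbr_partition_types(sector0: bytes) -> list:
    """Return the type byte of each of the four MBR partition entries."""
    return [sector0[PARTITION_TABLE_OFFSET + i * ENTRY_SIZE + 4] for i in range(4)]


def classify_mbr(sector0: bytes) -> str:
    """Label sector 0 as 'invalid', 'dos', 'protective' or 'hybrid'."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        return "invalid"
    used = [t for t in mbr_partition_types(sector0) if t != 0]
    if GPT_PROTECTIVE_TYPE not in used:
        return "dos"  # classic MBR with no GPT marker
    # A lone 0xEE entry is the protective MBR; extra entries make it hybrid.
    return "protective" if len(used) == 1 else "hybrid"


def make_example_mbr(types) -> bytes:
    """Hypothetical helper: a sector with only type bytes and the signature set."""
    sector = bytearray(512)
    for i, partition_type in enumerate(types):
        sector[PARTITION_TABLE_OFFSET + i * ENTRY_SIZE + 4] = partition_type
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    # 0x0C (FAT32 LBA) is an arbitrary example type, not taken from a CoreOS image.
    print(classify_mbr(make_example_mbr([0x0C, 0xEE])))
```

A result of "hybrid" would match the lshw output above, which lists a FAT volume next to an "EFI GPT partition" entry under "partitioned:dos". It says nothing about whether the GPT itself is valid.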

andypost commented 9 years ago

I can confirm the trouble: after an update there's no way to reboot. On a bare metal server (online.net PowerEdge R320) I always get "no such partition" from GRUB, and the next reboot does not find GRUB. PS: checked on 2 servers.

crawford commented 9 years ago

@marineam is there anything more to be done with this?

marineam commented 8 years ago

In the current beta and alpha versions a major corruption bug in GRUB has finally been fixed and cgpt repair has also been greatly improved. I don't know for sure if those changes fix this issue or not but at the very least troubleshooting further should be easier.

I'm closing this bug due to its old age but feel free to retest and reopen if there is still a problem on these systems.

nailgun commented 8 years ago

@marineam

Unfortunately I'm experiencing the same issue on an OVH dedicated server. I'm ready to debug this. Can you tell me what I should do? I've already tried to install CoreOS from the stable, beta and alpha channels with no luck. Fixing the GPT using parted or cgpt repair didn't help either.

I've now ordered a KVM-over-IP interface and am waiting for it to be installed.

BTW, I can hardly believe it now, but my first installation on the same machine booted fine. After that I reinstalled CoreOS, and the second installation, like every one after it, didn't boot. It's quite weird.

P.S. Please reopen the ticket. I can't do that.

nailgun commented 8 years ago

I've got access to the bare-metal terminal, but only for 1 day. So any help in debugging is highly appreciated.

nailgun commented 8 years ago

OK, here is my report. :) The problem is actually GPT.

I will start from the beginning. From the hosting control panel you can set up a boot source for any server. It can be Hard disk, iPXE, Rescue image or Rescue kernel.

By reading the BIOS boot messages I found that the server is always configured to boot from PXE, even if you selected the Hard disk source. The only thing that changes is which PXE image the datacenter's DHCP server returns. If you selected Hard disk, the returned PXE image just tries to boot from the local disk.

It looks like this PXE image decides that the CoreOS boot partition is not valid and hangs, asking you to insert valid bootable media. I've recorded a video of these BIOS messages.

If I manually select the hard drive in the boot menu, the system boots normally. I don't know the implementation details of how GPT works, but it looks like its boot code is in the same location as in MBR.

I'd rather not contact OVH tech support; I'm sure they won't change their production solution for a lone hacker. Another solution is to change the BIOS config of every server to boot from HDD by default, but that is neither time- nor cost-effective.

I've tried iPXE's sanboot command with no luck. It also checks the MBR magic bytes, as PXE does.

I've also tried dumping the first 512 bytes of the HDD and using them as an image in iPXE's chain command. This just prints GRUB to the terminal and nothing more.

For now I don't have any ideas for how to deal with GPT on OVH.

crawford commented 8 years ago

@nailgun unfortunately, @marineam is on holiday until December. I'll re-open this and assign it to him.

nailgun commented 8 years ago

I've asked on Stack Overflow for help.

crawford commented 8 years ago

I looked at this again a bit more closely and now I'm quite confused. CoreOS Linux supports a hybrid boot pattern (either MBR or GPT will work). If you dump the first 512 bytes of the disk (xxd -l 512 -g 1 /dev/vda), you'll see the GRUB stage-1 and the 0x55 0xAA boot signature. If you mount the first partition, you'll also see the EFI/boot directory containing an EFI application (GRUB, in this case). If your provider thinks this image is unbootable, either its firmware is confused by the hybrid pattern or it's looking at the wrong disk. Is there just one disk attached to this machine? Can you confirm that these machines are supposed to be able to boot MBR and/or GPT? You might need to file a bug with them if this turns out to be a firmware issue.
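The xxd check above can also be scripted. This is a minimal sketch that inspects a 512-byte boot sector for the 0x55 0xAA signature and for non-empty boot code. It assumes the first 440 bytes are the boot code area and that GRUB's stage-1 carries the ASCII string "GRUB" somewhere in it. Both are assumptions to verify against the actual disk, and the field names in the result are made up for illustration.

```python
BOOT_CODE_LENGTH = 440            # assumption: bytes 0..439 hold the boot code
BOOT_SIGNATURE = b"\x55\xaa"      # expected at offsets 510 and 511


def inspect_boot_sector(sector: bytes) -> dict:
    """Summarize what a legacy BIOS would look for in the first sector."""
    if len(sector) != 512:
        raise ValueError("expected exactly 512 bytes")
    boot_code = sector[:BOOT_CODE_LENGTH]
    return {
        "has_boot_signature": sector[510:512] == BOOT_SIGNATURE,
        "has_boot_code": any(boot_code),       # at least one non-zero byte
        "mentions_grub": b"GRUB" in boot_code,  # heuristic only
    }


if __name__ == "__main__":
    # Fabricated sector for demonstration: a jump, some filler and a "GRUB" string.
    example = bytearray(512)
    example[0:2] = b"\xeb\x63"
    example[0x180:0x185] = b"GRUB "
    example[510:512] = BOOT_SIGNATURE
    print(inspect_boot_sector(bytes(example)))
```

For a real disk, the sector would come from reading the first 512 bytes of the device as root. If all three fields come back true and the firmware still refuses to boot, that points away from the image and toward the firmware or the disk it is looking at.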

nailgun commented 8 years ago

After some struggling I came to the same conclusion: it looks like a firmware issue. I've filed a ticket with OVH support and am waiting for an answer.

Anyway, I'm confused about why iPXE can't boot from the disk. I tried the command sanboot --no-describe --drive 0x80. Does it still depend on the firmware?

> Can you confirm that these machines are supposed to be able to boot MBR and/or GPT?

The server is legacy BIOS, so it is able to boot MBR drives. There is no problem with classic MBRs.

> If you dump the first 512 bytes of the disk (xxd -l 512 -g 1 /dev/vda), you'll see the GRUB stage-1 and the 0x55 0xAA boot signature. If you mount the first partition, you'll also see the EFI/boot directory containing an EFI application (GRUB, in this case).

Yeah, I've checked this too. So it's very confusing.

> Is there just one disk attached to this machine?

There are 3 SSDs, but I've installed other OSes on the same drive and they booted.

crawford commented 8 years ago

Does the suggestion listed on http://www.syslinux.org/archives/2006-August/007139.html help at all? Not sure if you have control over the PXE menu.

nailgun commented 8 years ago

Normally I can't control the PXE menu (I'd need to order a KVM-over-IP console for a limited time). But I have limited control over the PXE server and can select one of 4 PXE configs:

I got the PXE configs linked above from the datacenter's DHCP server using a tftp client.

Neither PXE's localboot 0 nor iPXE's sanboot --no-describe --drive 0x80 works. They don't identify the drive as bootable for some reason. In this video I'm trying to boot using PXE's localboot 0. iPXE gives the same result.

I have also tried to reproduce the same setup on VirtualBox. I installed CoreOS on a disk, then used an iPXE ISO image to boot with the sanboot command, and it works well. So it looks like iPXE still depends on the firmware to boot from a disk.

crawford commented 8 years ago

Okay, thanks for digging in. I don't think there is much we can do here without more info from the provider. It looks like something is wrong with the firmware on those machines.

nailgun commented 8 years ago

Yeah. I spent 25 euros :) and many hours to understand how their servers boot. I hope they will try to do something about this.

felixkrohn commented 8 years ago

Hi, this particular mainboard has already caused plenty of headaches; it is indeed broken in relation to GPT (many issues). If you want to use this server, try to install it in MBR format, not GPT. Another possibility would be to use another server with a compatible mainboard, in which case you can simply install CoreOS from the OVH customer interface ("stable" and "alpha" channels are available). The setting that makes the server try to boot from the NIC first is wanted and needed; without it you won't be able to reboot into rescue (nor reinstall the server).

nailgun commented 8 years ago

@felixkrohn thanks! Can you please recommend an OVH server that I can order which is comparable to the E3-SSD-3 in performance and cost and is compatible with CoreOS?

Maybe it would be better to list here all KS, SYS and OVH servers which are compatible.

Also, can you comment on why this MB doesn't boot in this particular case (CoreOS has an MBR-compatible boot sector)? If you select the drive manually through the boot menu, it works.

crawford commented 8 years ago

@felixkrohn thanks for confirming. Sounds like this is an OVH-specific issue. I'm going to close this out.

felixkrohn commented 8 years ago

@nailgun My colleague sent you a list this morning; I hope you were able to pick one to your liking.

@crawford ACK for closing. (Just want to add for future search karma: it's not purely OVH-specific but Intel DH67BL/DH61AG/DH67VR-specific, of which we happen to have a certain number.)

nailgun commented 8 years ago

@felixkrohn

I've found that setting the BIOS option Boot -> UEFI Boot to Enabled fixes all issues, and it doesn't break the legacy MBR boot sequence. See screenshot of the option.

Is it possible to request that this option be set before server installation? Ordering KVM-IP for every installation is too expensive and takes too long.

felixkrohn commented 8 years ago

@nailgun it actually does break things, because when the firmware finds a GPT partition table on disk it will skip netboot, and thus thwart all future attempts to boot in rescue (or reinstallation) mode :-(

nailgun commented 8 years ago

Sorry, you're right.

nailgun commented 8 years ago

Anyway, I don't understand why iPXE doesn't detect the MBR on disk. This is the output of my custom-built iPXE image with debug enabled. This is the source code of the MBR check.

It looks like the BIOS doesn't map the disk to 0x80.

crawford commented 7 years ago

@felixkrohn excellent. I'll add those machines to my blacklist.