SerenityOS / serenity

The Serenity Operating System 🐞
https://serenityos.org
BSD 2-Clause "Simplified" License
29.87k stars 3.15k forks source link

Kernel: !m_storage_devices.is_empty() assertion failure on boot #10177

Open thankyouverycool opened 2 years ago

thankyouverycool commented 2 years ago

Run into this periodically booting i686 in ubuntu qemu without kvm

[Kernel]: Loading kernel symbol table... [Kernel]: CPU[0]: Supported features: nx pae pge rdrand smap smep sse tsc rdtscp constant_tsc sep mmx sse2 sse3 ssse3 sse4.1 sse4.2 xsave fxsr hypervisor [Kernel]: CPU[0]: Physical address bit width: 36 [#0 Kernel]: Initializing unhandled interrupt handlers [Kernel]: CPU[0]: CPUID hypervisor signature 'TCGTCGTCGTCG' (0x54474354 0x43544743 0x47435447), max leaf 0x40000001 [Kernel]: Kernel Commandline: Kernel/Prekernel/Prekernel hello [Kernel]: MM: Multiboot mmap: address=0x00000000, length=654336, type=1 [Kernel]: MM: Got an unaligned physical_region from the bootloader; correcting length 654336 by 3072 bytes [Kernel]: MM: Multiboot mmap: address=0x0009fc00, length=1024, type=2 [Kernel]: MM: Multiboot mmap: address=0x000f0000, length=65536, type=2 [Kernel]: MM: Multiboot mmap: address=0x00100000, length=535691264, type=1 [Kernel]: MM: Multiboot mmap: address=0x1ffe0000, length=131072, type=2 [Kernel]: MM: Multiboot mmap: address=0xfffc0000, length=262144, type=2 [Kernel]: MM: Contiguous reserved range from P000000000009fc00, length is 394240 [Kernel]: MM: Contiguous reserved range from P000000001ffe0000, length is 3758227456 [Kernel]: MM: boot_pdpt @ P000000000010f000 [Kernel]: MM: boot_pd0 @ P0000000000110000 [Kernel]: MM: boot_pd_kernel @ P0000000000131000 [Kernel]: MM: Physical page entries: 0xc1e01000 - 0xc2601fff (size 0x00801000) [Kernel]: MM: Low memory range @ P0000000000000000 - P00000000000fffff (size 0x100000) [Kernel]: MM: Prekernel range @ P0000000000100000 - P0000000000153fff (size 0x54000) [Kernel]: MM: Kernel range @ P0000000000200000 - P0000000001ca6fff (size 0x1aa7000) [Kernel]: MM: Physical Pages range @ P0000000001ca8000 - P00000000024adfff (size 0x806000) [Kernel]: MM: Super physical region: P0000000000201000 - P0000000000300fff (size 0x100000) [Kernel]: 1x PhysicalZone (1 MiB) @ 0000000000201000-0000000000200fff [Kernel]: MM: User physical region: P0000000000155000 - P00000000001fefff (size 0xaa000) [Kernel]: MM: User physical region: P00000000024ae000 - P000000001ffdffff (size 0x1db32000) [Kernel]: 29x PhysicalZone (16 MiB) @ 00000000024ae000-000000001e4adfff [Kernel]: * 11x PhysicalZone (1 MiB) @ 000000001f4ae000-000000001feadfff [#0 Kernel]: Early access to ACPI tables for interrupt setup [Kernel]: Interrupts: Switch to Legacy PIC mode [Kernel]: PIC: Cascading mode, vectors 0x50-0x5f [#0 Kernel]: Trying to unregister unused handler (?) [#0 Kernel]: Trying to unregister unused handler (?) [#0 Kernel]: Interrupts: Detected Dual i8259 [Kernel]: ACPI: Using RSDP @ P00000000000f5a90 [Kernel]: ACPI: Main Description Table valid? true [Kernel]: ACPI: Using RSDT, enumerating tables @ P000000001ffe2b7b [Kernel]: ACPI: RSDT revision 1, total length: 52 [Kernel]: ACPI: Initializing Fixed ACPI data [Kernel]: ACPI: Fixed ACPI data, Revision 1, length: 116 bytes [Kernel]: ACPI: DSDT P000000001ffe0040 [Kernel]: RTC: Year: 2021, month: 9, day: 18, hour: 11, minute: 46, second: 20 [Kernel]: HPET @ P000000001ffe2b1b [Kernel]: HPET: Minimum clock tick - 0 [Kernel]: HPET: Timers count - 3 [Kernel]: HPET: Main counter size: 64-bit [Kernel]: HPET: Timer[0] comparator size: 64-bit, mode: 64-bit [Kernel]: HPET: Timer[1] comparator size: 64-bit, mode: 64-bit [Kernel]: HPET: Timer[2] comparator size: 64-bit, mode: 64-bit [Kernel]: HPET: frequency 100000000 Hz (100 MHz) resolution: 10 ns [#0 Kernel]: Trying to unregister unused handler (?) [#0 Kernel]: Trying to unregister unused handler (?) [#0 Kernel]: HPET: Setting appropriate functions to timers. [#0 Kernel]: Time: Scanning for periodic timers [#0 Kernel]: Time: Scanning for non-periodic timers [Kernel]: KernelRng: Using RDSEED or RDRAND as entropy source [colonel(0:0)]: Starting SerenityOS... init_stage2(1:1): PCI: Using I/O instructions for PCI configuration space access init_stage2(1:1): PCI [0000:00:00:00] PCI::ID 8086:1237: PCI [0000:00:01:00] PCI::ID 8086:7000: PCI [0000:00:01:01] PCI::ID 8086:7010: PCI [0000:00:01:02] PCI::ID 8086:7020: PCI [0000:00:01:03] PCI::ID 8086:7113: PCI [0000:00:02:00] PCI::ID 1234:1111: PCI [0000:00:03:00] PCI::ID 1af4:1003: PCI [0000:00:04:00] PCI::ID 1af4:1005: PCI [0000:00:05:00] PCI::ID 1b36:0001: PCI [0000:01:00:00] PCI::ID 8086:100e: PCI [0000:01:01:00] PCI::ID 8086:244e: PCI [0000:02:00:00] PCI::ID 1b36:0007: PCI [0000:00:06:00] PCI::ID 8086:244e: PCI [0000:03:00:00] PCI::ID 1b36:0007: PCI [0000:03:01:00] PCI::ID 8086:2922: PCI [0000:00:07:00] PCI::ID 8086:100e: VMWareBackdoor: Enabling absolute mouse mode [#0 init_stage2(1:1)]: I8042: Dual channel controller init_stage2(1:1): PS2MouseDevice: Mouse wheel enabled! init_stage2(1:1): PS2MouseDevice: 5 buttons enabled! [#0 init_stage2(1:1)]: Trying to unregister unused handler (?) [#0 init_stage2(1:1)]: Trying to unregister unused handler (?) [#0 init_stage2(1:1)]: Framebuffer Console: taking 3145728 bytes [#0 init_stage2(1:1)]: BochsGraphicsAdapter: resolution set to 1024x768 [#0 init_stage2(1:1)]: Framebuffer Console: taking 3145728 bytes [#0 init_stage2(1:1)]: Framebuffer 0: address=P00000000f8000000, pitch=4096, width=1024, height=768 [#0 init_stage2(1:1)]: Graphics adapter @ PCI [0000:00:02:00] is operating in VGA mode [#0 init_stage2(1:1)]: VC 0: Resized to 128 x 96 [#0 init_stage2(1:1)]: VC 1: Resized to 128 x 96 [#0 init_stage2(1:1)]: VC 2: Resized to 128 x 96 [#0 init_stage2(1:1)]: VC 3: Resized to 128 x 96 [#0 init_stage2(1:1)]: VC 4: Resized to 128 x 96 [#0 init_stage2(1:1)]: VC 5: Resized to 128 x 96 [#0 SyncTask(3:3)]: SyncTask is running init_stage2(1:1): UHCI: Controller found PCI::ID [8086:7020] @ PCI 0000:00:01:02: UHCI: I/O base IO e080 init_stage2(1:1): UHCI: Interrupt line: 11 [#0 init_stage2(1:1)]: UHCI: Allocated framelist at physical address P0000000000300000 [#0 init_stage2(1:1)]: UHCI: Framelist is at virtual address V0xc2a38000 [#0 init_stage2(1:1)]: UHCI: QH(0xc2a3afe0) @ 0x29c6fe0: link_ptr=0x29c6fc2, element_link_ptr=0x0001 [#0 init_stage2(1:1)]: UHCI: QH(0xc2a3afc0) @ 0x29c6fc0: link_ptr=0x29c6fa2, element_link_ptr=0x0001 [#0 init_stage2(1:1)]: UHCI: QH(0xc2a3afa0) @ 0x29c6fa0: link_ptr=0x29c6f82, element_link_ptr=0x0001 [#0 init_stage2(1:1)]: UHCI: QH(0xc2a3af80) @ 0x29c6f80: link_ptr=0x29c6f62, element_link_ptr=0x0001 [#0 init_stage2(1:1)]: UHCI: QH(0xc2a3af60) @ 0x29c6f60: link_ptr=0x29c5fe1, element_link_ptr=0x0001 [#0 init_stage2(1:1)]: UHCI: Reset completed [#0 init_stage2(1:1)]: UHCI: Started [#0 init_stage2(1:1)]: BIOSSysFSDirectory: SMBIOS 32bit Entry point @ P00000000000f5ab0 [#0 init_stage2(1:1)]: BIOSSysFSDirectory: Data table @ P00000000000f5ad0 [#0 init_stage2(1:1)]: VirtIOConsole: Found @ PCI [0000:00:03:00] [#0 init_stage2(1:1)]: Trying to unregister unused handler (?) [#0 init_stage2(1:1)]: VirtIOConsole: Features not accepted by host! [#0 init_stage2(1:1)]: VirtIORNG: Found @ PCI [0000:00:04:00] [#0 init_stage2(1:1)]: VirtIORNG: Features not accepted by host! init_stage2(1:1): E1000: Found @ PCI 0000:01:00:00: E1000: port base: IO d000 init_stage2(1:1): E1000: MMIO base: P00000000fe680000 init_stage2(1:1): E1000: MMIO base size: 131072 bytes init_stage2(1:1): E1000: Interrupt line: 10 init_stage2(1:1): E1000: Has EEPROM? true init_stage2(1:1): E1000: MAC address: 52:54:00:12:34:56 [#0 init_stage2(1:1)]: Trying to unregister unused handler (?) init_stage2(1:1): E1000: Found @ PCI 0000:00:07:00: E1000: port base: IO e040 init_stage2(1:1): E1000: MMIO base: P00000000fea80000 init_stage2(1:1): E1000: MMIO base size: 131072 bytes init_stage2(1:1): E1000: Interrupt line: 11 init_stage2(1:1): E1000: Has EEPROM? true init_stage2(1:1): E1000: MAC address: 52:54:00:12:34:57 init_stage2(1:1): SB16: Found version 4.5 init_stage2(1:1): SB16: IRQ 5 [#0 init_stage2(1:1)]: SB16: Changing sample rate to 44100 Hz [#0 init_stage2(1:1)]: IDE controller @ PCI [0000:00:01:01]: bus master base was set to IO e0c0 [#0 init_stage2(1:1)]: IDE controller @ PCI [0000:00:01:01]: interrupt line was set to 0 [#0 init_stage2(1:1)]: IDE controller @ PCI [0000:00:01:01]: ISA Compatibility mode-only controller, supports bus mastering [#0 init_stage2(1:1)]: IDE controller @ PCI [0000:00:01:01]: primary channel DMA capable? 0 [#0 init_stage2(1:1)]: IDE controller @ PCI [0000:00:01:01]: secondary channel DMA capable? 0 [#0 init_stage2(1:1)]: Trying to unregister unused handler (?) init_stage2(1:1): PCI [0000:03:01:00]: AHCI controller reset [#0 init_stage2(1:1)]: PCI [0000:03:01:00]: AHCI command list entries count - 32 init_stage2(1:1): AHCI Port 1: Device not detected, Phy not enabled init_stage2(1:1): AHCI Port 2: Device not detected, Phy not enabled init_stage2(1:1): AHCI Port 3: Device not detected, Phy not enabled init_stage2(1:1): AHCI Port 4: Device not detected, Phy not enabled init_stage2(1:1): AHCI Port 5: Device not detected, Phy not enabled init_stage2(1:1): AHCI Port 6: Device not detected, Phy not enabled init_stage2(1:1): ASSERTION FAILED: !m_storage_devices.is_empty() init_stage2(1:1): ../../Kernel/Storage/StorageManagement.cpp:97 in void Kernel::StorageManagement::enumerate_disk_partitions() const init_stage2(1:1): KERNEL PANIC! :^(

init_stage2(1:1): at ../../Kernel/Arch/x86/common/CPU.cpp:33 in void abort() init_stage2(1:1): Kernel + 0x004c2283 Kernel::__panic(char const, unsigned int, char const) +0x11c init_stage2(1:1): Kernel + 0x00687b14 illegal_instruction_asm_entry +0x0 init_stage2(1:1): Kernel + 0x006878cb abort.localalias +0x0 init_stage2(1:1): Kernel + 0x0077d9da Kernel::StorageManagement::enumerate_disk_partitions() const [clone .localalias] +0x38a init_stage2(1:1): Kernel + 0x00780381 Kernel::StorageManagement::initialize(AK::String, bool) +0x51f init_stage2(1:1): Kernel + 0x0067a45d Kernel::init_stage2(void*) +0x1428 init_stage2(1:1): Kernel + 0x0068d79c exit_kernel_thread +0x0

supercomputer7 commented 2 years ago

I'll try to reproduce this on my local setup. It seems like it doesn't detect any harddrive, which is weird, but I guess it's something simple to debug :)

supercomputer7 commented 2 years ago

I tried to run QEMU without KVM multiple times, both in i440FX and Q35 chipset modes, without any success to reproduce this.

supercomputer7 commented 2 years ago

It'd be cool if you can give more details what you tried to do, and what is the QEMU version you run with.

thankyouverycool commented 2 years ago

Using qemu 5.2.0 I was able to panic 3 times out of the last 50 boots. I just realized my boot alias uses the old ninja install && ninja image && ninja run commands inside /Build/i686. Dunno if this is still acceptable since superbuild changes.

supercomputer7 commented 2 years ago

I usually invoke build and running with:

sudo SERENITY_RUN=q35 SERENITY_KERNEL_CMDLINE= SERENITY_SCREENS= SERENITY_QEMU_DISPLAY_BACKEND=gtk SERENITY_RAM_SIZE=4096M ninja setup-and-run

in the Build/i686 directory (I remove the q35 if needed to boot with i440fx PC chipset). It's definitely a race condition, I'm just unsure where it's actually the place of that race. I also run with QEMU 6.0.0, so I suggest to check it to see if it helps :)

bgianfo commented 2 years ago

This is also impacting CI occasionally:

awesomekling commented 2 years ago

Dang, this happened again on CI. So not fixed by 05ed8d17385e6965fd66c42758d332bd7af741c0, reopening.

IzKuipers commented 3 days ago

Hi all, I know this issue doesn't appear to be active anymore, and I'm sorry if this is not the place to put it (though the discord invite is invalid?), but I thought I would document my findings regarding this specific issue. Over the course of today I've attempted to run Serenity OS (commit hash 1146284d86, built using the build guide on Linux Mint) on various devices, ranging from Pentium 4 to Core i5. Unfortunately, all devices panicked in the exact same way, no matter the configuration.

Tests were conducted using the GRUB image on both a 16GB USB 2.0 Silicon Power flash drive and a 32GB SanDisk USB 3.0 drive. The devices are listed in this neat table. Keep in mind that AHCI was disabled for all my tests:

Device CPU CPU speed Disk controller Flash drive(s)
HP Compaq CQ2100NL Intel Atom 230 1.6GHz SATA SanDisk & SP
HP Compaq 6200 Pro Intel Core i5-2400 3.1GHz SATA SP
Custom build AMD Athlon II X2-260 3.21GHz SATA SP
Custom build Pentium 4 2.4GHz IDE SanDisk & SP
Dell Latitude E6500 Core 2 Duo T9400 2.53GHz SATA SanDisk & SP
Lenovo ThinkPad T510 Core i5-520m 2.4GHz SATA SanDisk & SP

The crash:

[init_stage2(1:1)]: ASSERTION FAILED: !m_controllers.is_empty()
[init_stage2(1:1)]: ./Kernel/Devices/Storage/StorageManagement.cpp:172 in void Kernel::StorageManagement::enumerate_storage_devices()
[init_stage2(1:1)]: KERNEL PANIC! :^(
[init_stage2(1:1)]: Aborted
[init_stage2(1:1)]: at ./Kernel/Arch/x86_64/CPU.cpp:36 in void abort()
[init_stage2(1:1)]: Kernel + 0x000000000067f97a  Kernel::__panic(char const*, unsigned int, char const*) +0x9a
[init_stage2(1:1)]: Kernel + 0x0000000000ed9226  abort +0x22e
[init_stage2(1:1)]: Kernel + 0x0000000000ed8ff8  abort +0x0
[init_stage2(1:1)]: Kernel + 0x000000000112e58f  Kernel::StorageManagement::enumerate_storage_devices() +0xbef
[init_stage2(1:1)]: Kernel + 0x0000000001131682  Kernel::StorageManagement::initialize(bool) +0x32
[init_stage2(1:1)]: Kernel + 0x0000000000003ef4  Kernel::init_stage2(void*) +0x1554

I have no clue if any of this is useful to anyone, but what the hell.

spholz commented 3 days ago

Hi! Our custom discord invite doesn't appear to work anymore, so here is an old invite link which seems to still work: https://discord.com/invite/29gCcKsXkF. This issue is probably not the right place for this (this issue also probably can be closed).

Keep in mind that AHCI was disabled for all my tests

We don't support IDE anymore since 2cb86c13097c89194af51b185d1f7d3ee8be4882. AHCI and NVMe should work though. Also note that while we do support UHCI and USB mass storage devices (xHCI support isn't merged yet), nobody ever tested the mass storage driver on bare metal I think. So you probably would need to manually write the disk image from another OS onto one of your drives.