SerenityOS / serenity

The Serenity Operating System 🐞
https://serenityos.org
BSD 2-Clause "Simplified" License
30.59k stars 3.19k forks source link

ASSERTION FAILED: VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) #10382

Open diversys opened 3 years ago

diversys commented 3 years ago

macOS 10.14.6 host, Qemu 6.0.0

[init_stage2(1:1)]: ASSERTION FAILED: VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) && VirtualAddress(fault_at) <= VirtualAddress((FlatPtr)dest_ptr + n)
[init_stage2(1:1)]: ./Kernel/StdLib.cpp:174 in Kernel::KResult copy_to_user(void*, const void*, size_t)
[init_stage2(1:1)]: KERNEL PANIC! :^(
[init_stage2(1:1)]: Aborted
[init_stage2(1:1)]: at ./Kernel/Arch/x86/common/CPU.cpp:35 in void abort()
IdanHo commented 3 years ago

x86 or x86_64? Another check was recently added to safe_memcpy that checks for canonical x86_64 addresses.

diversys commented 3 years ago

x86_64.

IdanHo commented 3 years ago

Aka that assertion means the kernel address was not canonical (or that the check I added is flawed), you can try swapping the dbgln just after the assertion and the assertion itself (if you can reproduce it) so that we can confirm.

diversys commented 3 years ago

It is reproducible on every boot. What exact file I can find the assert?

IdanHo commented 3 years ago

./Kernel/StdLib.cpp:174 (as written in the log you posted)

diversys commented 3 years ago
diff --git a/Kernel/StdLib.cpp b/Kernel/StdLib.cpp
index edfe4bfc5..3e46626fc 100644
--- a/Kernel/StdLib.cpp
+++ b/Kernel/StdLib.cpp
@@ -171,8 +171,8 @@ KResult copy_to_user(void* dest_ptr, void const* src_ptr, size_t n)
     Kernel::SmapDisabler disabler;
     void* fault_at;
     if (!Kernel::safe_memcpy(dest_ptr, src_ptr, n, fault_at)) {
-        VERIFY(VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) && VirtualAddress(fault_at) <= VirtualAddress((FlatPtr)dest_ptr + n));
         dbgln("copy_to_user({:p}, {:p}, {}) failed at {}", dest_ptr, src_ptr, n, VirtualAddress { fault_at });
+        VERIFY(VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) && VirtualAddress(fault_at) <= VirtualAddress((FlatPtr)dest_ptr + n));
         return EFAULT;
[#0 init_stage2(1:1)]: copy_to_user(0x000000001128d648, 0x000000200712d648, 3800) failed at V0x000000200712d648
[init_stage2(1:1)]: ASSERTION FAILED: VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) && VirtualAddress(fault_at) <= VirtualAddress((FlatPtr)dest_ptr + n)
[init_stage2(1:1)]: ./Kernel/StdLib.cpp:175 in Kernel::KResult copy_to_user(void*, const void*, size_t)
[init_stage2(1:1)]: KERNEL PANIC! :^(
[init_stage2(1:1)]: Aborted
[init_stage2(1:1)]: at ./Kernel/Arch/x86/common/CPU.cpp:35 in void abort()
IdanHo commented 3 years ago

Early on in the boot you should have gotten a Virtual address bit width: xxx log message, can you check what your virtual address bit width is?

diversys commented 3 years ago

[Kernel]: CPU[0]: Physical address bit width: 36 [Kernel]: CPU[0]: Virtual address bit width: 32

IdanHo commented 3 years ago

Well then yes, that kernel address is not canonical, but that's a very weird virtual address bit width, since most newer x86_64 CPUs have a 48 bit virtual address width (I have a feeling that our current Prekernel assumes that 48 bits are available, and as such places parts of the kernel above the canonical address range), it's even weirder that your QEMU only supports 32 bits.

IdanHo commented 3 years ago

In fact, qemu's tcg doesn't even support a value lower than 48: https://github.com/qemu/qemu/blob/master/target/i386/cpu.c#L5187 Are you running using some kind of acceleration?

diversys commented 3 years ago

Yes, Meta/run.sh enables hvf acceleration on macOS: https://github.com/SerenityOS/serenity/blob/3b0da8b28c35cedaaccf68adfad0d3344c10877a/Meta/run.sh#L32

IdanHo commented 3 years ago

Can you confirm this works correctly with hvf acceleration turned off? If that's the case this is an issue with incorrect reporting of CPUID leaf 0x80000008 and should be reported upstream to qemu.

diversys commented 3 years ago

Now this is weird. If I start Qemu manually (with exactly the same arguments) kernel doesn't crash.

qemu-system-x86_64 -s -m 1G -smp 2 -display cocoa,gl=off -device VGA,vgamem_mb=64 -drive file=_disk_image,format=raw,index=0,media=disk -device virtio-serial,max_ports=2 -device virtconsole,chardev=stdout -device isa-debugcon,chardev=stdout -device virtio-rng-pci -audiodev coreaudio,id=snd0 -machine pcspk-audiodev=snd0 -device sb16,audiodev=snd0 -device pci-bridge,chassis_nr=1,id=bridge1 -device e1000,bus=bridge1 -device i82801b11-bridge,bus=bridge1,id=bridge2 -device sdhci-pci,bus=bridge2 -device i82801b11-bridge,id=bridge3 -device sdhci-pci,bus=bridge3 -device ich9-ahci,bus=bridge3 -chardev stdio,id=stdout,mux=on -cpu max,-x2apic -d guest_errors -usb --accel hvf -netdev user,id=breh,hostfwd=tcp:127.0.0.1:8888-10.0.2.15:8888,hostfwd=tcp:127.0.0.1:8823-10.0.2.15:23,hostfwd=tcp:127.0.0.1:8000-10.0.2.15:8000,hostfwd=tcp:127.0.0.1:2222-10.0.2.15:22 -device e1000,netdev=breh -kernel Kernel/Prekernel/Prekernel -initrd Kernel/Kernel -append hello

I also noticed that in this case Virtual address bit width is not printed at all.

IdanHo commented 3 years ago

That should be impossible as far as I can tell? The bit width is printed as part of the processor initialization, are you on latest master?

diversys commented 3 years ago

Turns out I was manually booting older image from Build/_disk_image instead of Build/x86_64/_disk_image. Removing --accel hvf from qemu-system-x86_64 fixes the crash, however it doesn't fully boot, leaving me with grey screen and working mouse cursor.

IdanHo commented 3 years ago

That's likely booting correctly, it's just probably so slow that if you wait a couple of minutes eventually it will show up (running with acceleration is painful), would you mind reporting the CPUID issue to qemu? I would do it, but I don't have a mac to help them reproduce it if they'd ask.

diversys commented 3 years ago

I don't mind but I don't know how to correctly describe an issue at hand.

IdanHo commented 3 years ago

I can open the issue and link it here so you can chime in as needed if you prefer?

diversys commented 3 years ago

Yes please :)

IdanHo commented 3 years ago

Before we do, one final check, can you run with hvf again, but this time hardcode the bit width so we can ensure the only issue is the cpuid reporting? Aka just add m_virtual_address_bit_width = 48; here: https://github.com/SerenityOS/serenity/blob/b819719860a93f3c7c3b73cd3a15632ee123d59d/Kernel/Arch/x86/common/Processor.cpp#L149

IdanHo commented 3 years ago

Oh, and can you also add a dbgln for the "max_extended_leaf" value? So we can see if that cpuid leaf is not reported at all, or reported incorrectly.

diversys commented 3 years ago

With this change kernel doesn't crash and I can boot all the way to the desktop. Cool!

diversys commented 3 years ago

Oh, and can you also add a dbgln for the "max_extended_leaf" value? So we can see if that cpuid leaf is not reported at all, or reported incorrectly.

What line and where? Sorry, I'm not a dev.

IdanHo commented 3 years ago

With this change kernel doesn't crash and I can boot all the way to the desktop. Cool!

Great news! That confirms that the only issue is incorrect reporting of the CPU's capabilities, aka all the features are in place and working, it just incorrectly reports it isn't :)

IdanHo commented 3 years ago

What line and where? Sorry, I'm not a dev.

The same place where you added the temporary = 48 fix, add the following line: dbgln("{:x}", max_extended_leaf);

diversys commented 3 years ago

With this change kernel doesn't start and it looks like it goes into reboot cycle. Nothing in the terminal.

IdanHo commented 3 years ago

With this change kernel doesn't start and it looks like it goes into reboot cycle. Nothing in the terminal.

Oh right, this is way too early in the boot to be logging stuff into the console, slightly more complicated option: Add a new member to Processor.h right after m_virtual_address_bit_width on line 132 of Kernel/Arch/x86/Processor.h e.g. u64 m_diversys_field;, then instead of the dbgln, set it's value to the max_extended_leaf like you set the 48 value, and finally then, add a print for that member field right after the print for virtual bit width member field at line 343 of Kernel/Arch/x86/common/Processor.cpp

diversys commented 3 years ago

Hopefully, this is what you meant:

diff --git a/Kernel/Arch/x86/Processor.h b/Kernel/Arch/x86/Processor.h
index a1b6bf97e..610e8b0ea 100644
--- a/Kernel/Arch/x86/Processor.h
+++ b/Kernel/Arch/x86/Processor.h
@@ -130,6 +130,7 @@ class Processor {
     static Atomic<u32> g_total_processors;
     u8 m_physical_address_bit_width;
     u8 m_virtual_address_bit_width;
+    u64 m_diversys_field;

     ProcessorInfo* m_info;
     Thread* m_current_thread;
diff --git a/Kernel/Arch/x86/common/Processor.cpp b/Kernel/Arch/x86/common/Processor.cpp
index fb97ab0d6..a17b7be9c 100644
--- a/Kernel/Arch/x86/common/Processor.cpp
+++ b/Kernel/Arch/x86/common/Processor.cpp
@@ -146,6 +146,8 @@ UNMAP_AFTER_INIT void Processor::cpu_detect()
         // Processors that do not support CPUID function 80000008H, support a linear-address width of 32.
         m_virtual_address_bit_width = 32;
     }
+    m_virtual_address_bit_width = 48;
+    m_diversys_field = max_extended_leaf;

     CPUID extended_features(0x7);
     if (extended_features.ebx() & (1 << 20))
@@ -339,7 +341,7 @@ UNMAP_AFTER_INIT void Processor::initialize(u32 cpu)
     if (!has_feature(CPUFeature::RDRAND))
         dmesgln("CPU[{}]: No RDRAND support detected, randomness will be poor", current_id());
     dmesgln("CPU[{}]: Physical address bit width: {}", current_id(), m_physical_address_bit_width);
-    dmesgln("CPU[{}]: Virtual address bit width: {}", current_id(), m_virtual_address_bit_width);
+    dmesgln("CPU[{}]: Virtual address bit width: {}", current_id(), m_virtual_address_bit_width, m_diversys_field);

     if (cpu == 0)
         idt_init();
IdanHo commented 3 years ago

That won't work actually, since you're trying to print two values with the same print, try replacing the print with: dmesgln("CPU[{}]: Diversys value: {}", current_id(), m_diversys_field);

diversys commented 3 years ago

[Kernel]: CPU[0]: Diversys value: 2147483649

IdanHo commented 3 years ago

2147483649

So it's not reporting this information at all, thanks for the help collecting the info, I'll report it upstream now :)

IdanHo commented 3 years ago

Can you fill in these details from the qemu issue template?

 - Operating system: (Windows 10 21H1, Fedora 34, etc.)
 - OS/kernel version: (For POSIX hosts, use `uname -a`)
 - Architecture: (x86, ARM, s390x, etc.)
 - QEMU flavor: (qemu-system-x86_64, qemu-aarch64, qemu-img, etc.)
 - QEMU version: (e.g. `qemu-system-x86_64 --version`)
diversys commented 3 years ago
 - Operating system: macOS 10.14.6
 - OS/kernel version: Darwin MacBookPro.local 18.7.0 Darwin Kernel Version 18.7.0: Tue Jun 22 19:37:08 PDT 2021; root:xnu-4903.278.70~1/RELEASE_X86_64 x86_64
 - Architecture: x86_64
 - QEMU flavor: qemu-system-x86_64
 - QEMU version: QEMU emulator version 6.0.0
IdanHo commented 3 years ago

Also can you check that you get the same Diversys value when running qemu directly with this command? ./qemu-system-x86_64 -m 1G -cpu max,-x2apic -d guest_errors --accel hvf -kernel Kernel/Prekernel/Prekernel -initrd Kernel/Kernel (It's the original command that you posted, I just tried to remove as much extra fluff as possible)

diversys commented 3 years ago

image

IdanHo commented 3 years ago

Right, my bad, try adding -drive file=_disk_image,format=raw,index=0,media=disk as well

diversys commented 3 years ago

It now boots but with no terminal output.

IdanHo commented 3 years ago

It now boots but with no terminal output.

Eh, never mind, I'll just give them the whole command and try minimizing it if they ask me to instead of doing work speculatively :)

IdanHo commented 3 years ago

https://gitlab.com/qemu-project/qemu/-/issues/664

IdanHo commented 3 years ago

gitlab.com/qemu-project/qemu/-/issues/664

@diversys Hey, one of the QEMU maintainers is asking for help with debugging the issue in there ^, can you try helping him please?