Open diversys opened 3 years ago
x86 or x86_64? Another check was recently added to safe_memcpy that checks for canonical x86_64 addresses.
x86_64.
Aka that assertion means the kernel address was not canonical (or that the check I added is flawed), you can try swapping the dbgln just after the assertion and the assertion itself (if you can reproduce it) so that we can confirm.
It is reproducible on every boot. What exact file I can find the assert?
./Kernel/StdLib.cpp:174
(as written in the log you posted)
diff --git a/Kernel/StdLib.cpp b/Kernel/StdLib.cpp
index edfe4bfc5..3e46626fc 100644
--- a/Kernel/StdLib.cpp
+++ b/Kernel/StdLib.cpp
@@ -171,8 +171,8 @@ KResult copy_to_user(void* dest_ptr, void const* src_ptr, size_t n)
Kernel::SmapDisabler disabler;
void* fault_at;
if (!Kernel::safe_memcpy(dest_ptr, src_ptr, n, fault_at)) {
- VERIFY(VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) && VirtualAddress(fault_at) <= VirtualAddress((FlatPtr)dest_ptr + n));
dbgln("copy_to_user({:p}, {:p}, {}) failed at {}", dest_ptr, src_ptr, n, VirtualAddress { fault_at });
+ VERIFY(VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) && VirtualAddress(fault_at) <= VirtualAddress((FlatPtr)dest_ptr + n));
return EFAULT;
[#0 init_stage2(1:1)]: copy_to_user(0x000000001128d648, 0x000000200712d648, 3800) failed at V0x000000200712d648
[init_stage2(1:1)]: ASSERTION FAILED: VirtualAddress(fault_at) >= VirtualAddress(dest_ptr) && VirtualAddress(fault_at) <= VirtualAddress((FlatPtr)dest_ptr + n)
[init_stage2(1:1)]: ./Kernel/StdLib.cpp:175 in Kernel::KResult copy_to_user(void*, const void*, size_t)
[init_stage2(1:1)]: KERNEL PANIC! :^(
[init_stage2(1:1)]: Aborted
[init_stage2(1:1)]: at ./Kernel/Arch/x86/common/CPU.cpp:35 in void abort()
Early on in the boot you should have gotten a Virtual address bit width: xxx
log message, can you check what your virtual address bit width is?
[Kernel]: CPU[0]: Physical address bit width: 36 [Kernel]: CPU[0]: Virtual address bit width: 32
Well then yes, that kernel address is not canonical, but that's a very weird virtual address bit width, since most newer x86_64 CPUs have a 48 bit virtual address width (I have a feeling that our current Prekernel assumes that 48 bits are available, and as such places parts of the kernel above the canonical address range), it's even weirder that your QEMU only supports 32 bits.
In fact, qemu's tcg doesn't even support a value lower than 48: https://github.com/qemu/qemu/blob/master/target/i386/cpu.c#L5187 Are you running using some kind of acceleration?
Yes, Meta/run.sh enables hvf acceleration on macOS: https://github.com/SerenityOS/serenity/blob/3b0da8b28c35cedaaccf68adfad0d3344c10877a/Meta/run.sh#L32
Can you confirm this works correctly with hvf acceleration turned off? If that's the case this is an issue with incorrect reporting of CPUID leaf 0x80000008 and should be reported upstream to qemu.
Now this is weird. If I start Qemu manually (with exactly the same arguments) kernel doesn't crash.
qemu-system-x86_64 -s -m 1G -smp 2 -display cocoa,gl=off -device VGA,vgamem_mb=64 -drive file=_disk_image,format=raw,index=0,media=disk -device virtio-serial,max_ports=2 -device virtconsole,chardev=stdout -device isa-debugcon,chardev=stdout -device virtio-rng-pci -audiodev coreaudio,id=snd0 -machine pcspk-audiodev=snd0 -device sb16,audiodev=snd0 -device pci-bridge,chassis_nr=1,id=bridge1 -device e1000,bus=bridge1 -device i82801b11-bridge,bus=bridge1,id=bridge2 -device sdhci-pci,bus=bridge2 -device i82801b11-bridge,id=bridge3 -device sdhci-pci,bus=bridge3 -device ich9-ahci,bus=bridge3 -chardev stdio,id=stdout,mux=on -cpu max,-x2apic -d guest_errors -usb --accel hvf -netdev user,id=breh,hostfwd=tcp:127.0.0.1:8888-10.0.2.15:8888,hostfwd=tcp:127.0.0.1:8823-10.0.2.15:23,hostfwd=tcp:127.0.0.1:8000-10.0.2.15:8000,hostfwd=tcp:127.0.0.1:2222-10.0.2.15:22 -device e1000,netdev=breh -kernel Kernel/Prekernel/Prekernel -initrd Kernel/Kernel -append hello
I also noticed that in this case Virtual address bit width
is not printed at all.
That should be impossible as far as I can tell? The bit width is printed as part of the processor initialization, are you on latest master?
Turns out I was manually booting older image from Build/_disk_image
instead of Build/x86_64/_disk_image
.
Removing --accel hvf
from qemu-system-x86_64 fixes the crash, however it doesn't fully boot, leaving me with grey screen and working mouse cursor.
That's likely booting correctly, it's just probably so slow that if you wait a couple of minutes eventually it will show up (running with acceleration is painful), would you mind reporting the CPUID issue to qemu? I would do it, but I don't have a mac to help them reproduce it if they'd ask.
I don't mind but I don't know how to correctly describe an issue at hand.
I can open the issue and link it here so you can chime in as needed if you prefer?
Yes please :)
Before we do, one final check, can you run with hvf again, but this time hardcode the bit width so we can ensure the only issue is the cpuid reporting?
Aka just add m_virtual_address_bit_width = 48;
here: https://github.com/SerenityOS/serenity/blob/b819719860a93f3c7c3b73cd3a15632ee123d59d/Kernel/Arch/x86/common/Processor.cpp#L149
Oh, and can you also add a dbgln for the "max_extended_leaf" value? So we can see if that cpuid leaf is not reported at all, or reported incorrectly.
With this change kernel doesn't crash and I can boot all the way to the desktop. Cool!
Oh, and can you also add a dbgln for the "max_extended_leaf" value? So we can see if that cpuid leaf is not reported at all, or reported incorrectly.
What line and where? Sorry, I'm not a dev.
With this change kernel doesn't crash and I can boot all the way to the desktop. Cool!
Great news! That confirms that the only issue is incorrect reporting of the CPU's capabilities, aka all the features are in place and working, it just incorrectly reports it isn't :)
What line and where? Sorry, I'm not a dev.
The same place where you added the temporary = 48
fix, add the following line: dbgln("{:x}", max_extended_leaf);
With this change kernel doesn't start and it looks like it goes into reboot cycle. Nothing in the terminal.
With this change kernel doesn't start and it looks like it goes into reboot cycle. Nothing in the terminal.
Oh right, this is way too early in the boot to be logging stuff into the console, slightly more complicated option: Add a new member to Processor.h right after m_virtual_address_bit_width
on line 132 of Kernel/Arch/x86/Processor.h e.g. u64 m_diversys_field;
, then instead of the dbgln, set it's value to the max_extended_leaf
like you set the 48 value, and finally then, add a print for that member field right after the print for virtual bit width member field at line 343 of Kernel/Arch/x86/common/Processor.cpp
Hopefully, this is what you meant:
diff --git a/Kernel/Arch/x86/Processor.h b/Kernel/Arch/x86/Processor.h
index a1b6bf97e..610e8b0ea 100644
--- a/Kernel/Arch/x86/Processor.h
+++ b/Kernel/Arch/x86/Processor.h
@@ -130,6 +130,7 @@ class Processor {
static Atomic<u32> g_total_processors;
u8 m_physical_address_bit_width;
u8 m_virtual_address_bit_width;
+ u64 m_diversys_field;
ProcessorInfo* m_info;
Thread* m_current_thread;
diff --git a/Kernel/Arch/x86/common/Processor.cpp b/Kernel/Arch/x86/common/Processor.cpp
index fb97ab0d6..a17b7be9c 100644
--- a/Kernel/Arch/x86/common/Processor.cpp
+++ b/Kernel/Arch/x86/common/Processor.cpp
@@ -146,6 +146,8 @@ UNMAP_AFTER_INIT void Processor::cpu_detect()
// Processors that do not support CPUID function 80000008H, support a linear-address width of 32.
m_virtual_address_bit_width = 32;
}
+ m_virtual_address_bit_width = 48;
+ m_diversys_field = max_extended_leaf;
CPUID extended_features(0x7);
if (extended_features.ebx() & (1 << 20))
@@ -339,7 +341,7 @@ UNMAP_AFTER_INIT void Processor::initialize(u32 cpu)
if (!has_feature(CPUFeature::RDRAND))
dmesgln("CPU[{}]: No RDRAND support detected, randomness will be poor", current_id());
dmesgln("CPU[{}]: Physical address bit width: {}", current_id(), m_physical_address_bit_width);
- dmesgln("CPU[{}]: Virtual address bit width: {}", current_id(), m_virtual_address_bit_width);
+ dmesgln("CPU[{}]: Virtual address bit width: {}", current_id(), m_virtual_address_bit_width, m_diversys_field);
if (cpu == 0)
idt_init();
That won't work actually, since you're trying to print two values with the same print, try replacing the print with:
dmesgln("CPU[{}]: Diversys value: {}", current_id(), m_diversys_field);
[Kernel]: CPU[0]: Diversys value: 2147483649
2147483649
So it's not reporting this information at all, thanks for the help collecting the info, I'll report it upstream now :)
Can you fill in these details from the qemu issue template?
- Operating system: (Windows 10 21H1, Fedora 34, etc.)
- OS/kernel version: (For POSIX hosts, use `uname -a`)
- Architecture: (x86, ARM, s390x, etc.)
- QEMU flavor: (qemu-system-x86_64, qemu-aarch64, qemu-img, etc.)
- QEMU version: (e.g. `qemu-system-x86_64 --version`)
- Operating system: macOS 10.14.6
- OS/kernel version: Darwin MacBookPro.local 18.7.0 Darwin Kernel Version 18.7.0: Tue Jun 22 19:37:08 PDT 2021; root:xnu-4903.278.70~1/RELEASE_X86_64 x86_64
- Architecture: x86_64
- QEMU flavor: qemu-system-x86_64
- QEMU version: QEMU emulator version 6.0.0
Also can you check that you get the same Diversys value
when running qemu directly with this command? ./qemu-system-x86_64 -m 1G -cpu max,-x2apic -d guest_errors --accel hvf -kernel Kernel/Prekernel/Prekernel -initrd Kernel/Kernel
(It's the original command that you posted, I just tried to remove as much extra fluff as possible)
Right, my bad, try adding -drive file=_disk_image,format=raw,index=0,media=disk
as well
It now boots but with no terminal output.
It now boots but with no terminal output.
Eh, never mind, I'll just give them the whole command and try minimizing it if they ask me to instead of doing work speculatively :)
@diversys Hey, one of the QEMU maintainers is asking for help with debugging the issue in there ^, can you try helping him please?
macOS 10.14.6 host, Qemu 6.0.0