29jm / SnowflakeOS

"It is very special"
https://jmnl.xyz
MIT License
316 stars 18 forks

Running on real hardware #18

Open circutrider21 opened 3 years ago

circutrider21 commented 3 years ago

I ran SnowflakeOS on my computer (specs are below). It booted up and the terminal app was open.

The big problem was that it was frozen. No movement on screen, and mouse/keyboard didn't work at all.

Any explanation as to why that happened would be helpful!

Specs

Dell OptiPlex 9010 with Intel Core i7 (3rd generation, I think). 6 GB RAM. 500 GB hard drive (don't think that's useful, but still).

the-grue commented 3 years ago

Hi @circutrider21! I haven't tried on real hardware yet, but I experienced similar issues when trying to run it on VirtualBox. Logging output to a virtual serial port showed that the mouse and keyboard would not configure correctly. I verified that my configuration was using a PS/2 keyboard and mouse, since VirtualBox also gives you the option to use USB devices. It wasn't consistent either. Here are the results from two consecutive runs:

[pmm.c] memory stats: available: 127 MiB
[pmm.c] unavailable: 393 KiB
[pmm.c] taken by modules: 10 MiB
[kernel.c] SnowflakeOS 0.6
[kernel.c] kernel is 219 KiB large
[ps2.c] initializing PS/2 devices
[ps2.c] failed to reset device 0
[ps2.c] mouse
[mouse.c] enabled scroll wheel
[mouse.c] five buttons enabled
[ext2.c] initialized volume of size 10240 KiB
[paging.c] page fault caused by instruction at 0xC010BA5F from process 2:
[paging.c] the page at 0xDFDD2590 wasn't present
[paging.c] when a process tried to write to it
[paging.c] this process was in kernel mode

and then

[pmm.c] memory stats: available: 127 MiB
[pmm.c] unavailable: 393 KiB
[pmm.c] taken by modules: 10 MiB
[kernel.c] SnowflakeOS 0.6
[kernel.c] kernel is 219 KiB large
[ps2.c] initializing PS/2 devices
[ps2.c] failed to reset device 0
[ps2.c] mouse
[ps2.c] device failed to acknowledge command
[mouse.c] unable to enable scroll wheel
[ext2.c] initialized volume of size 10240 KiB
[paging.c] page fault caused by instruction at 0xC010BA5F from process 2:
[paging.c] the page at 0xDFDD2590 wasn't present
[paging.c] when a process tried to write to it
[paging.c] this process was in kernel mode

I haven't dug into it any further at this time since qemu works for me, but I work on most of my open source projects using VirtualBox and then real hardware. I have an old laptop with a serial port on it and an old Wyse terminal around somewhere, so that might help with debugging. Does your rig have a serial port?

29jm commented 3 years ago

Thanks for testing @circutrider21! I got the same result recently on an old laptop. I did some superficial debugging then, and I think the real crash happens when the scheduler switches back to a process. Processes work when first run and get interrupted; if some other process is due to start, it does, but when the first program is resumed, it crashes.

Thank you for mentioning VirtualBox @the-grue, having a VM that crashes like real hardware is going to help a lot. Serial ports are a rare thing these days and I'm not sure I've ever seen one myself ^^ I've just tested it and I don't get the page fault prints though: on 39c0d60, my output stops at [kernel.c] loading modules. Having the page fault handler print something would be extra useful. What commit are you on?

Something you can check is which function corresponds to 0xC010BA5F in your "kernel/symbols.map" file: it's the closest symbol at an address lower than that. You can also try moving the call to stacktrace_print before the call to paging_get_page in the fault handler, so that it runs before the handler crashes completely (probably what happens, or it would have printed more stuff).
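
For readers who want to automate that lookup, here is a minimal sketch in C of the "closest symbol at a lower address" search. It assumes the map file has already been parsed into an array of address/name pairs; the `struct symbol` layout and the `closest_symbol` name are hypothetical and are not taken from the SnowflakeOS sources, and the actual format of "kernel/symbols.map" is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one parsed map entry. */
struct symbol {
    uint32_t addr;     /* start address of the symbol */
    const char *name;  /* symbol name */
};

/*
 * Return the entry with the highest address that is still <= target,
 * or NULL when every entry lies above target. The table does not have
 * to be sorted, so this is a plain linear scan.
 */
const struct symbol *closest_symbol(const struct symbol *table, size_t count,
                                    uint32_t target)
{
    assert(table != NULL || count == 0);

    const struct symbol *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (table[i].addr <= target &&
            (best == NULL || table[i].addr > best->addr)) {
            best = &table[i];
        }
    }
    return best;
}
```

Called with the faulting address 0xC010BA5F, this would return whichever entry starts closest below that address, which is the function the faulting instruction most likely belongs to.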

A likely culprit is the use of uninitialized memory somewhere as if it were zeroed out, which it is in qemu and bochs, but not on real hardware. Another option is that the hardware is in a different state than expected during process switching, things like a missing bit in %cr4 that causes some registers to be pushed or not, I don't know.
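
To illustrate the first hypothesis, here is a small host-side sketch in C of the defensive pattern: clear a freshly obtained page explicitly rather than relying on it being zero. The `page_zero` and `page_is_zero` helpers and the 4096-byte `PAGE_SIZE` are assumptions made for the example and are not SnowflakeOS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/*
 * Hypothetical helper: clear a freshly obtained page before use.
 * Emulators often hand out zero-filled RAM, real machines may not.
 */
void *page_zero(void *page)
{
    assert(page != NULL);
    memset(page, 0, PAGE_SIZE);
    return page;
}

/* Debug aid: returns 1 if every byte of the page is zero, 0 otherwise. */
int page_is_zero(const void *page)
{
    const unsigned char *p = page;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

One way to hunt for this kind of bug under an emulator is the opposite trick: fill new pages with a garbage pattern such as 0xAA instead of zero, so that code depending on zeroed memory fails there too.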

The PS/2 error is interesting too: somehow the keyboard fails to acknowledge a reset command. This can't really cause a crash later on, so it's probably a separate issue. I'll open one.
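
As a rough illustration of what a more tolerant reset could look like, the sketch below retries the PS/2 reset command (0xFF) when the device times out or asks for a resend (0xFE), and expects an acknowledge (0xFA) followed by a self-test pass (0xAA). Those byte values come from the PS/2 device protocol; everything else (`struct ps2_io`, `ps2_reset_device`, and the scripted fake device used to run it on a host) is hypothetical and is not the SnowflakeOS driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PS2_CMD_RESET    0xFF
#define PS2_ACK          0xFA
#define PS2_RESEND       0xFE
#define PS2_SELF_TEST_OK 0xAA

/* Hypothetical I/O hooks, so the logic can run without real ports. */
struct ps2_io {
    void (*send)(uint8_t byte, void *ctx);
    bool (*recv)(uint8_t *out, void *ctx);  /* false means timeout */
    void *ctx;
};

/* Try to reset a device, retrying on timeout, resend or bad reply. */
bool ps2_reset_device(const struct ps2_io *io, int max_attempts)
{
    assert(io != NULL && io->send != NULL && io->recv != NULL);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        uint8_t reply;

        io->send(PS2_CMD_RESET, io->ctx);
        if (!io->recv(&reply, io->ctx) || reply != PS2_ACK) {
            continue;  /* timeout, resend request or garbage: retry */
        }
        /* After the ACK, the device reports its self-test result. */
        if (io->recv(&reply, io->ctx) && reply == PS2_SELF_TEST_OK) {
            return true;
        }
    }
    return false;
}

/* Scripted fake device for host-side experiments; -1 means timeout. */
struct ps2_script {
    const int *replies;
    size_t len;
    size_t pos;
    int sends;  /* how many command bytes were sent */
};

void script_send(uint8_t byte, void *ctx)
{
    (void)byte;
    ((struct ps2_script *)ctx)->sends++;
}

bool script_recv(uint8_t *out, void *ctx)
{
    struct ps2_script *s = ctx;
    if (s->pos >= s->len || s->replies[s->pos] < 0) {
        if (s->pos < s->len) {
            s->pos++;  /* consume the scripted timeout */
        }
        return false;
    }
    *out = (uint8_t)s->replies[s->pos++];
    return true;
}
```

A real driver would back `send` and `recv` with port I/O on the PS/2 controller and a bounded wait, and would also have to handle devices that send extra bytes after the self-test result (a mouse follows it with a device ID), which this sketch ignores.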

29jm commented 3 years ago

After investigation, I found that SnowflakeOS hadn't crashed as I thought. It just looked frozen because the info syscall always returned an uptime of zero, which was an FPU issue resolved in fb50a0a59e4751c24b78ad37d59ef8b3f182defb, and also because PS/2 devices don't work. In VirtualBox, the mouse sort of works, but breaks when a keyboard key is pressed. More work needed, but we'll get there :)

circutrider21 commented 3 years ago

Sounds great!

29jm commented 3 years ago

If anyone is up for testing, that'd be awesome!

Keyboard and mouse now work correctly on VirtualBox in addition to qemu and bochs, and I have hopes they may work on some hardware too. Here's an iso :)

circutrider21 commented 3 years ago

Ok, I'll get to it tomorrow hopefully

circutrider21 commented 3 years ago

By the way, you should use keep.sh for file transfer, it is so easy to use.

29jm commented 3 years ago

> Ok, I'll get to it tomorrow hopefully

> By the way, you should use keep.sh for file transfer, it is so easy to use.

Great! Can't wait for the results :) I'd never heard of keep.sh btw, it looks good, I'll keep it in mind.

circutrider21 commented 3 years ago

I tried the ISO on two computers, both of which failed. The computer boots into GRUB, but when I select the OS, I get a black screen with no sign of life. The same happens in bochs, which also gives me a black screen. I thought it was the ISO, so I ran it in qemu, which worked perfectly.

Quick recap

29jm commented 3 years ago

Oh wow that's not good ^^' I'm able to reproduce that black screen issue on an old laptop when compiling with -O1 and not -O2, which indicates the presence of a hellish bug. It's recent though, I should be able to track which commit introduced it. One weird thing is that things always work in bochs on my end, version 2.6.11.

If I can borrow some more of your time, here is an iso compiled with -O2 from the tip of the diag-mb2 branch, with more PS/2 fixes and without the commit introducing multiboot2 (I thought it might have been related, but it seems not). Maybe this one won't give you a black screen?

Edit: did not see the link had expired... here's a new one.

circutrider21 commented 3 years ago

Ok, I'll try it.

circutrider21 commented 3 years ago

Sorry I haven't responded for quite a while, I kinda got carried away with life. Anyhow, I tried the ISO you gave me but with no luck: I'm still getting a black screen on real hardware and bochs, but not qemu.

29jm commented 3 years ago

Alright, thank you for testing this. I need to figure this out, but it's a mysterious bug. I bisected it back to commit e88e2711a52beb3f28ddfd848e1c2d187b053657, but I can't figure out anything wrong with it yet.

circutrider21 commented 2 years ago

@29jm Hey, how's it going, long time no see!

29jm commented 2 years ago

Hey @circutrider21 ! There has been some progress on this issue, though it is hard to tell because depending on hardware, different issues pop up. Basically

circutrider21 commented 2 years ago

Good to know there's still progress going on, I just haven't seen progress within SnowflakeOS for a while, and I got carried away with my own OSDEV journeys as well 😅