bobuhiro11 / gokvm

KVM based tiny x86 hypervisor written in pure golang, which can boot Linux
https://blog.bobuhiro11.net/tags/gokvm.html
MIT License
210 stars 24 forks source link

Pvh boot #148

Closed ChriMarMe closed 1 year ago

ChriMarMe commented 1 year ago

This PR adds the PVH Boot mechanism described here.

It adds some devices, setting MSR and CPUIDs, registers, segment registers and other stuff to allow to boot into EDK2/CloudHv EFI-Shell.

Further work to boot into an operating system is required.

Edit: Now supports the loading of Linux + Initrd as well

Edit2: It actually does not support loading of Linux/Initrd yet.....something's not right.

ChriMarMe commented 1 year ago

Im aware of the errors, will get rid of them the next couple of days.

ChriMarMe commented 1 year ago
main.go:18:1: cyclomatic complexity 32 of func `main` is high (> 30) (gocyclo)

func main() {

^

main.go:50:2: `if c.Firmware != ""` has complex nested blocks (complexity: 6) (nestif)

    if c.Firmware != "" {

    ^

Not really sure how to handle the last two errors. Need some guidance here.

ChriMarMe commented 1 year ago

Trying to extend it to boot Linux compiled as ELF with pvh support.

ChriMarMe commented 1 year ago

I'm aware there are no unit tests in place yet.

ChriMarMe commented 1 year ago

Not sure. This is kinda messy, but I also have no clue how to improve structure. Problem I see is the following:

Multiple boot modes, two in one function, and pvh in a different function. Not ideal. I know a restructure is required here, but I also don't know how yet. Maybe you have an idea.

ChriMarMe commented 1 year ago

150 must be merged first.

PVH loads linux kernel and initrd now as well.

Last problem to solve: How to handle the boot modes and cmdline arguments for kernel, initrd and firmware image?

ChriMarMe commented 1 year ago

Ok, somehow CI works and not at the same time. Still the issue with commandline args persists.

See: #151

bobuhiro11 commented 1 year ago

Thanks for the great suggestion! I am going on a trip for a few days. I may be able to reply to each PR on Friday.

ChriMarMe commented 1 year ago

This works with 3G memory for EDK2, and Linux load does not work at the moment. Some memory layout related issue.

So HVM_MemMap_Table_Entries not properly set up. Need to investigate.

ChriMarMe commented 1 year ago

Ok, CI runs, but still I have to investigate the memory issues and add a test case to test pvh boot protocol execution

ChriMarMe commented 1 year ago

Still not loading kernel/initrd with pvh boot. Still dunno what goes wrong.

[  398.885484][    C0] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 53.676 msecs
[  398.885484][    C0] perf: interrupt took too long (1759727 > 1686288), lowering kernel.perf_event_max_sample_rate to 250
[  398.889484][   T21] Trying to unpack rootfs image as initramfs...
[  398.905485][    T1] platform rtc_cmos: registered platform RTC device (no PNP device found)

After that the VM exits...

ChriMarMe commented 1 year ago

Still not loading kernel/initrd with pvh boot. Still dunno what goes wrong.

[  398.885484][    C0] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 53.676 msecs
[  398.885484][    C0] perf: interrupt took too long (1759727 > 1686288), lowering kernel.perf_event_max_sample_rate to 250
[  398.889484][   T21] Trying to unpack rootfs image as initramfs...
[  398.905485][    T1] platform rtc_cmos: registered platform RTC device (no PNP device found)

After that the VM exits...

CMOS Device was the issue.....

EDK2/CloudHv boots with 1G-3G, but not 4G.

Linux/Initrd now boots, but only with 4G of memory. For 1G-3G I get errors.

Still investigating.

ChriMarMe commented 1 year ago
[  369.783040][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
[  369.783040][    C0] irq event stamp: 6352
[  369.783040][    C0] hardirqs last  enabled at (6351): [<ffffffff814c132d>] _raw_spin_unlock_irq+0x23/0x34
[  369.783040][    C0] hardirqs last disabled at (6352): [<ffffffff814b9614>] exc_nmi+0x83/0x107
[  369.783040][    C0] softirqs last  enabled at (6038): [<ffffffff814c1e98>] __do_softirq+0x298/0x2c4
[  369.783040][    C0] softirqs last disabled at (6029): [<ffffffff81051772>] irq_exit_rcu+0x9/0x11
[  369.783040][    C0] CPU: 0 PID: 21 Comm: kworker/u128:1 Not tainted 6.5.0-rc5+ #7 7d885148b887b779f025cd7d8b8daf9b731f1e9e
[  369.783040][    C0] Workqueue: events_unbound async_run_entry_fn
[  369.783040][    C0] RIP: 0010:lzma_match+0x2b5/0x2ff
[  369.783040][    C0] Code: 18 0f b6 04 07 8b 7b 04 c1 e7 08 01 f8 89 43 04 44 8b 03 0f b7 3a 01 f6 44 89 c0 c1 e8 0b 0f af c7 8b 7b 04 39 c7 73 19 89 03 <44> 0f b7 02 44 89 df 44 29 c7 44 89 c0 c1 b
[  369.783040][    C0] RSP: 0018:ffffc900002a3c80 EFLAGS: 00000297
[  369.783040][    C0] RAX: 000000003963d43a RBX: ffff888003ad8000 RCX: 0000000000000003
[  369.783040][    C0] RDX: ffff888003ad8700 RSI: 000000000000001c RDI: 000000002d36f3c2
[  369.783040][    C0] RBP: ffff888003ad8090 R08: 00000000814d1600 R09: ffff888003ad86e4
[  369.783040][    C0] R10: 0000000000000001 R11: 0000000000000800 R12: 00000000000097bc
[  369.783040][    C0] R13: 0000000800000003 R14: ffffffff81f72fc4 R15: 0000000000000000
[  369.783040][    C0] FS:  0000000000000000(0000) GS:ffff8880bbe00000(0000) knlGS:0000000000000000
[  369.783040][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  369.783040][    C0] CR2: ffff888002c00000 CR3: 0000000001c38000 CR4: 00000000003506b0
[  369.783040][    C0] Call Trace:
[  369.783040][    C0]  <NMI>
[  369.783040][    C0]  ? watchdog_hardlockup_check+0xaf/0x12f
[  369.783040][    C0]  ? __perf_event_overflow+0x163/0x1a6
[  369.783040][    C0]  ? x86_pmu_handle_irq+0xf4/0x12f
[  369.783040][    C0]  ? vscnprintf+0x14/0x22
[  369.783040][    C0]  ? prb_final_commit+0x16/0x21
[  369.783040][    C0]  ? look_up_lock_class+0x7c/0x95
[  369.783040][    C0]  ? lock_acquire+0x191/0x21d
[  369.783040][    C0]  ? amd_pmu_handle_irq+0x28/0x37
[  369.783040][    C0]  ? perf_event_nmi_handler+0x25/0x3e
[  369.783040][    C0]  ? nmi_handle+0xb9/0x1fc
[  369.783040][    C0]  ? default_do_nmi+0x42/0x20a
[  369.783040][    C0]  ? exc_nmi+0x9e/0x107
[  369.783040][    C0]  ? end_repeat_nmi+0x16/0x67
[  369.783040][    C0]  ? initrd_load+0x61/0x61
[  369.783040][    C0]  ? lzma_match+0x2b5/0x2ff
[  369.783040][    C0]  ? lzma_match+0x2b5/0x2ff
[  369.783040][    C0]  ? lzma_match+0x2b5/0x2ff
[  369.783040][    C0]  </NMI>
[  369.783040][    C0]  <TASK>
[  369.783040][    C0]  lzma_main+0x14d/0x1bf
[  369.783040][    C0]  lzma2_lzma+0x14f/0x1c1
[  369.783040][    C0]  xz_dec_lzma2_run+0x2d5/0x46e
[  369.783040][    C0]  dec_main+0x1b4/0x4da
[  369.783040][    C0]  ? do_copy+0x10b/0x10d
[  369.783040][    C0]  ? initrd_load+0x61/0x61
[  369.783040][    C0]  xz_dec_run+0x2e/0x82
[  369.783040][    C0]  ? write_buffer+0x2f/0x2f
[  369.783040][    C0]  unxz+0x14d/0x2c6
[  369.783040][    C0]  ? write_buffer+0x2f/0x2f
[  369.783040][    C0]  ? unlzo+0x3bc/0x3bc
[  369.783040][    C0]  unpack_to_rootfs+0x15c/0x278
[  369.783040][    C0]  ? initrd_load+0x61/0x61
[  369.783040][    C0]  do_populate_rootfs+0x65/0x13d
[  369.783040][    C0]  async_run_entry_fn+0x58/0xfa
[  369.783040][    C0]  process_one_work+0x1e8/0x334
[  369.783040][    C0]  worker_thread+0x159/0x202
[  369.783040][    C0]  ? process_scheduled_works+0x27/0x27
[  369.783040][    C0]  kthread+0xd3/0xdb
[  369.783040][    C0]  ? kthread_exit+0x1e/0x1e
[  369.783040][    C0]  ret_from_fork+0x20/0x32
[  369.783040][    C0]  ? kthread_exit+0x1e/0x1e
[  369.783040][    C0]  ret_from_fork_asm+0x11/0x40
[  369.783040][    C0]  </TASK>
[  369.783040][    C0] Kernel panic - not syncing: Hard LOCKUP
[  369.783040][    C0] Kernel Offset: disabled
[  369.783040][    C0] 
[  369.783040][    C0] ================================
[  369.783040][    C0] WARNING: inconsistent lock state
[  369.783040][    C0] 6.5.0-rc5+ #7 Not tainted
[  369.783040][    C0] --------------------------------
[  369.783040][    C0] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[  369.783040][    C0] kworker/u128:1/21 [HC1[1]:SC0[0]:HE0:SE1] takes:
[  369.783040][    C0] ffffffff81cc4858 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0xf/0x2d
[  369.783040][    C0] {INITIAL USE} state was registered at:
[  369.783040][    C0] irq event stamp: 6352
[  369.783040][    C0] hardirqs last  enabled at (6351): [<ffffffff814c132d>] _raw_spin_unlock_irq+0x23/0x34
[  369.783040][    C0] hardirqs last disabled at (6352): [<ffffffff814b9614>] exc_nmi+0x83/0x107
[  369.783040][    C0] softirqs last  enabled at (6038): [<ffffffff814c1e98>] __do_softirq+0x298/0x2c4
[  369.783040][    C0] softirqs last disabled at (6029): [<ffffffff81051772>] irq_exit_rcu+0x9/0x11
[  369.783040][    C0] 
[  369.783040][    C0] other info that might help us debug this:
[  369.783040][    C0]  Possible unsafe locking scenario:
[  369.783040][    C0] 
[  369.783040][    C0]        CPU0
[  369.783040][    C0]        ----
[  369.783040][    C0]   lock((console_sem).lock);
[  369.783040][    C0]   <Interrupt>
[  369.783040][    C0]     lock((console_sem).lock);
[  369.783040][    C0] 
[  369.783040][    C0]  *** DEADLOCK ***
[  369.783040][    C0] 
[  369.783040][    C0] 2 locks held by kworker/u128:1/21:
[  369.783040][    C0]  #0: ffff888000158d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x123/0x334
[  369.783040][    C0]  #1: ffffc900002a3e78 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x123/0x334
[  369.783040][    C0] 
[  369.783040][    C0] stack backtrace:
[  369.783040][    C0] CPU: 0 PID: 21 Comm: kworker/u128:1 Not tainted 6.5.0-rc5+ #7 7d885148b887b779f025cd7d8b8daf9b731f1e9e
[  369.783040][    C0] Workqueue: events_unbound async_run_entry_fn
[  369.783040][    C0] Call Trace:
[  369.783040][    C0]  <NMI>
[  369.783040][    C0]  dump_stack_lvl+0x6e/0x9f
[  369.783040][    C0]  lock_acquire+0x1f6/0x21d
[  369.783040][    C0]  ? _printk+0x47/0x49
[  369.783040][    C0]  ? down_trylock+0xf/0x2d
[  369.783040][    C0]  _raw_spin_lock_irqsave+0x3a/0x49
[  369.783040][    C0]  ? down_trylock+0xf/0x2d
[  369.783040][    C0]  ? panic+0x148/0x267
[  369.783040][    C0]  down_trylock+0xf/0x2d
[  369.783040][    C0]  __down_trylock_console_sem+0x1f/0x95
[  369.783040][    C0]  console_unblank+0x19/0x8f
[  369.783040][    C0]  panic+0x148/0x267
[  369.783040][    C0]  nmi_panic+0x27/0x31
[  369.783040][    C0]  watchdog_hardlockup_check+0x10f/0x12f
[  369.783040][    C0]  __perf_event_overflow+0x163/0x1a6
[  369.783040][    C0]  x86_pmu_handle_irq+0xf4/0x12f
[  369.783040][    C0]  ? vscnprintf+0x14/0x22
[  369.783040][    C0]  ? prb_final_commit+0x16/0x21
[  369.783040][    C0]  ? look_up_lock_class+0x7c/0x95
[  369.783040][    C0]  ? lock_acquire+0x191/0x21d
[  369.783040][    C0]  amd_pmu_handle_irq+0x28/0x37
[  369.783040][    C0]  perf_event_nmi_handler+0x25/0x3e
[  369.783040][    C0]  nmi_handle+0xb9/0x1fc
[  369.783040][    C0]  default_do_nmi+0x42/0x20a
[  369.783040][    C0]  exc_nmi+0x9e/0x107
[  369.783040][    C0]  end_repeat_nmi+0x16/0x67
[  369.783040][    C0] RIP: 0010:lzma_match+0x2b5/0x2ff
[  369.783040][    C0] Code: 18 0f b6 04 07 8b 7b 04 c1 e7 08 01 f8 89 43 04 44 8b 03 0f b7 3a 01 f6 44 89 c0 c1 e8 0b 0f af c7 8b 7b 04 39 c7 73 19 89 03 <44> 0f b7 02 44 89 df 44 29 c7 44 89 c0 c1 b
[  369.783040][    C0] RSP: 0018:ffffc900002a3c80 EFLAGS: 00000297
[  369.783040][    C0] RAX: 000000003963d43a RBX: ffff888003ad8000 RCX: 0000000000000003
[  369.783040][    C0] RDX: ffff888003ad8700 RSI: 000000000000001c RDI: 000000002d36f3c2
[  369.783040][    C0] RBP: ffff888003ad8090 R08: 00000000814d1600 R09: ffff888003ad86e4
[  369.783040][    C0] R10: 0000000000000001 R11: 0000000000000800 R12: 00000000000097bc
[  369.783040][    C0] R13: 0000000800000003 R14: ffffffff81f72fc4 R15: 0000000000000000
[  369.783040][    C0]  ? initrd_load+0x61/0x61
[  369.783040][    C0]  ? lzma_match+0x2b5/0x2ff
[  369.783040][    C0]  ? lzma_match+0x2b5/0x2ff
[  369.783040][    C0]  </NMI>
[  369.783040][    C0]  <TASK>
[  369.783040][    C0]  lzma_main+0x14d/0x1bf
[  369.783040][    C0]  lzma2_lzma+0x14f/0x1c1
[  369.783040][    C0]  xz_dec_lzma2_run+0x2d5/0x46e
[  369.783040][    C0]  dec_main+0x1b4/0x4da
[  369.783040][    C0]  ? do_copy+0x10b/0x10d
[  369.783040][    C0]  ? initrd_load+0x61/0x61
[  369.783040][    C0]  xz_dec_run+0x2e/0x82
[  369.783040][    C0]  ? write_buffer+0x2f/0x2f
[  369.783040][    C0]  unxz+0x14d/0x2c6
[  369.783040][    C0]  ? write_buffer+0x2f/0x2f
[  369.783040][    C0]  ? unlzo+0x3bc/0x3bc
[  369.783040][    C0]  unpack_to_rootfs+0x15c/0x278
[  369.783040][    C0]  ? initrd_load+0x61/0x61
[  369.783040][    C0]  do_populate_rootfs+0x65/0x13d
[  369.783040][    C0]  async_run_entry_fn+0x58/0xfa
[  369.783040][    C0]  process_one_work+0x1e8/0x334
[  369.783040][    C0]  worker_thread+0x159/0x202
[  369.783040][    C0]  ? process_scheduled_works+0x27/0x27
[  369.783040][    C0]  kthread+0xd3/0xdb
[  369.783040][    C0]  ? kthread_exit+0x1e/0x1e
[  369.783040][    C0]  ret_from_fork+0x20/0x32
[  369.783040][    C0]  ? kthread_exit+0x1e/0x1e                                                                                                                  
[  369.783040][    C0]  ret_from_fork_asm+0x11/0x40 

Basically this is the error I get if memory size is between 1G and 3G

ChriMarMe commented 1 year ago

nowatchdog and nmi_watchdog=0 as commandline parameters fixed the issue with the linux kernel from 1G-3G

ChriMarMe commented 1 year ago

Added the linux_pvh.config. @bobuhiro11 can you build a bzImage and upload it to github, so we can use it in CI?

bobuhiro11 commented 1 year ago

Moin @ChriMarMe ,

It's really a great improvement! I am impressed. I can confirm that it works in my machine. I uploaded the bzImage, so please follow the instructions below.

$ wget https://github.com/bobuhiro11/bins/raw/main/gokvm/bzImage_PVH
$ ./extract-vmlinux ./bzImage_PVH > vmlinux_PVH
$ sudo ./gokvm boot  -k ./vmlinux_PVH -i ./initrd  -m 1G
IsPVH=True # This line was added by me for debugging.
…
2003/07/31 08:01:08 Welcome to u-root!

I used linux_pvh.config as a reference. It helped me a lot. Linux 6.5.0 was just officially released, so there is a bit of a difference. Please take it out as follows.

$ ./extract-ikconfig bzImage_PVH > linux_pvh.config
ChriMarMe commented 1 year ago

Updated the linux_pvh.config the way you suggested.

ChriMarMe commented 1 year ago

Add

ChriMarMe commented 1 year ago
-5.0# --- FAIL: TestNewAndLoadLinuxPVH (0.62s)

    machine_test.go:30: TUN TUNSETIFF: device or resource busy

FAIL

Hmm what to dooo?

ChriMarMe commented 1 year ago

:triumph:

ChriMarMe commented 1 year ago

whoops. xD

ChriMarMe commented 1 year ago

Ok. Test runs well. Lets clean this mess up and get it merged :)

ChriMarMe commented 1 year ago

That's so great. I am amazed at your development skills! I'm not used to reviewing patches of this magnitude, so I just commented on the improvements based on my gut feeling.

It will probably take me a long time to figure it all out. So if the testing is successful by CI and you think it's good, I'm willing to merge it 👍

Thank you for the compliment :)

Take your time for the review. Some parts are still very messy, but I'll do the best to clean up if I see something. I know this one commit is very huge, but I had no idea how to make the changes smaller or more atomic.

ChriMarMe commented 1 year ago

Hey, what is the status? Can you give me an intermediate review this week so I can start doing the corrections?

ChriMarMe commented 1 year ago

ignore the 3 commits, they are here until the other PRs are merged.

ChriMarMe commented 1 year ago

Added

ChriMarMe commented 1 year ago

Idk, but the CLOUDHV.fd file fails in the CI while working locally........

defaultsh-5.0# --- FAIL: TestNewAndLoadEDK2PVH (1.23s)

    machine_test.go:138: bad magic number '[123 34 112 97]' in record at byte 0x0

FAIL

This error comes from debug/elf/file.go:NewFile(f *os.File)

It checks the first 4 bytes for the magic ELFstring which ist '0x7F', 'E','L','F', which should be 127,69,76,70.

I have no clue what's going wrong here. I downloaded the file from your link and it is correct. Works locally, but not in the CI....

ChriMarMe commented 1 year ago

I have no clue why it works now, but it does. YEEEEEHAAAAAAAAWWWWWWW

ChriMarMe commented 1 year ago

I seperated the one commit into several to break up the changes a little. I hope now you'll have an easier time reviewing the changes.

ChriMarMe commented 1 year ago

Ok, lets go :)