Joshua-Riek / ubuntu-rockchip

Ubuntu for Rockchip RK35XX Devices
https://joshua-riek.github.io/ubuntu-rockchip-download/
GNU General Public License v3.0

24.04 beta: can't boot VMs on kvm, qemu works fine #731

Open ovalhub opened 5 months ago

ovalhub commented 5 months ago

I use my Orange Pi 5 Plus board to run a bunch of VMs, mostly aarch64, some i686. I had no issues with this on 22.04; it works great. After upgrading to 24.04 beta, creating a VM from an XML file saved before the upgrade fails with this error:

qemu-system-aarch64: Failed to put registers after init: Invalid argument

Booting the same VMs with qemu works fine.

Steps to reproduce from scratch (no pre-saved XML file necessary):

sudo apt install qemu-system-arm qemu-system-x86 libvirt-daemon-system virtinst
sudo systemctl enable libvirtd
sudo systemctl start libvirtd
curl -O https://cdn.openbsd.org/pub/OpenBSD/7.5/arm64/install75.img
virt-install \
--name=oluf \
--virt-type=kvm --hvm --autostart \
--memory=1024 \
--vcpus=4 \
--boot=loader=/usr/share/AAVMF/AAVMF_CODE.fd,loader.readonly=yes,loader.type=pflash,nvram.template=/usr/share/AAVMF/AAVMF_VARS.fd,loader_secure=no \
--osinfo=openbsd7.2 \
--controller=type=usb,model=none \
--graphics=none \
--memballoon=virtio \
--network=bridge=vmbr0,model=virtio \
--network=bridge=vmbr1,model=virtio \
--import --disk=install75.img

Starting install...
ERROR    internal error: process exited while connecting to monitor: 2024-04-21T01:24:24.546176Z qemu-system-aarch64: Failed to put registers after init: Invalid argument
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
  virsh --connect qemu:///system start oluf
otherwise, please restart your installation.

Replacing --virt-type=kvm with --virt-type=qemu boots a (slow) VM just fine.

Joshua-Riek commented 5 months ago

Hey, does this issue happen if the VM is created after the upgrade?

ovalhub commented 5 months ago

Yes, see the reproduction steps provided: they don't depend on any pre-existing VM. Just creating a VM from scratch, on 24.04-beta, using an OpenBSD 7.5 install disk reproduces the bug.

Joshua-Riek commented 5 months ago

Interesting, I'd be happy to take a look at this problem after the Ubuntu 24.04 release. I'm working on a lot of last-minute bugs right now; once the cause of the problem is found, I can push an update with the appropriate fix.

ovalhub commented 5 months ago

That's fine, it may be a bug with 24.04-beta itself, not your part. I'm watching for updates and I'm refreshing daily...

lukaszsobala commented 5 months ago

Have you tried disabling (taking offline) the A55 cores? It usually fixes the problem of kvm not working.

ovalhub commented 5 months ago

Well, there lies a clue probably, cpu-info thinks that all my cores are A55:

cpu-info
Packages:
    0: Unknown
Microarchitectures:
    8x Cortex-A55
Cores:
    0: 1 processor (0), ARM Cortex-A55
    1: 1 processor (1), ARM Cortex-A55
    2: 1 processor (2), ARM Cortex-A55
    3: 1 processor (3), ARM Cortex-A55
    4: 1 processor (4), ARM Cortex-A55
    5: 1 processor (5), ARM Cortex-A55
    6: 1 processor (6), ARM Cortex-A55
    7: 1 processor (7), ARM Cortex-A55
Logical processors (System ID):
    0 (0)
    1 (1)
    2 (2)
    3 (3)
    4 (4)
    5 (5)
    6 (6)
    7 (7)

ovalhub commented 5 months ago

And, guessing that the A76 cores are [0-3], I disabled them and now I can boot my VMs!

sudo chcpu --disable 0,1,2,3
CPU 0 disabled
CPU 1 disabled
CPU 2 disabled
CPU 3 disabled
virsh create /mnt/backups/2024-04-13/oluf.xml
Domain 'oluf' created from /mnt/backups/2024-04-13/oluf.xml

Joshua-Riek commented 5 months ago

That is quite interesting. I assume the KVM fails to boot since the system sees the CPUs as A55 cores.

ovalhub commented 5 months ago

Even more so, now that I disabled the A76 Cores and got a VM to boot, cpu-info thinks I've got 8 A76 Cores:

cpu-info
Packages:
    0: Unknown
Microarchitectures:
    8x Cortex-A76
Cores:
    0: 1 processor (0), ARM Cortex-A76
    1: 1 processor (1), ARM Cortex-A76
    2: 1 processor (2), ARM Cortex-A76
    3: 1 processor (3), ARM Cortex-A76
    4: 1 processor (4), ARM Cortex-A76
    5: 1 processor (5), ARM Cortex-A76
    6: 1 processor (6), ARM Cortex-A76
    7: 1 processor (7), ARM Cortex-A76
Logical processors (System ID):
    0 (4)
    1 (5)
    2 (6)
    3 (7)
    4 (0)
    5 (1)
    6 (2)
    7 (3)

Joshua-Riek commented 5 months ago

Maybe it's a bug with cpu-info, take a look at the lscpu output below on my Rock 5B.

ubuntu@ubuntu:~$ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              ARM
  Model name:           Cortex-A55
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r2p0
    CPU(s) scaling MHz: 100%
    CPU max MHz:        1800.0000
    CPU min MHz:        408.0000
    BogoMIPS:           48.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
  Model name:           Cortex-A76
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r4p0
    CPU(s) scaling MHz: 100%
    CPU max MHz:        2352.0000
    CPU min MHz:        408.0000
    BogoMIPS:           48.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):    
  L1d:                  384 KiB (8 instances)
  L1i:                  384 KiB (8 instances)
  L2:                   2.5 MiB (8 instances)
  L3:                   3 MiB (1 instance)
Vulnerabilities:        
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Vulnerable: Unprivileged eBPF enabled
  Srbds:                Not affected
  Tsx async abort:      Not affected

ovalhub commented 5 months ago

Ok, I turned off my VM, re-enabled the [0-3] CPUs, here is what I get from lscpu:

lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              ARM
  Model name:           Cortex-A55
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r2p0
    CPU(s) scaling MHz: 100%
    CPU max MHz:        1800.0000
    CPU min MHz:        408.0000
    BogoMIPS:           48.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
                        asimdhp cpuid asimdrdm lrcpc dcpop asimddp
  Model name:           Cortex-A76
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r4p0
    CPU(s) scaling MHz: 100%
    CPU max MHz:        2304.0000
    CPU min MHz:        408.0000
    BogoMIPS:           48.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
                        asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
  L1d:                  384 KiB (8 instances)
  L1i:                  384 KiB (8 instances)
  L2:                   2.5 MiB (8 instances)
  L3:                   3 MiB (1 instance)
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Vulnerable: Unprivileged eBPF enabled
  Srbds:                Not affected
  Tsx async abort:      Not affected

Joshua-Riek commented 5 months ago

I found the source for cpuinfo, does anyone know if kvm relies on cpu-info?

https://launchpad.net/ubuntu/+source/cpuinfo

lukaszsobala commented 5 months ago

No, cores 0-3 are A55, the other ones are A76

ovalhub commented 5 months ago

lscpu is a bit confusing too. If I disable [0-3], it shows all cores as A76. If I disable [4-7], it shows all cores as A55. In both cases, with 4 cores disabled, I can boot a VM.

ovalhub commented 5 months ago

To sum up, the problem with booting kvm VMs only happens when a mix of A55 and A76 are enabled. I confirmed that by disabling [0,1,4,5]. With only A76 cores or only A55 cores, booting a kvm VM works.
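
One way to see which logical CPUs belong to which core type is to group them by the "CPU part" field of /proc/cpuinfo. This is only a minimal sketch: the function name group_cpus_by_part is made up for illustration, and it assumes the usual arm64 layout of /proc/cpuinfo, where part 0xd05 identifies a Cortex-A55 and part 0xd0b a Cortex-A76.

```shell
#!/bin/bash
# Hypothetical helper: group logical CPUs by the "CPU part" field.
# Reads /proc/cpuinfo-formatted text on stdin and prints one line per
# part number, followed by the comma-separated CPUs that report it.
# Assumption: arm64 cpuinfo layout ("processor : N", "CPU part : 0x...").
group_cpus_by_part() {
    awk -F'[ \t]*:[ \t]*' '
        /^processor/ { cpu = $2 }                      # remember current CPU
        /^CPU part/  {
            if ($2 in list) list[$2] = list[$2] "," cpu
            else            list[$2] = cpu
        }
        END { for (p in list) print p, list[p] }
    ' | sort                                           # stable output order
}

# Usage (on the board itself):
#   group_cpus_by_part < /proc/cpuinfo
```

Each output line is a part number followed by the CPUs that report it, so CPUs listed on the same line can be kept together for a VM, and CPUs on different lines should not be mixed.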

ovalhub commented 5 months ago

I now verified that with all cores enabled again, if I restrict the cores the VM can use to not include a mix of A55 and A76, it boots fine too. Here using only A55 cores:

<vcpu placement='static' cpuset='0-3'>4</vcpu>

lukaszsobala commented 5 months ago

I now verified that with all cores enabled again, if I restrict the cores the VM can use to not include a mix of A55 and A76, it boots fine too. Here using only A55 cores:

<vcpu placement='static' cpuset='0-3'>4</vcpu>

Good idea! Core pinning should work too, I did not know you could do it this way.

minetaro12 commented 5 months ago

I'm using a NanoPi R6S and when I use KVM for the exact same thing, the virtual machine won't boot.

$ ./run.sh 
qemu-system-aarch64: Failed to put registers after init: Invalid argument

I am using the qemu command directly, but I can start it successfully by restricting the CPUs to [0-3] or [4-7] with taskset.

$ taskset -c 0,1,2,3 ./run.sh
or
$ taskset -c 4,5,6,7 ./run.sh

sund3RRR commented 4 months ago

I have exactly the same issue on orange pi 5. Is there a solution without disabling cores?

lukaszsobala commented 4 months ago

@sund3RRR maybe it's a bug in qemu? You could ask there. There are some other defaults for ARM kvm in virt-manager that prevent VMs from starting. But that's another topic.

ovalhub commented 4 months ago

On Thu, 23 May 2024, sunder wrote:

I have exactly the same issue on orange pi 5. Is there a solution without disabling cores?

You don't need to disable cores, you only need to avoid using a mixture of A76 (cores 4-7) and A55 (cores 0-3). When creating a VM with virt-install, specify a cpuset along with the number of cpus to use: --vcpus=4,cpuset=4-7
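
As a rough sanity check before starting a VM, a cpuset string can be tested against that layout. This is a minimal sketch, assuming the mapping described above (CPUs 0-3 are A55, CPUs 4-7 are A76); the helper names expand_cpuset and cpuset_is_single_cluster are hypothetical and not part of libvirt or qemu.

```shell
#!/bin/bash
# Expand a cpuset string such as "0-3,6" into one CPU number per line.
expand_cpuset() {
    local part
    local IFS=,
    for part in $1; do
        case $part in
            *-*) seq "${part%-*}" "${part#*-}" ;;   # range, e.g. 0-3
            *)   echo "$part" ;;                    # single CPU
        esac
    done
}

# Succeed only if every CPU in the cpuset belongs to the same cluster.
# Assumption taken from this thread: 0-3 are Cortex-A55, 4-7 are Cortex-A76.
cpuset_is_single_cluster() {
    local cpu little=0 big=0
    for cpu in $(expand_cpuset "$1"); do
        if [ "$cpu" -le 3 ]; then little=1; else big=1; fi
    done
    [ $((little + big)) -eq 1 ]
}

# Usage:
#   cpuset_is_single_cluster "4-7" && echo "ok to pass as cpuset=4-7"
```

On a board with a different core layout, the threshold in cpuset_is_single_cluster would need to be adjusted.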

sund3RRR commented 4 months ago

@sund3RRR maybe it's a bug in qemu? You could ask there. There are some other defaults for ARM kvm in virt-manager that prevent VMs from starting. But that's another topic.

I don't know. I installed virt-manager in docker container on my server and got this error. Then I installed cockpit from apt and ran into the same issue:

internal error: QEMU unexpectedly closed the monitor (vm='NixOS_23.11'): 2024-05-23T15:14:12.948830Z qemu-system-aarch64: Failed to put registers after init: Invalid argument

I can't specify cores in cockpit virtual machine ui, but want to use webUI to access virtual machines on my server.

lukaszsobala commented 4 months ago

@sund3RRR This is due to the broken defaults. You need to edit the XML and turn these options to "off", because they don't exist on ARM:

    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>

sund3RRR commented 4 months ago

@sund3RRR This is due to the broken defaults. You need to edit the XML and turn these options to "off", because they don't exist on ARM:

    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>

Thank you man, but this doesn't work for me :( still failed

lukaszsobala commented 4 months ago

@sund3RRR

This is with the cores disabled? The vcpu method did not work for me...

sund3RRR commented 4 months ago

@sund3RRR

This is with the cores disabled? The vcpu method did not work for me...

Now it works, yeah! Really cool. Hope qemu will work someday without this thing.

sund3RRR commented 4 months ago

I still can't boot any image, it freezes on tianocore boot logo

lukaszsobala commented 4 months ago

I've had this problem too. You can try:

Ruach commented 4 months ago

@minetaro12 Could you please share your run script?

minetaro12 commented 4 months ago

@minetaro12 Could you please share your run script?

Here is the startup script I am using. I am running Ubuntu 24.04 in a VM.

#!/bin/bash
qemu-system-aarch64 -M virt -cpu host -enable-kvm \
 -smp 2 -m 2G \
 -vnc none \
 -serial telnet:127.0.0.1:8023,server,nowait \
 -drive if=pflash,format=raw,readonly=on,file=flash0.img \
 -drive if=pflash,format=raw,file=flash1.img \
 -drive format=raw,file=disk.img

arthurpro commented 4 months ago

I'm having the same issue on Ubuntu 22.04.4 on Orange Pi 5 Plus while trying to start any VM using Incus:

qemu-system-aarch64: Failed to put registers after init: Invalid argument

Ruach commented 4 months ago

@minetaro12 @ovalhub Hi, I would like to run my custom-compiled Linux in a guest VM. Since I already have the vmlinuz and initrd.img for the host OS (both located in /boot/), I would like to use them to boot the guest VM instead of downloading a full OS image from the web.

This is what I tried:

qemu-system-aarch64 -M virt -cpu host --enable-kvm \
 -smp 2 -m 2G \
 -drive if=pflash,format=raw,readonly=on,file=AAVMF_CODE.fd \
 -drive if=pflash,format=raw,file=AAVMF_VAR.fd \
 -drive format=qcow2,file=rootfs.img \
 -kernel /boot/vmlinuz \
 -initrd /boot/initrd.img \
 -nographic

It gets stuck after printing EFI stub: Exiting boot services and installing virtual address map...

Also, what is the reason for passing AAVMF_CODE and AAVMF_VAR? I am not sure how they work.

Ruach commented 4 months ago

@minetaro12 Could you please elaborate on how you built the Ubuntu 24.04 image to boot the guest OS? I am wondering whether a custom-built Linux can be loaded as the guest OS kernel…

minetaro12 commented 4 months ago

@minetaro12 Could you please elaborate on how you built the Ubuntu 24.04 image to boot the guest OS? I am wondering whether a custom-built Linux can be loaded as the guest OS kernel…

https://0sn.net/posts/20240204/nanopir6s-qemukvm/#4-%e8%b5%b7%e5%8b%95%e3%82%b9%e3%82%af%e3%83%aa%e3%83%97%e3%83%88%e3%81%ae%e4%bd%9c%e6%88%90

This was done using the procedure on my blog. Just specify the Ubuntu installation ISO with the cdrom option.

Ruach commented 4 months ago

@minetaro12 Is it possible to generate an ISO image from my custom-built Linux image? Or can I use the custom-built image instead of the ISO file? Could you please take a look at my questions above, along with the error message? The kernel boot gets stuck at the EFI stub message.

minetaro12 commented 4 months ago

https://0sn.net/posts/20240204/nanopir6s-qemukvm/#2-uefi%e3%81%ae%e5%8b%95%e4%bd%9c%e3%81%ab%e5%bf%85%e8%a6%81%e3%81%aa%e3%83%95%e3%82%a1%e3%82%a4%e3%83%ab%e3%81%ae%e4%bd%9c%e6%88%90

How about using the dd command to create a UEFI image? This is also on my blog.

Ruach commented 4 months ago

@minetaro12 I followed all the steps and only changed the last part: I removed the ISO part and added --kernel /boot/vmlinuz. I still see the same messages, and the kernel boot gets stuck.

ovalhub commented 4 months ago

@minetaro12 @ovalhub Hi, I would like to run my custom-compiled Linux in a guest VM. Since I already have the vmlinuz and initrd.img for the host OS (both located in /boot/), I would like to use them to boot the guest VM instead of downloading a full OS image from the web.

This is what I tried:

qemu-system-aarch64 -M virt -cpu host --enable-kvm \
 -smp 2 -m 2G \
 -drive if=pflash,format=raw,readonly=on,file=AAVMF_CODE.fd \
 -drive if=pflash,format=raw,file=AAVMF_VAR.fd \
 -drive format=qcow2,file=rootfs.img \
 -kernel /boot/vmlinuz \
 -initrd /boot/initrd.img \
 -nographic

It gets stuck after printing EFI stub: Exiting boot services and installing virtual address map...

Also, what is the reason for passing AAVMF_CODE and AAVMF_VAR? I am not sure how they work.

I don't use qemu directly, I use libvirt's virt-install to create a VM. To skip it auto-downloading an OS, pass it --import and a bootable device to boot from. For example, to create an alpinelinux 3.19 VM called alpine, use these commands:

sudo lvcreate -V 32g --thinpool vm/data -n lv-alpine
curl -O https://dl-cdn.alpinelinux.org/alpine/v3.19/releases/aarch64/alpine-virt-3.19.1-aarch64.iso
virt-install \
--name=alpine \
--virt-type=kvm --hvm --autostart \
--memory=1024 \
--vcpus=4,cpuset=0-3 \
--boot=loader=/usr/share/AAVMF/AAVMF_CODE.fd,loader.readonly=yes,loader.type=pflash,nvram.template=/usr/share/AAVMF/AAVMF_VARS.fd,loader_secure=no \
--osinfo=alpinelinux3.19 \
--network=bridge=vmbr0,model=virtio \
--graphics=none \
--import \
--cdrom=alpine-virt-3.19.1-aarch64.iso \
--disk=path=/dev/vm/lv-alpine,format=raw,cache=none,target.bus=scsi \
--memballoon=none

Here --cdrom is the bootable device, and --disk refers to a blank virtual HD just created above. If your HD is already bootable, as you seem to say, don't pass --cdrom, just boot from your HD instead. I use --graphics=none because I don't want any graphics, normally, you should see console output in your terminal.

Ruach commented 4 months ago

@ovalhub Thanks a lot. I checked lots of tutorials and blogs, but none of them seem to work for booting a VM with a custom-built kernel image on the orangepi-5-plus. Would you happen to know how to generate a bootable HD?

ovalhub commented 4 months ago

@ovalhub Thanks a lot. I checked lots of tutorials and blogs, but none of them seem to work for booting a VM with a custom-built kernel image on the orangepi-5-plus. Would you happen to know how to generate a bootable HD?

To create a bootable HD, install an OS. I gave you the exact instructions earlier today for how to do that with Alpinelinux 3.19.1. Run the commands, login as root when prompted, then run setup-alpine and follow the prompts.

Ruach commented 4 months ago

@ovalhub Thanks, but I was wondering whether that is possible with a custom-built kernel rather than an ISO image downloaded from the web.

Mlocik97 commented 3 months ago

Had the same problem. I changed from Virtio to QXL:

<video>
  <model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/>
  <alias name="video0"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
</video>

And it works fine now. After installing, you can change back to Virtio.