ruomengh opened this issue 1 month ago
Hello, thanks. I did a test and can create a VM with your QEMU command:
qemu-system-x86_64 -accel kvm -name process=tdxvm,debug-threads=on -m 16G -vga none -monitor pty -nodefaults -drive file=./tdx-guest-ubuntu-24.04-intel.qcow2,if=virtio,format=qcow2 -monitor telnet:127.0.0.1:9072,server,nowait -bios /usr/share/qemu/OVMF.fd -object tdx-guest,sept-ve-disable=on,id=tdx -cpu host,-kvm-steal-time,pmu=off,tsc-freq=1000000000 -machine q35,hpet=off,kernel_irqchip=split,memory-encryption=tdx -device virtio-net-pci,netdev=mynet0 -netdev user,id=mynet0,net=10.0.2.0/24,dhcpstart=10.0.2.15,hostfwd=tcp::10059-:22 -smp 4 -chardev stdio,id=mux,mux=on,logfile=/tmp/vm_log_2024-05-10T0232.log -device virtio-serial,romfile= -device virtconsole,chardev=mux -monitor chardev:mux -serial chardev:mux -nographic
Does the failure happen every time for you, or only occasionally?
The issue most likely happens after typing "reboot" in the TD console (the TD actually shuts down rather than rebooting) and then starting the TD again. Once the issue happens, the next few boots fail with the same error as well.
I reproduced this issue occasionally using my normal QEMU command:
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
boot_td.sh: line 16: 25136 Segmentation fault (core dumped) qemu-system-x86_64 -name tdxvm,process=tdxvm,debug-threads=on -accel kvm -object tdx-guest,id=tdx -smp 8 -m 8G -cpu host -nodefaults -nographic -vga none -machine q35,kernel_irqchip=split,confidential-guest-support=tdx,hpet=off -drive file=$img,if=none,id=virtio-disk0 -device virtio-blk-pci,drive=virtio-disk0 -device virtio-net-pci,netdev=nic0 -netdev user,id=nic0,hostfwd=tcp::10022-:22 -bios /usr/share/qemu/OVMF.fd -chardev stdio,id=mux,mux=on,signal=off,logfile=test.log -device virtio-serial -device virtconsole,chardev=mux -serial chardev:mux
The host dmesg shows the following call trace along with the segmentation fault:
[35681.431239] kauditd_printk_skb: 109 callbacks suppressed
[35681.431248] audit: type=1400 audit(1716225051.602:121): apparmor="DENIED" operation="mknod" class="file" profile="ubuntu_pro_apt_news" name="/usr/lib/python3/dist-packages/uaclient/pycache/apt_news.cpython-312.pyc.124103736237104" pid=23850 comm="python3" requested_mask="c" denied_mask="c" fsuid=0 ouid=0
[37597.944430] ------------[ cut here ]------------
[37597.944435] WARNING: CPU: 1 PID: 24908 at arch/x86/kvm/vmx/tdx.c:1494 tdx_mem_page_aug+0x102/0x1d0 [kvm_intel]
[37597.944478] Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc vsock_loopback vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm_intel ast wmi sunrpc binfmt_misc kvm irqbypass nls_iso8859_1 ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler pfr_telemetry pfr_update dm_multipath msr efi_pstore nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel igb sha256_ssse3 sha1_ssse3 nvme i2c_algo_bit xhci_pci dca nvme_core xhci_pci_renesas nvme_auth aesni_intel crypto_simd cryptd
[37597.944569] CPU: 1 PID: 24908 Comm: CPU 3/KVM Not tainted 6.8.0-1004-intel #11-Ubuntu
[37597.944572] Hardware name: Intel Corporation BeechnutCity/BeechnutCity, BIOS BHSDCRB1.IPC.0029.D28.2401081854 01/08/2024
[37597.944574] RIP: 0010:tdx_mem_page_aug+0x102/0x1d0 [kvm_intel]
[37597.944594] Code: 48 8b 8d 68 ff ff ff 89 c2 83 e2 07 44 39 ea 74 7f 45 0f b6 ac 24 31 9b 00 00 41 80 fd 01 0f 87 c5 00 00 00 41 83 e5 01 75 1d <0f> 0b b8 01 01 00 00 be 01 03 00 00 4c 89 e7 66 41 89 84 24 31 9b
[37597.944597] RSP: 0018:ff89ff4da1097850 EFLAGS: 00010246
[37597.944599] RAX: 0000000000000004 RBX: c0000b0d00000001 RCX: 8000030b5ded04f7
[37597.944600] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[37597.944601] RBP: ff89ff4da10978f0 R08: 0000000000000000 R09: 0000000000000000
[37597.944602] R10: 0000000000000000 R11: 0000000000000000 R12: ff89ff4da1019000
[37597.944603] R13: 0000000000000000 R14: ff89ff4da1097858 R15: 0000000000000001
[37597.944604] FS: 0000764d42a006c0(0000) GS:ff4d227e14280000(0000) knlGS:0000000000000000
[37597.944605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37597.944606] CR2: 0000000000000000 CR3: 0000030ae159e004 CR4: 0000000000f73ef0
[37597.944608] PKRU: 55555554
[37597.944609] Call Trace:
[37597.944611]
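A note for anyone triaging this: the value `c0000b0d00000001` in RBX of the WARNING above is the same value later logged as the status of `SEAMCALL (0x0000000000000006) failed`. The minimal Python sketch below splits such a status into its parts. It assumes, based on the KVM TDX patches rather than anything stated in this thread, that bit 63 marks an error, the upper 32 bits are the status code and the lower 32 bits carry detail such as an operand ID, and that `0xC0000B0D00000000` is `TDX_EPT_ENTRY_STATE_INCORRECT`, operand ID 1 is RCX, and leaf 6 is `TDH.MEM.PAGE.AUG`. The lookup tables and `decode_tdx_status` are hypothetical helpers; check them against the TDX module ABI spec.

```python
# Sketch: split a TDX SEAMCALL completion status into the pieces KVM compares.
# ASSUMPTIONS (verify against the Intel TDX module ABI spec / kernel headers):
#   - bit 63 flags an error,
#   - bits 63:32 hold the status code, bits 31:0 carry detail (e.g. operand ID),
#   - the names below match the constants used by the KVM TDX patches.

STATUS_MASK = 0xFFFFFFFF00000000

# Hypothetical lookup tables; only the values seen in this thread are filled in.
STATUS_NAMES = {
    0xC0000B0D00000000: "TDX_EPT_ENTRY_STATE_INCORRECT",
}
OPERAND_NAMES = {
    0x01: "RCX",
}
LEAF_NAMES = {
    0x6: "TDH.MEM.PAGE.AUG",
}


def decode_tdx_status(status: int) -> dict:
    """Return the assumed fields of a 64-bit SEAMCALL status value."""
    detail = status & 0xFFFFFFFF
    return {
        "error": bool((status >> 63) & 1),
        "status_code": status >> 32,
        "name": STATUS_NAMES.get(status & STATUS_MASK, "unknown"),
        "detail": detail,
        "operand": OPERAND_NAMES.get(detail, "unknown"),
    }


# Value from RBX in the WARNING; the same value appears as the SEAMCALL status
# in the later host log for leaf 0x6.
info = decode_tdx_status(0xC0000B0D00000001)
print(LEAF_NAMES[0x6], info["name"], "operand:", info["operand"])
```

If this WARNING is KVM's `KVM_BUG_ON()` path (not confirmed here), KVM marks the VM as dead and later vCPU ioctls return EIO, which would be consistent with QEMU's repeated "Failed to get registers: Input/output error".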
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/PEK-648.
This message was autogenerated
One more thing: the issue happens frequently if the TD runs a workload for a while, is then shut down, and is booted again. Configuration (vCPUs/memory): 4/16 GB.
[35681.431248] audit: type=1400 audit(1716225051.602:121): apparmor="DENIED" operation="mknod" class="file" profile="ubuntu_pro_apt_news" name="/usr/lib/python3/dist-packages/uaclient/pycache/apt_news.cpython-312.pyc.124103736237104" pid=23850 comm="python3" requested_mask="c" denied_mask="c" fsuid=0 ouid=0
@ruomengh Is AppArmor running on your system? I'm not sure whether it is related; I will disable AppArmor and try.
OK, that didn't help; the issue can still be reproduced.
The TD fails to boot with the error below. The issue happens occasionally, but once it does, the TD also fails to boot on the next few attempts.
Error:
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64: Failed to get registers: Input/output error
qemu-system-x86_64:
./qemu-test.sh: line 379: 2988094 Segmentation fault (core dumped) /usr/bin/qemu-system-x86_64 -accel kvm -name process=tdxvm,debug-threads=on -m 16G -vga none -monitor pty -nodefaults -drive file=/home/ruomeng/images/tdx-2404.qcow2,if=virtio,format=qcow2 -monitor telnet:127.0.0.1:9072,server,nowait -bios /usr/share/qemu/OVMF.fd -object tdx-guest,sept-ve-disable=on,id=tdx -cpu host,-kvm-steal-time,pmu=off,tsc-freq=1000000000 -machine q35,hpet=off,kernel_irqchip=split,memory-encryption=tdx -device virtio-net-pci,netdev=mynet0 -netdev user,id=mynet0,net=10.0.2.0/24,dhcpstart=10.0.2.15,hostfwd=tcp::10059-:22 -smp 4 -chardev stdio,id=mux,mux=on,logfile=/tmp/vm_log_2024-05-10T0232.log -device virtio-serial,romfile= -device virtconsole,chardev=mux -monitor chardev:mux -serial chardev:mux -nographic
Host dmesg:
[74411.414026] kvm: vcpu 0: requested 24992 ns lapic timer period limited to 200000 ns
[74411.416450] kvm: vcpu 1: requested 24992 ns lapic timer period limited to 200000 ns
[74411.418247] kvm: vcpu 2: requested 24992 ns lapic timer period limited to 200000 ns
[74411.419948] kvm: vcpu 3: requested 24992 ns lapic timer period limited to 200000 ns
[74414.466490] SEAMCALL (0x0000000000000006) failed: 0xc0000b0d00000001 RCX 0x8000004683bc70f7 RDX 0x0000000000000400 R8 0x0000004683bc7000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[74414.466498] SEAMCALL (0x0000000000000006) failed: 0xc0000b0d00000001 RCX 0x8000004683b710f7 RDX 0x0000000000000400 R8 0x0000004683b71000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[74415.068003] CPU 2/KVM[2993257]: segfault at 72d21ec00fe8 ip 000072d227e690dc sp 000072d21ec00ff0 error 6 in libc.so.6[72d227e28000+188000] likely on CPU 0 (core 0, socket 0)
[74415.068022] Code: 48 89 45 c8 48 8b 05 3b 9d 19 00 f3 0f 6f 0a 64 8b 00 0f 11 8d b8 fb ff ff 89 85 08 fb ff ff 48 8b 42 10 48 89 85 c8 fb ff ff af f4 fb ff 48 89 de 4c 89 ef 48 89 c2 48 89 85 f8 fa ff ff 49
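The segfault record in the host dmesg can be decoded mechanically as well. The Python sketch below uses a hypothetical `decode_segfault` helper, the architectural x86 page-fault error-code bits, and an assumed 4 KiB page size. For the `CPU 2/KVM` line it reports a user-mode write to a non-present page, 8 bytes below `sp` and in the same page as `sp`.

```python
import re

# Architectural x86 page-fault error-code bits: (meaning if set, meaning if clear).
PF_ERROR_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write", "read"),
    2: ("user mode", "kernel mode"),
    3: ("reserved bit set", None),
    4: ("instruction fetch", None),
}

# Matches the "segfault at ... ip ... sp ... error ..." record the kernel prints.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

PAGE_SHIFT = 12  # assumes 4 KiB pages


def decode_segfault(line: str) -> dict:
    """Pull the fault address, sp and error code out of a segfault record."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("no segfault record found")
    addr = int(m["addr"], 16)
    sp = int(m["sp"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    flags = []
    for bit, (if_set, if_clear) in PF_ERROR_BITS.items():
        if (err >> bit) & 1:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return {
        "flags": flags,
        "offset_from_sp": addr - sp,
        "same_page_as_sp": (addr >> PAGE_SHIFT) == (sp >> PAGE_SHIFT),
    }


line = ("CPU 2/KVM[2993257]: segfault at 72d21ec00fe8 ip 000072d227e690dc "
        "sp 000072d21ec00ff0 error 6 in libc.so.6[72d227e28000+188000]")
print(decode_segfault(line))
```

That pattern is consistent with the vCPU thread's stack pointer having moved into an unmapped guard page (a stack overflow), but the log alone does not prove it; a backtrace from the core dump would settle it.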