canonical / tdx

Intel confidential computing - TDX
GNU General Public License v3.0
59 stars 23 forks source link

TD boot failure with 24.04 host and guest #100

Open ruomengh opened 1 month ago

ruomengh commented 1 month ago

TD boot failure with error as below. After the issue happens occasionally, TD cannot boot in the next a few attempts.

Error: qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64:./qemu-test.sh: line 379: 2988094 Segmentation fault (core dumped) /usr/bin/qemu-system-x86_64 -accel kvm -name process=tdxvm,debug-threads=on -m 16G -vga none -monitor pty -nodefaults -drive file=/home/ruomeng/images/tdx-2404.qcow2,if=virtio,format=qcow2 -monitor telnet:127.0.0.1:9072,server,nowait -bios /usr/share/qemu/OVMF.fd -object tdx-guest,sept-ve-disable=on,id=tdx -cpu host,-kvm-steal-time,pmu=off,tsc-freq=1000000000 -machine q35,hpet=off,kernel_irqchip=split,memory-encryption=tdx -device virtio-net-pci,netdev=mynet0 -netdev user,id=mynet0,net=10.0.2.0/24,dhcpstart=10.0.2.15,hostfwd=tcp::10059-:22 -smp 4 -chardev stdio,id=mux,mux=on,logfile=/tmp/vm_log_2024-05-10T0232.log -device virtio-serial,romfile= -device virtconsole,chardev=mux -monitor chardev:mux -serial chardev:mux -nographic

dmesg of the host: [74411.414026] kvm: vcpu 0: requested 24992 ns lapic timer period limited to 200000 ns [74411.416450] kvm: vcpu 1: requested 24992 ns lapic timer period limited to 200000 ns [74411.418247] kvm: vcpu 2: requested 24992 ns lapic timer period limited to 200000 ns [74411.419948] kvm: vcpu 3: requested 24992 ns lapic timer period limited to 200000 ns [74414.466490] SEAMCALL (0x0000000000000006) failed: 0xc0000b0d00000001 RCX 0x8000004683bc70f7 RDX 0x0000000000000400 R8 0x0000004683bc7000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000 [74414.466498] SEAMCALL (0x0000000000000006) failed: 0xc0000b0d00000001 RCX 0x8000004683b710f7 RDX 0x0000000000000400 R8 0x0000004683b71000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000 [74415.068003] CPU 2/KVM[2993257]: segfault at 72d21ec00fe8 ip 000072d227e690dc sp 000072d21ec00ff0 error 6 in libc.so.6[72d227e28000+188000] likely on CPU 0 (core 0, socket 0) [74415.068022] Code: 48 89 45 c8 48 8b 05 3b 9d 19 00 f3 0f 6f 0a 64 8b 00 0f 11 8d b8 fb ff ff 89 85 08 fb ff ff 48 8b 42 10 48 89 85 c8 fb ff ff af f4 fb ff 48 89 de 4c 89 ef 48 89 c2 48 89 85 f8 fa ff ff 49

hector-cao commented 1 month ago

Hello, thanks, i did a test and can create a VM with your qemu command:

qemu-system-x86_64 -accel kvm -name process=tdxvm,debug-threads=on -m 16G -vga none -monitor pty -nodefaults -drive file=./tdx-guest-ubuntu-24.04-intel.qcow2,if=virtio,format=qcow2 -monitor telnet:127.0.0.1:9072,server,nowait -bios /usr/share/qemu/OVMF.fd -object tdx-guest,sept-ve-disable=on,id=tdx -cpu host,-kvm-steal-time,pmu=off,tsc-freq=1000000000 -machine q35,hpet=off,kernel_irqchip=split,memory-encryption=tdx -device virtio-net-pci,netdev=mynet0 -netdev user,id=mynet0,net=10.0.2.0/24,dhcpstart=10.0.2.15,hostfwd=tcp::10059-:22 -smp 4 -chardev stdio,id=mux,mux=on,logfile=/tmp/vm_log_2024-05-10T0232.log -device virtio-serial,romfile= -device virtconsole,chardev=mux -monitor chardev:mux -serial chardev:mux -nographic

does the failure happen all the time to you or only occasionally ?

ruomengh commented 1 month ago

The issue likely happened after input "reboot" in a TD console (actually it's not rebooted but down) and then start the TD again. Once the issue happens, the next a few boots will fail with the same error as well.

fanchen2 commented 1 month ago

I reproduced this issue, occasionally, use my normal qemu command: qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output error qemu-system-x86_64: Failed to get registers: Input/output errorboot_td.sh: line 16: 25136 Segmentation fault (core dumped) qemu-system-x86_64 -name tdxvm,process=tdxvm,debug-threads=on -accel kvm -object tdx-guest,id=tdx -smp 8 -m 8G -cpu host -nodefaults -nographic -vga none -machine q35,kernel_irqchip=split,confidential-guest-support=tdx,hpet=off -drive file=$img,if=none,id=virtio-disk0 -device virtio-blk-pci,drive=virtio-disk0 -device virtio-net-pci,netdev=nic0 -netdev user,id=nic0,hostfwd=tcp::10022-:22 -bios /usr/share/qemu/OVMF.fd -chardev stdio,id=mux,mux=on,signal=off,logfile=test.log -device virtio-serial -device virtconsole,chardev=mux -serial chardev:mux

with below host dmesg call trace and segment fault: [35681.431239] kauditd_printk_skb: 109 callbacks suppressed [35681.431248] audit: type=1400 audit(1716225051.602:121): apparmor="DENIED" operation="mknod" class="file" profile="ubuntu_pro_apt_news" name="/usr/lib/python3/dist-packages/uaclient/pycache/apt_news.cpython-312.pyc.124103736237104" pid=23850 comm="python3" requested_mask="c" denied_mask="c" fsuid=0 ouid=0 [37597.944430] ------------[ cut here ]------------ [37597.944435] WARNING: CPU: 1 PID: 24908 at arch/x86/kvm/vmx/tdx.c:1494 tdx_mem_page_aug+0x102/0x1d0 [kvm_intel] [37597.944478] Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc vsock_loopback vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm_intel ast wmi sunrpc binfmt_misc kvm irqbypass nls_iso8859_1 ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler pfr_telemetry pfr_update dm_multipath msr efi_pstore nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel igb sha256_ssse3 sha1_ssse3 nvme i2c_algo_bit xhci_pci dca nvme_core xhci_pci_renesas nvme_auth aesni_intel crypto_simd cryptd [37597.944569] CPU: 1 PID: 24908 Comm: CPU 3/KVM Not tainted 6.8.0-1004-intel #11-Ubuntu [37597.944572] Hardware name: Intel Corporation BeechnutCity/BeechnutCity, BIOS BHSDCRB1.IPC.0029.D28.2401081854 01/08/2024 [37597.944574] RIP: 0010:tdx_mem_page_aug+0x102/0x1d0 [kvm_intel] [37597.944594] Code: 48 8b 8d 68 ff ff ff 89 c2 83 e2 07 44 39 ea 74 7f 45 0f b6 ac 24 31 9b 00 00 41 80 fd 01 0f 87 c5 00 00 00 41 83 e5 01 75 1d <0f> 0b b8 01 01 00 00 be 01 03 00 00 4c 89 e7 66 41 89 84 24 31 9b [37597.944597] RSP: 0018:ff89ff4da1097850 EFLAGS: 00010246 [37597.944599] RAX: 0000000000000004 RBX: c0000b0d00000001 RCX: 8000030b5ded04f7 [37597.944600] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [37597.944601] RBP: ff89ff4da10978f0 R08: 0000000000000000 R09: 0000000000000000 [37597.944602] R10: 0000000000000000 R11: 0000000000000000 R12: ff89ff4da1019000 [37597.944603] R13: 0000000000000000 R14: ff89ff4da1097858 R15: 0000000000000001 [37597.944604] FS: 0000764d42a006c0(0000) GS:ff4d227e14280000(0000) knlGS:0000000000000000 [37597.944605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [37597.944606] CR2: 0000000000000000 CR3: 0000030ae159e004 CR4: 0000000000f73ef0 [37597.944608] PKRU: 55555554 [37597.944609] Call Trace: [37597.944611] [37597.944616] ? show_regs+0x6d/0x80 [37597.944625] ? warn+0x89/0x160 [37597.944635] ? tdx_mem_page_aug+0x102/0x1d0 [kvm_intel] [37597.944654] ? report_bug+0x17e/0x1b0 [37597.944663] ? handle_bug+0x51/0xa0 [37597.944670] ? exc_invalid_op+0x18/0x80 [37597.944671] ? asm_exc_invalid_op+0x1b/0x20 [37597.944676] ? tdx_mem_page_aug+0x102/0x1d0 [kvm_intel] [37597.944692] tdx_sept_set_private_spte+0x93/0x100 [kvm_intel] [37597.944708] set_private_spte_present.isra.0+0x10b/0x1d0 [kvm] [37597.944851] tdp_mmu_set_spte_atomic+0x19e/0x1e0 [kvm] [37597.944919] tdp_mmu_map_handle_target_level+0x22a/0x3a0 [kvm] [37597.944985] kvm_tdp_mmu_map+0x3af/0x3f0 [kvm] [37597.945068] kvm_tdp_mmu_page_fault+0x111/0x120 [kvm] [37597.945137] kvm_tdp_page_fault+0x100/0x120 [kvm] [37597.945210] ? kvm_set_msr_common+0xade/0x11d0 [kvm] [37597.945278] kvm_mmu_do_page_fault+0x1af/0x200 [kvm] [37597.945345] kvm_mmu_page_fault+0x91/0x200 [kvm] [37597.945412] tdx_handle_ept_violation+0xf3/0x240 [kvm_intel] [37597.945433] tdx_handle_exit+0x1f7/0x2d0 [kvm_intel] [37597.945451] vt_handle_exit+0x2a/0x40 [kvm_intel] [37597.945467] vcpu_enter_guest+0x5d5/0x1000 [kvm] [37597.945531] ? kvm_apic_local_deliver+0xa1/0xd0 [kvm] [37597.945600] vcpu_run+0x51/0x2a0 [kvm] [37597.945660] kvm_arch_vcpu_ioctl_run+0x1bd/0x4d0 [kvm] [37597.945720] kvm_vcpu_ioctl+0x248/0x890 [kvm] [37597.945766] ? kvm_on_user_return+0x8f/0x110 [kvm] [37597.945826] ? fire_user_return_notifiers+0x37/0x80 [37597.945833] ? syscall_exit_to_user_mode+0x86/0x260 [37597.945837] ? do_syscall_64+0x8c/0x180 [37597.945843] __x64_sys_ioctl+0xa0/0xf0 [37597.945849] x64_sys_call+0x143b/0x25c0 [37597.945853] do_syscall_64+0x7f/0x180 [37597.945856] ? do_syscall_64+0x8c/0x180 [37597.945858] ? do_syscall_64+0x8c/0x180 [37597.945861] ? do_syscall_64+0x8c/0x180 [37597.945863] ? do_syscall_64+0x8c/0x180 [37597.945866] entry_SYSCALL_64_after_hwframe+0x73/0x7b [37597.945872] RIP: 0033:0x764d4b524ded [37597.945945] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 [37597.945946] RSP: 002b:0000764d429ff440 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [37597.945948] RAX: ffffffffffffffda RBX: 00006250d1a25d20 RCX: 0000764d4b524ded [37597.945949] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000013 [37597.945950] RBP: 0000764d429ff490 R08: 00006250d0029ad0 R09: 0000000000000000 [37597.945951] R10: 00006250d16fd1e0 R11: 0000000000000246 R12: 000000000000ae80 [37597.945952] R13: 0000000000000000 R14: 0000000000000001 R15: 0000764d4af1e001 [37597.945954] [37597.945955] ---[ end trace 0000000000000000 ]--- [37597.945962] SEAMCALL (0x0000000000000006) failed: 0xc0000b0d00000001 RCX 0x8000030b5ded04f7 RDX 0x0000000000000400 R8 0x0000030b5ded0000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000 [37598.331387] CPU 0/KVM[24905]: segfault at 764d48400fe8 ip 0000764d4b4690dc sp 0000764d48400ff0 error 6 in libc.so.6[764d4b428000+188000] likely on CPU 221 (core 29, socket 1) [37598.331419] Code: 48 89 45 c8 48 8b 05 3b 9d 19 00 f3 0f 6f 0a 64 8b 00 0f 11 8d b8 fb ff ff 89 85 08 fb ff ff 48 8b 42 10 48 89 85 c8 fb ff ff af f4 fb ff 48 89 de 4c 89 ef 48 89 c2 48 89 85 f8 fa ff ff 49 [37910.885484] SEAMCALL (0x0000000000000006) failed: 0xc0000b0d00000001 RCX 0x80000009028024f7 RDX 0x0000000000000400 R8 0x0000000902802000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000 [37911.259718] CPU 0/KVM[25141]: segfault at 709a42800fe8 ip 0000709a49e690dc sp 0000709a42800ff0 error 6 in libc.so.6[709a49e28000+188000] likely on CPU 96 (core 64, socket 1) [37911.259742] Code: 48 89 45 c8 48 8b 05 3b 9d 19 00 f3 0f 6f 0a 64 8b 00 0f 11 8d b8 fb ff ff 89 85 08 fb ff ff 48 8b 42 10 48 89 85 c8 fb ff ff af f4 fb ff 48 89 de 4c 89 ef 48 89 c2 48 89 85 f8 fa ff ff 49

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/PEK-648.

This message was autogenerated

ruomengh commented 1 month ago

One more thing, the issue happened frequently if the TD has some workload running for a while and then quit the TD and boot it again. Configuration vCPU/memory: 4/16GB

fanchen2 commented 1 month ago

[35681.431248] audit: type=1400 audit(1716225051.602:121): apparmor="DENIED" operation="mknod" class="file" profile="ubuntu_pro_apt_news" name="/usr/lib/python3/dist-packages/uaclient/pycache/apt_news.cpython-312.pyc.124103736237104" pid=23850 comm="python3" requested_mask="c" denied_mask="c" fsuid=0 ouid=0

@ruomengh is apparmor running in your system? Not sure if it is related, I will disable apparmor to try

fanchen2 commented 1 month ago

ok, useless, still can be reproduced