canonical / tdx

Intel confidential computing - TDX
GNU General Public License v3.0

TD sometimes crashes when sending 100 "get quote" requests in parallel #156

Open ruomengh opened 4 months ago

ruomengh commented 4 months ago

In this scenario, 100 "get quote" requests are sent in parallel from inside a TD. Sometimes the TD crashes, and the dmesg log below is recorded.

[2396717.199760] pr_tdx_error: 503 callbacks suppressed
[2396717.199763] SEAMCALL (0x000000000000000f) failed: 0x8000081000000000 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x000000410f759000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199805] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199831] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199850] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199861] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199870] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199879] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199888] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199897] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.199906] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0000000000000000 RDX 0x0000000000000000 R8 0x0000000000000000 R9 0x0000000000000000 R10 0x0000000000000000 R11 0x0000000000000000
[2396717.203038] ------------[ cut here ]------------
[2396717.203040] WARNING: CPU: 59 PID: 2508827 at arch/x86/kvm/mmu/tdp_mmu.c:395 handle_removed_pt+0x304/0x340 [kvm]
[2396717.203151] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel xt_nat xt_statistic xt_recent xt_mark xt_comment joydev input_leds hid_generic usbhid hid vsock_loopback vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag veth vhost_vsock vmw_vsock_virtio_transport_common vsock ib_core vhost_net vhost vhost_iotlb tap tls xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_conntrack xt_tcpudp xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_chain_nat xt_addrtype nf_nat nft_compat nf_conntrack nf_defrag_ipv6 br_netfilter nf_defrag_ipv4 nf_tables bridge stp llc overlay binfmt_misc kvm_intel nls_iso8859_1 kvm irqbypass ipmi_ssif wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad pfr_telemetry pfr_update dm_multipath msr efi_pstore nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul polyval_clmulni nvme polyval_generic ghash_clmulni_intel sha256_ssse3 nvme_core ahci
[2396717.203194] sha1_ssse3 xhci_pci igc nvme_auth libahci xhci_pci_renesas aesni_intel crypto_simd cryptd
[2396717.203199] CPU: 59 PID: 2508827 Comm: vhost-2508808 Tainted: G W 6.8.0-1004-intel #11-Ubuntu
[2396717.203201] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0105.D48.2308140026 08/14/2023
[2396717.203203] RIP: 0010:handle_removed_pt+0x304/0x340 [kvm]
[2396717.203258] Code: 48 0f 44 f8 e9 c9 fd ff ff 41 0f b6 57 24 49 8b 4f 48 49 8b 77 28 48 8b 7d b8 83 e2 0f e8 24 5f 19 00 85 c0 0f 84 37 ff ff ff <0f> 0b 49 c7 47 48 00 00 00 00 e9 28 ff ff ff 48 8d 7a ff e9 90 fd
[2396717.203259] RSP: 0018:ff58ac41054df8d0 EFLAGS: 00010282
[2396717.203261] RAX: 00000000fffffffb RBX: 0000000000001000 RCX: 0000000000000000
[2396717.203262] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[2396717.203263] RBP: ff58ac41054df930 R08: 0000000000000000 R09: 0000000000000000
[2396717.203264] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[2396717.203264] R13: 0000000000000000 R14: 0000000000100200 R15: ff1b9d6a8f39c730
[2396717.203265] FS: 0000000000000000(0000) GS:ff1b9da8fab80000(0000) knlGS:0000000000000000
[2396717.203267] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2396717.203268] CR2: 000000c001c99010 CR3: 0000004d50f5c003 CR4: 0000000000f73ef0
[2396717.203269] PKRU: 55555554
[2396717.203270] Call Trace:
[2396717.203271]
[2396717.203274] ? show_regs+0x6d/0x80
[2396717.203280] ? warn+0x89/0x160
[2396717.203288] ? handle_removed_pt+0x304/0x340 [kvm]
[2396717.203341] ? report_bug+0x17e/0x1b0
[2396717.203348] ? handle_bug+0x51/0xa0
[2396717.203361] ? exc_invalid_op+0x18/0x80
[2396717.203362] ? asm_exc_invalid_op+0x1b/0x20
[2396717.203366] ? handle_removed_pt+0x304/0x340 [kvm]
[2396717.203416] handle_changed_spte+0x5e2/0x850 [kvm]
[2396717.203465] handle_removed_pt+0x1b1/0x340 [kvm]
[2396717.203514] handle_changed_spte+0x5e2/0x850 [kvm]
[2396717.203562] tdp_mmu_set_spte+0x111/0x240 [kvm]
[2396717.203609] tdp_mmu_zap_root+0x1ee/0x210 [kvm]
[2396717.203658] kvm_tdp_mmu_zap_all+0x3e/0x90 [kvm]
[2396717.203708] kvm_arch_flush_shadow_all+0x103/0x110 [kvm]
[2396717.203762] kvm_mmu_notifier_release+0x2f/0x60 [kvm]
[2396717.203801] mmu_notifier_release+0x7b/0x200
[2396717.203809] exit_mmap+0x3a2/0x3e0
[2396717.203817] mmput+0x41/0x140
[2396717.203820] mmput+0x31/0x40
[2396717.203821] exit_mm+0xbe/0x130
[2396717.203826] do_exit+0x273/0x530
[2396717.203829] vhost_task_fn+0xc6/0xd0
[2396717.203835] ? pfx_vhost_task_fn+0x10/0x10
[2396717.203837] ret_from_fork+0x44/0x70
[2396717.203841] ? pfx_vhost_task_fn+0x10/0x10
[2396717.203843] ret_from_fork_asm+0x1b/0x30
[2396717.203846] RIP: 0033:0x0
[2396717.203870] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[2396717.203871] RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[2396717.203873] RAX: 0000000000000000 RBX: 000000000000001d RCX: 000077721ff24ded
[2396717.203873] RDX: 0000000000000000 RSI: 000000000000af01 RDI: 000000000000001d
[2396717.203874] RBP: 00007fff1ee92d80 R08: 00007fff1ee92e70 R09: 0000000000000000
[2396717.203875] R10: 0000000000000001 R11: 0000000000000246 R12: 00007fff1ee92e70
[2396717.203876] R13: 000055e55bf810f8 R14: 0000000000000000 R15: 000055e55e9e65b8
[2396717.203877]
[2396717.203878] ---[ end trace 0000000000000000 ]---
[2396717.338991] ------------[ cut here ]------------
[2396717.338994] WARNING: CPU: 59 PID: 2508827 at arch/x86/kvm/mmu/tdp_mmu.c:395 handle_removed_pt+0x304/0x340 [kvm]
[2396717.339056] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel xt_nat xt_statistic xt_recent xt_mark xt_comment joydev input_leds hid_generic usbhid hid vsock_loopback vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag veth vhost_vsock vmw_vsock_virtio_transport_common vsock ib_core vhost_net vhost vhost_iotlb tap tls xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_conntrack xt_tcpudp xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_chain_nat xt_addrtype nf_nat nft_compat nf_conntrack nf_defrag_ipv6 br_netfilter nf_defrag_ipv4 nf_tables bridge stp llc overlay binfmt_misc kvm_intel nls_iso8859_1 kvm irqbypass ipmi_ssif wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad pfr_telemetry pfr_update dm_multipath msr efi_pstore nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul polyval_clmulni nvme polyval_generic ghash_clmulni_intel sha256_ssse3 nvme_core ahci
[2396717.339091] sha1_ssse3 xhci_pci igc nvme_auth libahci xhci_pci_renesas aesni_intel crypto_simd cryptd
[2396717.339096] CPU: 59 PID: 2508827 Comm: vhost-2508808 Tainted: G W 6.8.0-1004-intel #11-Ubuntu
[2396717.339097] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0105.D48.2308140026 08/14/2023
[2396717.339098] RIP: 0010:handle_removed_pt+0x304/0x340 [kvm]
[2396717.339150] Code: 48 0f 44 f8 e9 c9 fd ff ff 41 0f b6 57 24 49 8b 4f 48 49 8b 77 28 48 8b 7d b8 83 e2 0f e8 24 5f 19 00 85 c0 0f 84 37 ff ff ff <0f> 0b 49 c7 47 48 00 00 00 00 e9 28 ff ff ff 48 8d 7a ff e9 90 fd
[2396717.339152] RSP: 0018:ff58ac41054df8d0 EFLAGS: 00010282
[2396717.339154] RAX: 00000000fffffffb RBX: 0000000000001000 RCX: 0000000000000000
[2396717.339155] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[2396717.339155] RBP: ff58ac41054df930 R08: 0000000000000000 R09: 0000000000000000
[2396717.339157] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[2396717.339157] R13: 0000000000000000 R14: 0000000000180200 R15: ff1b9d6a8f39d648
[2396717.339158] FS: 0000000000000000(0000) GS:ff1b9da8fab80000(0000) knlGS:0000000000000000
[2396717.339159] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2396717.339160] CR2: 000000c001c99010 CR3: 0000004d50f5c003 CR4: 0000000000f73ef0
[2396717.339161] PKRU: 55555554
[2396717.339162] Call Trace:
[2396717.339162]
[2396717.339163] ? show_regs+0x6d/0x80
[2396717.339166] ? warn+0x89/0x160
[2396717.339169] ? handle_removed_pt+0x304/0x340 [kvm]
[2396717.339219] ? report_bug+0x17e/0x1b0
[2396717.339223] ? handle_bug+0x51/0xa0
[2396717.339225] ? exc_invalid_op+0x18/0x80
[2396717.339226] ? asm_exc_invalid_op+0x1b/0x20
[2396717.339228] ? handle_removed_pt+0x304/0x340 [kvm]
[2396717.339276] handle_changed_spte+0x5e2/0x850 [kvm]
[2396717.339323] handle_removed_pt+0x1b1/0x340 [kvm]
[2396717.339422] handle_changed_spte+0x5e2/0x850 [kvm]
[2396717.339470] tdp_mmu_set_spte+0x111/0x240 [kvm]
[2396717.339516] tdp_mmu_zap_root+0x1ee/0x210 [kvm]
[2396717.339563] kvm_tdp_mmu_zap_all+0x3e/0x90 [kvm]
[2396717.339611] kvm_arch_flush_shadow_all+0x103/0x110 [kvm]
[2396717.339665] kvm_mmu_notifier_release+0x2f/0x60 [kvm]
[2396717.339704] mmu_notifier_release+0x7b/0x200
[2396717.339709] exit_mmap+0x3a2/0x3e0
[2396717.339715] mmput+0x41/0x140
[2396717.339717] mmput+0x31/0x40
[2396717.339718] exit_mm+0xbe/0x130
[2396717.339721] do_exit+0x273/0x530
[2396717.339724] vhost_task_fn+0xc6/0xd0
[2396717.339727] ? pfx_vhost_task_fn+0x10/0x10
[2396717.339729] ret_from_fork+0x44/0x70
[2396717.339732] ? pfx_vhost_task_fn+0x10/0x10
[2396717.339733] ret_from_fork_asm+0x1b/0x30
[2396717.339735] RIP: 0033:0x0
[2396717.339748] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[2396717.339749] RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[2396717.339750] RAX: 0000000000000000 RBX: 000000000000001d RCX: 000077721ff24ded
[2396717.339751] RDX: 0000000000000000 RSI: 000000000000af01 RDI: 000000000000001d
[2396717.339752] RBP: 00007fff1ee92d80 R08: 00007fff1ee92e70 R09: 0000000000000000
[2396717.339752] R10: 0000000000000001 R11: 0000000000000246 R12: 00007fff1ee92e70
[2396717.339753] R13: 000055e55bf810f8 R14: 0000000000000000 R15: 000055e55e9e65b8
[2396717.339754]
[2396717.339755] ---[ end trace 0000000000000000 ]---
[2396717.404826] ------------[ cut here ]------------
[2396717.404828] WARNING: CPU: 187 PID: 2508827 at arch/x86/kvm/mmu/tdp_mmu.c:395 handle_removed_pt+0x304/0x340 [kvm]
[2396717.404881] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel xt_nat xt_statistic xt_recent xt_mark xt_comment joydev input_leds hid_generic usbhid hid vsock_loopback vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag veth vhost_vsock vmw_vsock_virtio_transport_common vsock ib_core vhost_net vhost vhost_iotlb tap tls xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_conntrack xt_tcpudp xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_chain_nat xt_addrtype nf_nat nft_compat nf_conntrack nf_defrag_ipv6 br_netfilter nf_defrag_ipv4 nf_tables bridge stp llc overlay binfmt_misc kvm_intel nls_iso8859_1 kvm irqbypass ipmi_ssif wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad pfr_telemetry pfr_update dm_multipath msr efi_pstore nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul polyval_clmulni nvme polyval_generic ghash_clmulni_intel sha256_ssse3 nvme_core ahci
[2396717.404913] sha1_ssse3 xhci_pci igc nvme_auth libahci xhci_pci_renesas aesni_intel crypto_simd cryptd
[2396717.404918] CPU: 187 PID: 2508827 Comm: vhost-2508808 Tainted: G W 6.8.0-1004-intel #11-Ubuntu
[2396717.404919] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0105.D48.2308140026 08/14/2023
[2396717.404920] RIP: 0010:handle_removed_pt+0x304/0x340 [kvm]
[2396717.404965] Code: 48 0f 44 f8 e9 c9 fd ff ff 41 0f b6 57 24 49 8b 4f 48 49 8b 77 28 48 8b 7d b8 83 e2 0f e8 24 5f 19 00 85 c0 0f 84 37 ff ff ff <0f> 0b 49 c7 47 48 00 00 00 00 e9 28 ff ff ff 48 8d 7a ff e9 90 fd
[2396717.404966] RSP: 0018:ff58ac41054df8d0 EFLAGS: 00010282
[2396717.404967] RAX: 00000000fffffffb RBX: 0000000000001000 RCX: 0000000000000000
[2396717.404968] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[2396717.404969] RBP: ff58ac41054df930 R08: 0000000000000000 R09: 0000000000000000
[2396717.404970] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[2396717.404970] R13: 0000000000000000 R14: 00000000001c0200 R15: ff1b9d6a8f39ca10
[2396717.404971] FS: 0000000000000000(0000) GS:ff1b9da8fbb80000(0000) knlGS:0000000000000000
[2396717.404972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2396717.404973] CR2: 000000c000226000 CR3: 000000adfa83c003 CR4: 0000000000f73ef0
[2396717.404974] PKRU: 55555554
[2396717.404974] Call Trace:
[2396717.404975]
[2396717.404977] ? show_regs+0x6d/0x80
[2396717.404979] ? warn+0x89/0x160
[2396717.404982] ? handle_removed_pt+0x304/0x340 [kvm]
[2396717.405028] ? report_bug+0x17e/0x1b0
[2396717.405031] ? handle_bug+0x51/0xa0
[2396717.405032] ? exc_invalid_op+0x18/0x80
[2396717.405034] ? asm_exc_invalid_op+0x1b/0x20
[2396717.405036] ? handle_removed_pt+0x304/0x340 [kvm]
[2396717.405079] handle_changed_spte+0x5e2/0x850 [kvm]
[2396717.405122] handle_removed_pt+0x1b1/0x340 [kvm]
[2396717.405164] handle_changed_spte+0x5e2/0x850 [kvm]
[2396717.405207] tdp_mmu_set_spte+0x111/0x240 [kvm]
[2396717.405250] tdp_mmu_zap_root+0x1ee/0x210 [kvm]
[2396717.405293] kvm_tdp_mmu_zap_all+0x3e/0x90 [kvm]
[2396717.405336] kvm_arch_flush_shadow_all+0x103/0x110 [kvm]
[2396717.405395] kvm_mmu_notifier_release+0x2f/0x60 [kvm]
[2396717.405432] mmu_notifier_release+0x7b/0x200
[2396717.405436] exit_mmap+0x3a2/0x3e0
[2396717.405440] mmput+0x41/0x140
[2396717.405442] mmput+0x31/0x40
[2396717.405443] exit_mm+0xbe/0x130
[2396717.405445] do_exit+0x273/0x530
[2396717.405448] vhost_task_fn+0xc6/0xd0
[2396717.405450] ? pfx_vhost_task_fn+0x10/0x10
[2396717.405451] ret_from_fork+0x44/0x70
[2396717.405453] ? pfx_vhost_task_fn+0x10/0x10
[2396717.405455] ret_from_fork_asm+0x1b/0x30
[2396717.405456] RIP: 0033:0x0
[2396717.405467] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[2396717.405467] RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[2396717.405469] RAX: 0000000000000000 RBX: 000000000000001d RCX: 000077721ff24ded
[2396717.405469] RDX: 0000000000000000 RSI: 000000000000af01 RDI: 000000000000001d
[2396717.405470] RBP: 00007fff1ee92d80 R08: 00007fff1ee92e70 R09: 0000000000000000
[2396717.405471] R10: 0000000000000001 R11: 0000000000000246 R12: 00007fff1ee92e70
[2396717.405471] R13: 000055e55bf810f8 R14: 0000000000000000 R15: 000055e55e9e65b8
[2396717.405473]
[2396717.405474] ---[ end trace 0000000000000000 ]---
[2396717.654859] audit: type=1400 audit(1719973016.454:355): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-f72db2ad-0398-4fdf-b4e1-271bdc7bb08f" pid=2798486 comm="apparmor_parser"
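
For triage, it may help to tally the SEAMCALL failures in a log like the one above by leaf number and status code. The following is only a sketch: the function name `summarize_seamcall_failures` and the sample text are illustrative, and the regular expression assumes the `SEAMCALL (<leaf>) failed: <status>` wording seen in this dmesg output.

```python
import re
from collections import Counter

# Matches entries such as:
#   SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX ...
# The pattern is an assumption based on the log format shown in this issue.
SEAMCALL_RE = re.compile(r"SEAMCALL \((0x[0-9a-fA-F]+)\) failed: (0x[0-9a-fA-F]+)")


def summarize_seamcall_failures(log_text):
    """Return a Counter keyed by (leaf, status) for each failed SEAMCALL entry."""
    return Counter(SEAMCALL_RE.findall(log_text))


# Short excerpt in the same format as the log in this issue.
sample = (
    "[2396717.199763] SEAMCALL (0x000000000000000f) failed: 0x8000081000000000 RCX 0x0\n"
    "[2396717.199805] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0\n"
    "[2396717.199831] SEAMCALL (0x000000000000001c) failed: 0xc000030000000001 RCX 0x0\n"
)

summary = summarize_seamcall_failures(sample)
for (leaf, status), count in summary.most_common():
    print(f"leaf={leaf} status={status} count={count}")
```

Because the pattern does not depend on line breaks, it also works on log text that was pasted as a single run. Note that the kernel reported "503 callbacks suppressed", so counts taken from dmesg are a lower bound.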

syncronize-issues-to-jira[bot] commented 4 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/PEK-817.

This message was autogenerated

bktan8 commented 4 months ago

@ruomengh - Can you please provide steps to reproduce?

ruomengh commented 4 months ago

@ruomengh - Can you please provide steps to reproduce?

Run the "get quote" tool in parallel inside the TD. In my case, I use the Linux `parallel` command to trigger 100 requests.
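
As a rough illustration of that reproduction step, the sketch below starts many copies of a command at once from Python instead of `parallel`. The helper name `run_parallel` is hypothetical, and the actual "get quote" command is not named in this thread, so it is left as a placeholder that the caller must supply.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(cmd, count=100):
    """Start `count` copies of `cmd` concurrently and return their exit codes."""

    def run_once(_index):
        # Output is discarded; only the exit status is kept.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode

    # One worker thread per request so that all processes are in flight together.
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(run_once, range(count)))


# Harmless stand-in command so the sketch runs anywhere; replace it with the
# real "get quote" tool invocation inside the TD (not specified in this thread).
placeholder_cmd = [sys.executable, "-c", "pass"]
exit_codes = run_parallel(placeholder_cmd, count=4)
print(f"{exit_codes.count(0)} of {len(exit_codes)} requests exited with status 0")
```

Inside the TD, one would call `run_parallel` with the quote tool's command line and `count=100`, then check the host's dmesg for the warnings reported above.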

bktan8 commented 1 week ago

Hi @ruomengh - Have you tried the latest TDX module to see whether it resolves the issue?