canonical / tdx

Intel confidential computing - TDX
GNU General Public License v3.0

TD VM reboot with `virsh reboot` is not working #233

Closed hector-cao closed 1 month ago

hector-cao commented 1 month ago

Describe the bug

When we use the tdvirsh tool that comes along with this repo, we can successfully reboot the TD by issuing tdvirsh reboot <domain-id>.

However, if we use the traditional virsh tool, the TD does not reboot properly: it is destroyed and disappears from virsh list --all.

The libvirtd log shows the following error message:

Sep 25 07:49:18 corsair-741103 libvirtd[2486868]: unsupported configuration: Security driver model '(null)' is not available

To Reproduce

1) Use virsh create <conf> to create a TD. The conf file is an instantiation of the template file guest-tools/trust_domain.xml.template with the variables DOMAIN and OVERLAY_IMG_PATH set to appropriate values.
2) Use virsh list --all to confirm that the TD has been successfully created and is in the running state.
3) Use virsh console <id> to wait for the TD to fully boot up (login prompt).
4) Use virsh reboot <domain-id> to reboot the TD.
5) Check that the TD disappears using virsh list --all.

NB: To reproduce the issue, it is important to wait until the TD has fully booted (step 3) before issuing the reboot.
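
The steps above can be sketched as a small shell helper. This is an illustration only, not part of the repository: the function names td_is_listed and reproduce_reboot_bug, the REBOOT_WAIT variable and its 30-second default are all assumptions made for this example. Only the virsh subcommands themselves come from the steps above (plus the --name option of virsh list, used to get one domain name per line).

```shell
#!/bin/sh
# Sketch only: td_is_listed and reproduce_reboot_bug are hypothetical helpers.

# Succeed if the domain shows up in `virsh list --all`.
td_is_listed() {
    virsh list --all --name | grep -Fxq -- "$1"
}

# Usage: reproduce_reboot_bug <conf.xml> <domain-name>
# Returns 0 if the TD survives the reboot, 1 if it disappears,
# 2 if the setup steps fail.
reproduce_reboot_bug() {
    virsh create "$1" || return 2          # step 1: create the TD
    td_is_listed "$2" || return 2          # step 2: confirm it exists
    # Step 3: watch `virsh console` in another terminal until the login prompt.
    printf 'Press Enter once the TD shows a login prompt: ' >&2
    read -r reply || true
    virsh reboot "$2" || return 2          # step 4: reboot the TD
    sleep "${REBOOT_WAIT:-30}"             # assumed grace period, in seconds
    if td_is_listed "$2"; then             # step 5: is the TD still there?
        echo "OK: $2 is still listed after reboot"
    else
        echo "BUG: $2 disappeared after reboot"
        return 1
    fi
}
```

A call such as reproduce_reboot_bug td.xml my-td (both arguments are placeholders) would report BUG on an affected system, since a domain started with virsh create is transient and vanishes from the list once destroyed.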

Expected behavior

virsh reboot should successfully reboot the TD.

System report

Git ref

54efe7b5d4408ad1a5b7e35ae2292b5328c0ae1d

Operating system details

Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:    24.04
Codename:   noble

Kernel version

6.8.0-1010-intel #17-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug  9 10:21:48 UTC 2024 x86_64 x86_64 GNU/Linux

TDX kernel logs

[    1.585899] virt/tdx: BIOS enabled: private KeyID range [64, 128)
[    1.585903] virt/tdx: Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.
[   11.796965] virt/tdx: TDX module: attributes 0x0, vendor_id 0x8086, major_version 1, minor_version 5, build_date 20240129, build_num 698
[   11.796968] virt/tdx: CMR: [0x100000, 0x77800000)
[   11.797571] virt/tdx: CMR: [0x100000000, 0x107a000000)
[   11.797726] virt/tdx: CMR: [0x1080000000, 0x207c000000)
[   11.797874] virt/tdx: CMR: [0x2080000000, 0x307c000000)
[   11.798013] virt/tdx: CMR: [0x3080000000, 0x407c000000)
[   13.031110] virt/tdx: 1050644 KB allocated for PAMT
[   13.031606] virt/tdx: module initialized

TDX CPU instruction support

CPU supports TDX according to /proc/cpuinfo

Model specific registers (MSRs)

MK_TME_ENABLED bit: 1 (expected value: 1)
SEAM_RR bit: 1 (expected value: 1)
NUM_TDX_PRIV_KEYS: 40
SGX_AND_MCHECK_STATUS: 0 (expected value: 0)
Production platform: Pre-production (expected value: Production)

CPU details

 INTEL(R) XEON(R) PLATINUM 8592+

QEMU package details

Status: Installed
Package: qemu-system-x86
Version: 1:8.2.2+ds-0ubuntu2+tdx1.0
APT-Sources: https://ppa.launchpadcontent.net/kobuk-team/tdx/ubuntu noble/main amd64 Packages

Libvirt package details

Status: Installed
Package: libvirt-clients
Version: 10.0.0-2ubuntu8.3+tdx1.1
APT-Sources: https://ppa.launchpadcontent.net/kobuk-team/tdx/ubuntu noble/main amd64 Packages

OVMF package details

Status: Installed
Package: ovmf
Version: 2024.02-3+tdx1.0
APT-Sources: https://ppa.launchpadcontent.net/kobuk-team/tdx-release/ubuntu noble/main amd64 Packages

sgx-dcap-pccs package details

Status: Installed
Package: sgx-dcap-pccs
Version: 1.21-0ubuntu1
APT-Sources: https://ppa.launchpadcontent.net/kobuk-team/tdx-attestation/ubuntu noble/main amd64 Packages

tdx-qgs package details

Status: Installed
Package: tdx-qgs
Version: 1.21-0ubuntu2.1
APT-Sources: https://ppa.launchpadcontent.net/kobuk-team/tdx-attestation-release/ubuntu noble/main amd64 Packages

sgx-ra-service package details

Status: Installed
Package: sgx-ra-service
Version: 1.21-0ubuntu2.1
APT-Sources: https://ppa.launchpadcontent.net/kobuk-team/tdx-attestation-release/ubuntu noble/main amd64 Packages
Description: Intel(R) Software Guard Extensions Multi-Package Registration Agent Service

sgx-pck-id-retrieval-tool package details

Status: Installed
Package: sgx-pck-id-retrieval-tool
Version: 1.21-0ubuntu2.1
APT-Sources: https://ppa.launchpadcontent.net/kobuk-team/tdx-attestation-release/ubuntu noble/main amd64 Packages

QGSD service status

● qgsd.service - Intel(R) TD Quoting Generation Service
     Loaded: loaded (/usr/lib/systemd/system/qgsd.service; enabled; preset: enabled)
     Active: active (running) since Mon 2024-09-23 20:50:01 UTC; 1 day 12h ago
   Main PID: 1413025 (qgs)
      Tasks: 5 (limit: 308481)
     Memory: 648.0K (peak: 3.0M)
        CPU: 141ms
     CGroup: /system.slice/qgsd.service
             └─1413025 /usr/bin/qgs

Sep 23 20:50:01 corsair-741103 systemd[1]: Starting qgsd.service - Intel(R) TD Quoting Generation Service...
Sep 23 20:50:01 corsair-741103 systemd[1]: Started qgsd.service - Intel(R) TD Quoting Generation Service.
Sep 23 20:50:01 corsair-741103 qgsd[1413025]: Added signal handler
Sep 23 20:50:01 corsair-741103 qgsd[1413025]: About to create QgsServer with num_thread = 4
Sep 23 20:50:01 corsair-741103 qgsd[1413025]: About to start main loop

PCCS service status

● pccs.service - Provisioning Certificate Caching Service (PCCS)
     Loaded: loaded (/usr/lib/systemd/system/pccs.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-09-18 13:14:27 UTC; 6 days ago
       Docs: https://github.com/intel/SGXDataCenterAttestationPrimitives/blob/master/QuoteGeneration/pccs/README.md
   Main PID: 614782 (node)
      Tasks: 15 (limit: 308481)
     Memory: 54.7M (peak: 60.1M)
        CPU: 13.567s
     CGroup: /system.slice/pccs.service
             └─614782 /usr/bin/node /opt/intel/sgx-dcap-pccs/pccs_server.js

Sep 25 01:00:01 corsair-741103 node[614782]: 2024-09-25 01:00:01.002 [info]: Request-ID is : 78cedcc571f24830889dbfe344f95584
Sep 25 01:00:01 corsair-741103 node[614782]: 2024-09-25 01:00:01.309 [info]: Request-ID is : 70f2e5b4db564b30a2efce4785b60152
Sep 25 01:00:01 corsair-741103 node[614782]: 2024-09-25 01:00:01.650 [info]: Request-ID is : a046b36074ca4d159d09a8aca48398ee
Sep 25 01:00:01 corsair-741103 node[614782]: 2024-09-25 01:00:01.991 [info]: Request-ID is : 53608061882c4e97843cdfd1efad35e1
Sep 25 01:00:02 corsair-741103 node[614782]: 2024-09-25 01:00:02.340 [info]: Request-ID is : 40307a46db974596adfff0c9b64f84ca
Sep 25 01:00:02 corsair-741103 node[614782]: 2024-09-25 01:00:02.664 [info]: Request-ID is : 725817f022204a7c93feb0d7cfea5e73
Sep 25 01:00:02 corsair-741103 node[614782]: 2024-09-25 01:00:02.992 [info]: Request-ID is : 90c9c52e188048f5995bacfddb9721d7
Sep 25 01:00:03 corsair-741103 node[614782]: 2024-09-25 01:00:03.346 [info]: Request-ID is : 7a47f576663e412d94e3ad51b5c76cb5
Sep 25 01:00:03 corsair-741103 node[614782]: 2024-09-25 01:00:03.661 [info]: Request-ID is : 92404bcf76b24ebdaa4a7f3186fe4a4f
Sep 25 01:00:03 corsair-741103 node[614782]: 2024-09-25 01:00:03.760 [info]: Scheduled cache refresh is completed successfully.

MPA registration logs (last 30 lines)

[17-09-2024 08:03:21] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:03:21] INFO: Finished Registration Agent Flow.
[17-09-2024 08:03:50] INFO: SGX Registration Agent version: 1.21.100.3
[17-09-2024 08:03:50] INFO: Starts Registration Agent Flow.
[17-09-2024 08:03:50] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:03:50] INFO: Finished Registration Agent Flow.
[17-09-2024 08:04:06] INFO: SGX Registration Agent version: 1.21.100.3
[17-09-2024 08:04:06] INFO: Starts Registration Agent Flow.
[17-09-2024 08:04:06] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:04:06] INFO: Finished Registration Agent Flow.
[17-09-2024 08:04:22] INFO: SGX Registration Agent version: 1.21.100.3
[17-09-2024 08:04:22] INFO: Starts Registration Agent Flow.
[17-09-2024 08:04:22] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:04:22] INFO: Finished Registration Agent Flow.
[17-09-2024 08:43:16] INFO: SGX Registration Agent version: 1.21.100.3
[17-09-2024 08:43:16] INFO: Starts Registration Agent Flow.
[17-09-2024 08:43:16] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:43:16] INFO: Finished Registration Agent Flow.
[17-09-2024 08:43:31] INFO: SGX Registration Agent version: 1.21.100.3
[17-09-2024 08:43:31] INFO: Starts Registration Agent Flow.
[17-09-2024 08:43:31] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:43:31] INFO: Finished Registration Agent Flow.
[17-09-2024 08:43:45] INFO: SGX Registration Agent version: 1.21.100.3
[17-09-2024 08:43:45] INFO: Starts Registration Agent Flow.
[17-09-2024 08:43:45] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:43:45] INFO: Finished Registration Agent Flow.
[17-09-2024 08:44:01] INFO: SGX Registration Agent version: 1.21.100.3
[17-09-2024 08:44:01] INFO: Starts Registration Agent Flow.
[17-09-2024 08:44:01] ERROR: Registration Flow - Registration status indicates registration is completed unsuccessfully, and the error code is 165. 
[17-09-2024 08:44:01] INFO: Finished Registration Agent Flow.

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/PEK-1286.

This message was autogenerated

hector-cao commented 1 month ago

More details after a deeper analysis:

When we create a TD with virsh create and no seclabel is specified in the configuration, libvirt automatically generates seclabel tags. For example, the Ubuntu libvirt package is currently configured to use AppArmor, so we find the following tags in the TD configuration (output of virsh edit <domain>):

  <seclabel type='dynamic' model='apparmor' relabel='yes'>                                                                           
    <label>libvirt-f4d02a54-6cd5-4e1a-be2b-7c5a2f3e06f4</label>                                                                      
    <imagelabel>libvirt-f4d02a54-6cd5-4e1a-be2b-7c5a2f3e06f4</imagelabel>                                                            
  </seclabel>
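
To inspect the generated seclabel without opening an editor, the domain XML can also be read with virsh dumpxml. The helper below is a minimal sketch: the function name td_seclabel_models is hypothetical, and the sed expression assumes libvirt's usual single-quoted attribute formatting, as in the excerpt above.

```shell
#!/bin/sh
# Sketch only: td_seclabel_models is a hypothetical helper name.

# Print the model attribute of every <seclabel> element of a domain,
# one per line. Usage: td_seclabel_models <domain>
td_seclabel_models() {
    virsh dumpxml "$1" |
        sed -n "s/.*<seclabel[^>]*model='\([^']*\)'.*/\1/p"
}
```

For the configuration shown above, the helper would print apparmor. Text matching is used here only to keep the sketch dependency-free; a proper XML parser would be more robust.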

When we reboot the TD with virsh reboot, the seclabel tags are released. Here is the call stack:

...
qemuProcessHandleShutdown                                                                                                            
  qemuProcessShutdownOrReboot                                                                                                        
    qemuProcessHardReboot                                                                                                            
      qemuProcessStop                                                                                                                
        qemuSecurityReleaseLabel / virSecurityManagerReleaseLabel                                                                    
          -> call hook: AppArmorReleaseSecurityLabel 

When the TD is started again, libvirt checks the security driver tags and sees that the apparmor seclabel has its model set to null (because it has been freed). libvirt then logs the error, declares that the TD reboot failed, and destroys the TD.

hector-cao commented 1 month ago

To fix the issue, we can avoid releasing the seclabel tags when doing a hardReboot (this reboot mode was added specifically for TDX). Here is the patch:

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9471bbdb4..e1ede63c5 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8785,7 +8785,10 @@ void qemuProcessStop(virQEMUDriver *driver,
         }
     }

-    qemuSecurityReleaseLabel(driver->securityManager, vm->def);
+    /** if hardReboot, do not release seclabel tags */
+    if (!priv->hardReboot) {
+        qemuSecurityReleaseLabel(driver->securityManager, vm->def);
+    }

     /* clear all private data entries which are no longer needed */
     qemuDomainObjPrivateDataClear(priv);

hector-cao commented 1 month ago

This issue has been fixed in libvirt version 10.0.0-2ubuntu8.3+tdx1.2. Closing.
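
To check whether a host already has the fixed package, the installed version can be compared against the version named above. This is a sketch under assumptions: the helper names are hypothetical, and it queries the libvirt-clients package (the one listed in the system report), assuming that all libvirt binary packages built from the same source carry the same version string.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.
# Version in which the fix was reported to land (from this thread).
FIXED_VERSION='10.0.0-2ubuntu8.3+tdx1.2'

# Print the installed version of libvirt-clients (assumed representative
# of the other libvirt packages on the host).
installed_libvirt_version() {
    dpkg-query -W -f='${Version}' libvirt-clients
}

# Usage: libvirt_has_reboot_fix [version]
# Succeeds if the given (or installed) version is at least FIXED_VERSION.
libvirt_has_reboot_fix() {
    version="${1:-$(installed_libvirt_version)}" || return 2
    [ -n "$version" ] || return 2
    dpkg --compare-versions "$version" ge "$FIXED_VERSION"
}
```

Running libvirt_has_reboot_fix with the version from the system report (10.0.0-2ubuntu8.3+tdx1.1) fails, while 10.0.0-2ubuntu8.3+tdx1.2 or later succeeds.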