canonical / tdx

Intel confidential computing - TDX
GNU General Public License v3.0

Failed to get the quote #155

Closed: matti closed this issue 2 months ago.

matti commented 3 months ago

TDX is initialized on the host:

sudo dmesg | grep -i tdx
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0-1004-intel root=/dev/mapper/ubuntu--vg-ubuntu--lv ro kvm_intel.tdx=1 nohibernate
[    1.082568] Kernel command line: BOOT_IMAGE=/vmlinuz-6.8.0-1004-intel root=/dev/mapper/ubuntu--vg-ubuntu--lv ro kvm_intel.tdx=1 nohibernate
[    1.398899] virt/tdx: BIOS enabled: private KeyID range [32, 64)
[    1.398902] virt/tdx: Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.
[    5.048029] systemd[1]: Hostname set to <tdx-1>.
[    6.367710] virt/tdx: TDX module: attributes 0x0, vendor_id 0x8086, major_version 1, minor_version 5, build_date 20231008, build_num 595
[    6.367714] virt/tdx: CMR: [0x100000, 0x77800000)
[    6.367716] virt/tdx: CMR: [0x100000000, 0x2076000000)
[    6.791305] virt/tdx: 525320 KB allocated for PAMT
[    6.791312] virt/tdx: module initialized

In the TD guest:

root@tdx-guest:~# dmesg | grep -i tdx
[    0.000000] tdx: Guest detected
[    0.355803] process: using TDX aware idle routine
[    0.413109] Memory Encryption Features active: Intel TDX
[   19.907864] systemd[1]: Detected confidential virtualization tdx.
[   19.919713] systemd[1]: Hostname set to <tdx-guest>.
root@tdx-guest:~# mkdir -p /sys/kernel/config/tsm/report/testreport0
cat /sys/kernel/config/tsm/report/testreport0/provider
tdx_guest
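For reference, the same configfs-tsm interface can be used to request a full quote, independently of test_tdx_attest. This is a minimal sketch that assumes a Linux 6.7+ guest kernel with configfs mounted at /sys/kernel/config. The function name request_quote and the TSM_ROOT override are hypothetical; the override only exists so the logic can be exercised outside a TD. As we understand it, writing up to 64 bytes of report data to inblob and then reading outblob is what triggers quote generation through the host, so on an affected host we would expect the outblob read to fail in the same way test_tdx_attest does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: request a TD quote via configfs-tsm (Linux 6.7+).
# TSM_ROOT is an assumed override for testing outside a TD; the real
# location is /sys/kernel/config/tsm/report.

# request_quote <report-dir-name> <output-file>
request_quote() {
    local root="${TSM_ROOT:-/sys/kernel/config/tsm/report}"
    local dir="$root/$1"
    local out="$2"
    local provider

    # Creating the directory allocates a report entry in configfs.
    mkdir -p "$dir" || return 1

    # Inside a TDX guest the provider should read "tdx_guest".
    provider=$(cat "$dir/provider") || return 1
    if [ "$provider" != "tdx_guest" ]; then
        echo "unexpected provider: $provider" >&2
        return 1
    fi

    # Up to 64 bytes of caller-chosen report data; random here as a nonce.
    head -c 64 /dev/urandom > "$dir/inblob" || return 1

    # Reading outblob triggers quote generation; if the host cannot
    # produce a quote, we expect this read to fail.
    if ! cat "$dir/outblob" > "$out"; then
        echo "quote generation failed" >&2
        return 1
    fi

    # Treat an empty output file as a failure as well.
    [ -s "$out" ]
}
```

Usage on the guest, as root: request_quote testreport0 /tmp/quote.bin && hexdump -C /tmp/quote.bin | head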

This is production hardware:

root@tdx-1:~/tdx/attestation# sudo ./check-production.sh
Production

The /dev/sgx_* device nodes are present:

$ ls -l /dev/sgx_*
crw-rw---- 1 root sgx     10, 125 Jul  2 18:16 /dev/sgx_enclave
crw-rw---- 1 root sgx_prv 10, 126 Jul  2 18:16 /dev/sgx_provision
crw-rw---- 1 root sgx     10, 124 Jul  2 18:16 /dev/sgx_vepc

QGS (qgsd) is running properly:

$ sudo systemctl status qgsd
● qgsd.service - Intel(R) TD Quoting Generation Service
     Loaded: loaded (/usr/lib/systemd/system/qgsd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-07-02 18:16:41 UTC; 39min ago
    Process: 1345 ExecStartPre=/usr/share/qgs/linksgx.sh (code=exited, status=0/SUCCESS)
    Process: 1424 ExecStart=/usr/bin/qgs (code=exited, status=0/SUCCESS)
   Main PID: 1455 (qgs)
      Tasks: 5 (limit: 153936)
     Memory: 3.9M (peak: 4.9M)
        CPU: 53ms
     CGroup: /system.slice/qgsd.service
             └─1455 /usr/bin/qgs

Jul 02 18:16:41 tdx-1 systemd[1]: Starting qgsd.service - Intel(R) TD Quoting Generation Service...
Jul 02 18:16:41 tdx-1 qgsd[1455]: Added signal handler
Jul 02 18:16:41 tdx-1 qgsd[1455]: About to create QgsServer with num_thread = 4
Jul 02 18:16:41 tdx-1 systemd[1]: Started qgsd.service - Intel(R) TD Quoting Generation Service.
Jul 02 18:16:41 tdx-1 qgsd[1455]: About to start main loop

PCCS is running properly:

$ sudo systemctl status pccs
● pccs.service - Provisioning Certificate Caching Service (PCCS)
     Loaded: loaded (/usr/lib/systemd/system/pccs.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-07-02 18:16:41 UTC; 40min ago
       Docs: https://github.com/intel/SGXDataCenterAttestationPrimitives/blob/master/QuoteGeneration/pccs/README.md
   Main PID: 1344 (node)
      Tasks: 15 (limit: 153936)
     Memory: 111.8M (peak: 120.6M)
        CPU: 2.684s
     CGroup: /system.slice/pccs.service
             └─1344 /usr/bin/node /opt/intel/sgx-dcap-pccs/pccs_server.js

Jul 02 18:16:41 tdx-1 systemd[1]: Started pccs.service - Provisioning Certificate Caching Service (PCCS).
Jul 02 18:16:43 tdx-1 node[1344]: 2024-07-02 18:16:43.215 [info]: HTTPS Server is running on: https://localhost:8081

mpa_registration_tool ran and exited successfully:

$ systemctl status mpa_registration_tool
○ mpa_registration_tool.service - Intel MPA Registration
     Loaded: loaded (/usr/lib/systemd/system/mpa_registration_tool.service; enabled; preset: enabled)
     Active: inactive (dead) since Tue 2024-07-02 18:18:41 UTC; 39min ago
   Duration: 30ms
    Process: 1623 ExecStart=/usr/bin/mpa_registration (code=exited, status=0/SUCCESS)
   Main PID: 1623 (code=exited, status=0/SUCCESS)
        CPU: 7ms

Jul 02 18:18:41 tdx-1 systemd[1]: Started mpa_registration_tool.service - Intel MPA Registration.
Jul 02 18:18:41 tdx-1 systemd[1]: mpa_registration_tool.service: Deactivated successfully.

and its log:

cat /var/log/mpa_registration.log
[02-07-2024 05:55:13] INFO: SGX Registration Agent version: 1.20.100.2
[02-07-2024 05:55:13] INFO: Starts Registration Agent Flow.
[02-07-2024 05:55:13] INFO: Registration Flow - Registration status indicates registration is completed successfully. MPA has nothing to do.
[02-07-2024 05:55:13] INFO: Finished Registration Agent Flow.
[02-07-2024 05:59:09] INFO: SGX Registration Agent version: 1.20.100.2
[02-07-2024 05:59:09] INFO: Starts Registration Agent Flow.
[02-07-2024 05:59:09] INFO: Registration Flow - Registration status indicates registration is completed successfully. MPA has nothing to do.
[02-07-2024 05:59:09] INFO: Finished Registration Agent Flow.
[02-07-2024 06:18:41] INFO: SGX Registration Agent version: 1.20.100.2
[02-07-2024 06:18:41] INFO: Starts Registration Agent Flow.
[02-07-2024 06:18:41] INFO: Registration Flow - Registration status indicates registration is completed successfully. MPA has nothing to do.
[02-07-2024 06:18:41] INFO: Finished Registration Agent Flow.
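The host-side checks above can be collected into one script. This is a sketch, not an official tool: the function name preflight and the SGX_DEV_DIR and MPA_LOG overrides are our own, and dmesg may need root on systems where kernel.dmesg_restrict is set. Note that in this issue every one of these checks passed and quote generation still failed, so a clean run does not rule out the registration problem described at the end of the thread.

```bash
#!/usr/bin/env bash
# Hypothetical host preflight mirroring the checks in this issue.
# SGX_DEV_DIR and MPA_LOG are assumed overrides, used only for testing.

preflight() {
    local dev_dir="${SGX_DEV_DIR:-/dev}"
    local mpa_log="${MPA_LOG:-/var/log/mpa_registration.log}"
    local failed=0 d s

    # TDX module initialized by the host kernel (dmesg may require root).
    if dmesg | grep -q 'virt/tdx: module initialized'; then
        echo "ok: TDX module initialized"
    else
        echo "FAIL: TDX module not initialized"; failed=1
    fi

    # SGX device nodes needed by the quoting enclaves on the host.
    for d in sgx_enclave sgx_provision; do
        if [ -e "$dev_dir/$d" ]; then
            echo "ok: $dev_dir/$d present"
        else
            echo "FAIL: $dev_dir/$d missing"; failed=1
        fi
    done

    # Quote generation service and certificate caching service.
    for s in qgsd pccs; do
        if systemctl is-active --quiet "$s"; then
            echo "ok: $s active"
        else
            echo "FAIL: $s not active"; failed=1
        fi
    done

    # Most recent MPA registration result recorded in the log.
    if grep 'Registration Flow' "$mpa_log" 2>/dev/null | tail -n 1 |
        grep -q 'completed successfully'; then
        echo "ok: MPA reports registration completed"
    else
        echo "FAIL: no successful MPA registration in $mpa_log"; failed=1
    fi

    return "$failed"
}
```

Usage on the host, as root: preflight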

But then, in the guest, quote generation fails with "Failed to get the quote":

root@tdx-guest:~# cd /usr/share/doc/libtdx-attest-dev/examples/
./test_tdx_attest

  TDX report data

 00000000: 93 c5 f4 61 d2 31 45 ce 5e 69 eb 5e e9 42 db 0d
 00000010: 00 83 36 4a 32 89 cc 8b ff f3 74 c4 88 a6 4a 1b
 00000020: 6b 3e 7c 3d 6f c1 0c cd 2b f7 2c 14 39 07 21 3a
 00000030: 8a 57 84 bd e0 50 48 df 44 bc a3 cc 62 ed e7 ce

Wrote TD Report to report.dat

Failed to get the quote

Meanwhile, on the host, qgsd logs:

Jul 02 18:59:18 tdx-1 qgsd[1455]: unpack message successfully in thread [77b48567d740]
Jul 02 18:59:18 tdx-1 qgsd[1455]: call tee_att_create_context
Jul 02 18:59:18 tdx-1 qgsd[1455]: create context in thread[77b484e006c0]
Jul 02 18:59:18 tdx-1 qgsd[1455]: [QCNL] Error creating directory '/var/opt/qgsd/.dcap-qcnl/'.
Jul 02 18:59:19 tdx-1 qgsd[1455]: [QPL] No certificate data for this platform.
Jul 02 18:59:19 tdx-1 qgsd[1455]: [get_platform_quote_cert_data ../td_ql_logic.cpp:319] Error returned from the p_sgx_get_quote_config API. 0xe011
Jul 02 18:59:19 tdx-1 qgsd[1455]: tee_att_init_quote return 0x11001
Jul 02 18:59:19 tdx-1 qgsd[1455]: resp_size is 0
Jul 02 18:59:19 tdx-1 qgsd[1455]: About to shutdown and close socket
Jul 02 18:59:19 tdx-1 qgsd[1455]: erased a connection, now [0]

syncronize-issues-to-jira[bot] commented 3 months ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/PEK-773.

This message was autogenerated

hector-cao commented 3 months ago

@matti Can you please provide the PCCS log after the quote generation failure?

hector-cao commented 2 months ago

@matti Can you also provide the output of head -n 10 /proc/cpuinfo?

matti commented 2 months ago

PCCS log:

-- Boot e509f37adb6341bea9f371ad302a366a --
Jul 02 18:16:41 tdx-1 systemd[1]: Started pccs.service - Provisioning Certificate Caching Service (PCCS).
Jul 02 18:16:43 tdx-1 node[1344]: 2024-07-02 18:16:43.215 [info]: HTTPS Server is running on: https://localhost:8081
Jul 02 18:59:18 tdx-1 node[1344]: 2024-07-02 18:59:18.593 [info]: Client Request-ID : 7075bfefc5cf49438f240bbdf5421750
Jul 02 18:59:19 tdx-1 node[1344]: 2024-07-02 18:59:19.678 [info]: Request-ID is : e78f3adb736248f3b61b8938696ef0e9
Jul 02 18:59:19 tdx-1 node[1344]: 2024-07-02 18:59:19.679 [error]: Intel PCS server returns error(404).
Jul 02 18:59:19 tdx-1 node[1344]: 2024-07-02 18:59:19.681 [error]: Error: No cache data for this platform.
Jul 02 18:59:19 tdx-1 node[1344]:     at Module.getPckCertFromPCS (file:///opt/intel/sgx-dcap-pccs/services/logic/commonCacheLogic.js:88:11)
Jul 02 18:59:19 tdx-1 node[1344]:     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jul 02 18:59:19 tdx-1 node[1344]:     at async LazyCachingMode.getPckCertFromPCS (file:///opt/intel/sgx-dcap-pccs/services/caching_modes/cachingMode.js:126:12)
Jul 02 18:59:19 tdx-1 node[1344]:     at async Module.getPckCert (file:///opt/intel/sgx-dcap-pccs/services/pckcertService.js:115:16)
Jul 02 18:59:19 tdx-1 node[1344]:     at async getPckCert (file:///opt/intel/sgx-dcap-pccs/controllers/pckcertController.js:77:25)
Jul 02 18:59:19 tdx-1 node[1344]: 2024-07-02 18:59:19.695 [info]: 127.0.0.1 - - [02/Jul/2024:18:59:19 +0000] "GET /sgx/certification/v4/pckcert?qeid=7529208E4DE4E65A6A8B22761D95C9F4&encrypted_p>
Jul 03 01:00:00 tdx-1 node[1344]: 2024-07-03 01:00:00.886 [info]: Request-ID is : 671b0d56294347afb97cf465b08b12a5
Jul 03 01:00:01 tdx-1 node[1344]: 2024-07-03 01:00:01.493 [info]: Request-ID is : c2afc6f54a894e13800a4d2db0cc0a50
Jul 03 01:00:02 tdx-1 node[1344]: 2024-07-03 01:00:02.044 [info]: Request-ID is : 75973a0d2885469ca070c1a4e019edd8
Jul 03 01:00:02 tdx-1 node[1344]: 2024-07-03 01:00:02.570 [info]: Request-ID is : 7570358ca7aa46dd85d0fa1483a9eb9e
Jul 03 01:00:03 tdx-1 node[1344]: 2024-07-03 01:00:03.087 [info]: Request-ID is : 5931d1b5932a49319143bc74364df328
Jul 03 01:00:03 tdx-1 node[1344]: 2024-07-03 01:00:03.230 [info]: Scheduled cache refresh is completed successfully.
Jul 03 07:21:18 tdx-1 node[1344]: 2024-07-03 07:21:18.603 [info]: Client Request-ID : 57830eff780a47f197f13fc0b33f67aa
Jul 03 07:21:19 tdx-1 node[1344]: 2024-07-03 07:21:19.587 [info]: Request-ID is : d9149f310d214317bf4941a0de56d7cf
Jul 03 07:21:19 tdx-1 node[1344]: 2024-07-03 07:21:19.588 [error]: Intel PCS server returns error(404).
Jul 03 07:21:19 tdx-1 node[1344]: 2024-07-03 07:21:19.588 [error]: Error: No cache data for this platform.
Jul 03 07:21:19 tdx-1 node[1344]:     at Module.getPckCertFromPCS (file:///opt/intel/sgx-dcap-pccs/services/logic/commonCacheLogic.js:88:11)
Jul 03 07:21:19 tdx-1 node[1344]:     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jul 03 07:21:19 tdx-1 node[1344]:     at async LazyCachingMode.getPckCertFromPCS (file:///opt/intel/sgx-dcap-pccs/services/caching_modes/cachingMode.js:126:12)
Jul 03 07:21:19 tdx-1 node[1344]:     at async Module.getPckCert (file:///opt/intel/sgx-dcap-pccs/services/pckcertService.js:115:16)
Jul 03 07:21:19 tdx-1 node[1344]:     at async getPckCert (file:///opt/intel/sgx-dcap-pccs/controllers/pckcertController.js:77:25)
Jul 03 07:21:19 tdx-1 node[1344]: 2024-07-03 07:21:19.593 [info]: 127.0.0.1 - - [03/Jul/2024:07:21:19 +0000] "GET /sgx/certification/v4/pckcert?qeid=7529208E4DE4E65A6A8B22761D95C9F4&encrypted_p>
Jul 03 07:50:12 tdx-1 node[1344]: 2024-07-03 07:50:12.423 [info]: Client Request-ID : 0b5432203b134afbbc92e5946e1c1519
Jul 03 07:50:13 tdx-1 node[1344]: 2024-07-03 07:50:13.295 [info]: Request-ID is : 4edc9c5373544b358ede65f57abfd625
Jul 03 07:50:13 tdx-1 node[1344]: 2024-07-03 07:50:13.295 [error]: Intel PCS server returns error(404).
Jul 03 07:50:13 tdx-1 node[1344]: 2024-07-03 07:50:13.296 [error]: Error: No cache data for this platform.
Jul 03 07:50:13 tdx-1 node[1344]:     at Module.getPckCertFromPCS (file:///opt/intel/sgx-dcap-pccs/services/logic/commonCacheLogic.js:88:11)
Jul 03 07:50:13 tdx-1 node[1344]:     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Jul 03 07:50:13 tdx-1 node[1344]:     at async LazyCachingMode.getPckCertFromPCS (file:///opt/intel/sgx-dcap-pccs/services/caching_modes/cachingMode.js:126:12)
Jul 03 07:50:13 tdx-1 node[1344]:     at async Module.getPckCert (file:///opt/intel/sgx-dcap-pccs/services/pckcertService.js:115:16)
Jul 03 07:50:13 tdx-1 node[1344]:     at async getPckCert (file:///opt/intel/sgx-dcap-pccs/controllers/pckcertController.js:77:25)
Jul 03 07:50:13 tdx-1 node[1344]: 2024-07-03 07:50:13.298 [info]: 127.0.0.1 - - [03/Jul/2024:07:50:13 +0000] "GET /sgx/certification/v4/pckcert?qeid=7529208E4DE4E65A6A8B22761D95C9F4&encrypted_p>
Jul 04 01:00:00 tdx-1 node[1344]: 2024-07-04 01:00:00.947 [info]: Request-ID is : 8b76d3da2be54c6b9a03b528fc03645e
Jul 04 01:00:01 tdx-1 node[1344]: 2024-07-04 01:00:01.556 [info]: Request-ID is : 663f490a32f14644adaf0633ae6288ed
Jul 04 01:00:02 tdx-1 node[1344]: 2024-07-04 01:00:02.074 [info]: Request-ID is : b8bc560002834b95b0fee840d10b1058
Jul 04 01:00:02 tdx-1 node[1344]: 2024-07-04 01:00:02.595 [info]: Request-ID is : 4d537397350c45f596fab426afc3a5c4
Jul 04 01:00:03 tdx-1 node[1344]: 2024-07-04 01:00:03.120 [info]: Request-ID is : 38a947203b0a43e7bf915c1db10a7f3c
Jul 04 01:00:03 tdx-1 node[1344]: 2024-07-04 01:00:03.559 [info]: Scheduled cache refresh is completed successfully.
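The PCCS log shows the failure more directly than the qgsd log: each quote attempt produced a pckcert request that Intel PCS answered with 404 ("No cache data for this platform"), while the scheduled cache refresh kept completing successfully. A small sketch to summarize this from the journal, assuming the message formats shown above; the function name pck_404_summary is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize PCK certificate failures in PCCS output.
# Reads journal text on stdin; message formats are taken from the log above.

pck_404_summary() {
    awk '
        # Error line PCCS logs when Intel PCS rejects the request.
        /Intel PCS server returns error[(]404[)]/ { n404++ }
        # Access-log line for each pckcert request PCCS received.
        /GET \/sgx\/certification\/v4\/pckcert/ { reqs++ }
        END { printf "pckcert requests: %d, PCS 404 errors: %d\n", reqs, n404 }
    '
}
```

Usage on the host: journalctl -u pccs --no-pager | pck_404_summary. For the log above this reports 3 pckcert requests and 3 PCS 404 errors.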

and the cpuinfo output:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 207
model name  : INTEL(R) XEON(R) SILVER 4514Y
stepping    : 2
microcode   : 0x21000230
cpu MHz     : 798.304
cache size  : 30720 KB
physical id : 0
siblings    : 32
core id     : 0
cpu cores   : 16
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 32
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 tdx_host_platform intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hfi vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd sgx_lc fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling usr_wait_pause notify_vm_exiting ipi_virt
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb tdx_pw_mce bhi
bogomips    : 4000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 57 bits virtual
power management:
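The flags line above includes sgx, sgx_lc and tdx_host_platform. When comparing machines, a check for those flags can be scripted; this is a sketch with a hypothetical function name, and it only inspects the first flags line (processor 0).

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that CPU flags are present in a cpuinfo file.

# check_cpu_flags <cpuinfo-file> <flag>...
check_cpu_flags() {
    local file="$1" flags f missing=0
    shift
    # Only the first "flags" line (processor 0) is checked.
    flags=$(grep -m1 '^flags' "$file") || return 1
    for f in "$@"; do
        # -w so that "sgx" does not match inside "sgx_lc".
        if printf '%s\n' "$flags" | grep -qw -- "$f"; then
            echo "present: $f"
        else
            echo "missing: $f"; missing=1
        fi
    done
    return "$missing"
}
```

Usage on the host: check_cpu_flags /proc/cpuinfo sgx sgx_lc tdx_host_platform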

@hector-cao I can also give you full SSH/root access to this machine if you'd like to debug this yourself, or we could do a screenshare.

hector-cao commented 2 months ago

@matti Thanks! It would be great if we could get SSH access to this machine. I can see that it is Emerald Rapids (EMR).

matti commented 2 months ago

@hector-cao Can you email me your SSH key at matti.paksula@gmail.com?

hector-cao commented 2 months ago

@matti Do you have any updates on this issue?

matti commented 2 months ago

@hector-cao After the reinstall, I'm asking whether I should use main or the release from May (see #166).

hector-cao commented 2 months ago

For the issue you are having, it is fine to use the latest release; you should be able to generate a quote with it.

matti commented 2 months ago

@hector-cao Thank you. The missing link was that after a reinstall, you need to do the following:

"If an error is reported in one of the logs, boot into the BIOS, go to Socket Configuration > Processor Configuration > Software Guard Extension (SGX), and set

  SGX Factory Reset to Enabled
  SGX Auto MP Registration to Enabled"

as stated in the README. The problem is that this step appears well before the quote generation steps in the README, so it is very easy to miss when following it.

So now, please see #168