AMDESE / AMDSEV

AMD Secure Encrypted Virtualization

SEV-SNP: Attestation workflow unclear #212

Open nslvn opened 7 months ago

nslvn commented 7 months ago

Hi

First of all, thanks for the legwork and the opportunities that arise from them!

Unfortunately, I seem to be unable to get attestation to work correctly on SEV-SNP.

I believe there might be an issue with me taking information from different sources because I have not found a comprehensive example walkthrough yet on SEV-SNP in particular.

I've compiled the system from the build.sh, running the following versions:

  - Host: 6.9.0-rc1-snp-host-f9b5bc22b945, running a Milan 7443 CPU, PSP version 1.55, loaded and active in dmesg, snphost ok shows all green
  - Qemu: Commit c139a28810964fe90804421561fb6fb0ab2c5056
  - Guest (not sure if relevant): 6.7.0-snp-guest-98543c2aa649

I've created the single-file OVMF as described in #93.

Steps I'm running:

  1. First, I get the PDH from the host using sevctl export --full /tmp/pdh
  2. Second, I verify the PDH on the guest owner using sevctl verify --sev pdh.bin
  3. Then, I create the session blob using sevctl session --name myvm pdh.bin. This gives me the four files myvm_session.b64, myvm_godh.b64, myvm_tik.bin and myvm_tek.bin.
  4. I then create id and author keys using openssl ecparam -name secp384r1 -genkey -noout -out <key_name>
  5. I calculate the launch digest using the sev-snp-measure python library (slightly modified to not re-hash the kernel and initrd every time, but to use a precomputed hash). The hashes are obtained using sha256sum.
    ld = snp_calc_launch_digest_from_hashes(
        "OVMF_SNP.fd", kernel_hash, initrd_hash, append_hash, num_vcpu=4, 
        cpu_type=vcpu_types.CPU_SIGS["EPYC-v4"], guest_features=0x21, vmm_type=guest.VMMType.QEMU
    )
  6. I then calculate the id_block and extract the first output line for qemu:

    idblock = id_block.snp_calc_id_block(ld, "idkey.pem", "authorkey.pem")
    qemu_param_line = idblock.splitlines()[0] #provides id-block and id-auth

    First question arises here: Does the myvm_session.b64 include the myvm_godh.b64 for SNP? When I try to pass the dh-cert-file-parameter, I get an "invalid parameter" error from qemu.

    Just starting with normal parameters, I start it as:

    ./usr/local/bin/qemu-system-x86_64 -enable-kvm -cpu EPYC-v4 -machine pc-q35-7.1 -trace kvm_sev* \
    -boot d -global isa-debugcon.iobase=0x402 -debugcon file:ovmf.log  \
    -smp 4,maxcpus=4 -m 2048M,slots=5,maxmem=10240M -no-reboot \
    -bios ./usr/local/share/qemu/OVMF_SNP.fd -netdev user,id=vmnic,hostfwd=tcp::4022-:22  \
    -device virtio-net-pci,disable-legacy=on,iommu_platform=true,netdev=vmnic,romfile= \
    -machine memory-encryption=sev0,vmport=off -object memory-backend-memfd,id=ram1,size=2048M,share=true,prealloc=false \
    -machine memory-backend=ram1 -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,certs-path=/tmp/myvm_session.b64,kernel-hashes=on,auth-key-enabled=on,id-block=<snip>,id-auth=<snip> \
    -kernel vmlinuz-6.7.0-snp-guest-98543c2aa649 -append "root=/dev/mapper/ubuntu--vg-ubuntu--lv ro" \
    -initrd initrd.img-6.7.0-snp-guest-98543c2aa649 \
    -nographic -vnc :0 -monitor pty -monitor unix:monitor,server,nowait
    

    (id-auth and id-block are long base64 strings, omitted for clarity) Are the steps so far correct?

    With this mode, I get the following trace output:

    Launching VM ...
    /tmp/cmdline.423763
    char device redirected to /dev/pts/7 (label compat_monitor0)
    kvm_sev_init 
    kvm_sev_snp_launch_start policy 0x30000 gosvw (null)
    kvm_sev_change_state uninit -> launch-update
    kvm_sev_snp_launch_update addr 0x72a4d3800000 gpa 0xffc00000 len 0x400000 (Normal page)
    kvm_sev_snp_launch_update addr 0x72a4d4600000 gpa 0x800000 len 0xa000 (Zero page)
    kvm_sev_snp_launch_update addr 0x72a4d460b000 gpa 0x80b000 len 0x3000 (Zero page)
    kvm_sev_snp_launch_update addr 0x72a4d460e000 gpa 0x80e000 len 0x1000 (Secrets page)
    kvm_sev_snp_launch_update addr 0x72a4d460f000 gpa 0x80f000 len 0x1000 (Cpuid page)
    kvm_sev_snp_launch_update addr 0x72a4d4610000 gpa 0x810000 len 0x1000 (Zero page)
    kvm_sev_snp_launch_update addr 0x72a4d4611000 gpa 0x811000 len 0xf000 (Zero page)
    kvm_sev_snp_launch_finish id_block <snip> id_auth <snip> host_data (null)
    qemu-system-x86_64: SNP_LAUNCH_FINISH ret=-5 fw_error=11 'Bad measurement'

    Do you see an error in the steps that lead to this bad measurement? Is there a way to determine which element (policy, hashes, ...) triggers the bad measurement?

    As a note, setting kernel-hashes=on without specifying the -kernel argument results in an assertion on kernel_hashes_data. This is logical (as there cannot be any hashes), but it may be confusing and would benefit from an earlier check (kernel-hashes=on should not be allowed without -kernel).

    Is there any reference implementation out there that does this so I could compare with?
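To help rule out the policy as the mismatching element, the policy value from the trace (0x30000) can be decoded by hand. The following is a minimal sketch; the bit positions are my reading of the guest policy layout in the SNP ABI and should be checked against the specification, and `decode_snp_policy` is a hypothetical helper, not part of any tool mentioned here.

```python
def decode_snp_policy(policy: int) -> dict:
    """Split an SNP guest policy value into its fields.

    Bit positions are assumptions taken from the SNP ABI guest policy
    table; verify them against the spec revision you are using.
    """
    return {
        "abi_minor": policy & 0xFF,                     # bits 7:0
        "abi_major": (policy >> 8) & 0xFF,              # bits 15:8
        "smt_allowed": bool((policy >> 16) & 1),        # bit 16
        "reserved_must_be_one": bool((policy >> 17) & 1),  # bit 17
        "migrate_ma": bool((policy >> 18) & 1),         # bit 18
        "debug": bool((policy >> 19) & 1),              # bit 19
        "single_socket": bool((policy >> 20) & 1),      # bit 20
    }


if __name__ == "__main__":
    # Policy value printed by kvm_sev_snp_launch_start in the trace above.
    for name, value in decode_snp_policy(0x30000).items():
        print(f"{name}: {value}")
```

Under these assumptions, 0x30000 only sets the SMT bit and the reserved must-be-one bit, so the same value would have to be used when building the ID block.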

nslvn commented 7 months ago

Hi

I've since cycled through a variety of settings, cross-referenced with direct script invocations of sev-snp-measure, created a fresh, upstream guest kernel (6.9rc) and initrd, tried both guest policies 0x21 (the infamous DEBUG_SWAP) and 0x1 and still could not get direct boot to work.

I've sifted through the SNP ABI, which does not list SEV_RET_BAD_MEASUREMENT in the status code table[^1], but mentions in the text that

If ID_BLOCK_EN is 1, the firmware checks that the LD field of the ID block is equal to GCTX.LD. If not, the firmware returns BAD_MEASUREMENT. The firmware then checks that the POLICY field of the ID block is equal to GCTX.Policy. If not, the firmware returns POLICY_FAILURE. The firmware then validates the signature of the ID block using the ID public key. If AUTH_KEY_EN is also 1, the firmware validates the signature of the ID key using the Author public key. If either signature fails to validate, the firmware returns BAD_SIGNATURE.

Therefore, the issue must be in the LD calculation. However, the id-blocks created by sev-snp-measure do not work either (they look the same, but the signatures vary, of course, due to the added randomness).
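The check order in the quoted ABI text can be modeled as a short function. This is only a sketch of the described order, not firmware code; `IdBlockCheck` and `launch_finish_status` are hypothetical names, and the signature checks are reduced to booleans.

```python
from dataclasses import dataclass


@dataclass
class IdBlockCheck:
    """Hypothetical container for the inputs named in the quoted ABI text."""
    id_block_ld: bytes     # LD field of the ID block
    gctx_ld: bytes         # launch digest accumulated by the firmware (GCTX.LD)
    id_block_policy: int   # POLICY field of the ID block
    gctx_policy: int       # policy passed at launch start (GCTX.Policy)
    id_sig_ok: bool        # ID block signature validates with the ID public key
    author_sig_ok: bool    # ID key signature validates with the Author public key
    auth_key_en: bool = True


def launch_finish_status(c: IdBlockCheck) -> str:
    # The checks run in the order of the quoted paragraph; the first failure wins.
    if c.id_block_ld != c.gctx_ld:
        return "BAD_MEASUREMENT"
    if c.id_block_policy != c.gctx_policy:
        return "POLICY_FAILURE"
    if not c.id_sig_ok or (c.auth_key_en and not c.author_sig_ok):
        return "BAD_SIGNATURE"
    return "SUCCESS"


if __name__ == "__main__":
    # A wrong launch digest is reported even if policy and signatures are also wrong.
    bad = IdBlockCheck(b"\x01" * 48, b"\x02" * 48, 0x30000, 0xB0000, False, False)
    print(launch_finish_status(bad))
```

One consequence of this order: a BAD_MEASUREMENT result says nothing yet about whether the policy or the signatures would pass, because those checks are never reached.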

From what I've understood reading the code: The id-block contains a pre-measurement, including, among many other things, the hashes of kernel and initrd. Those hashes are SHA256, whereas everything else in SNP is SHA384. Those hashes are also independent of the target load address. To calculate the LD, both qemu and pre-measurement tools inject the SHA256-hashes into pre-determined locations in a suitable OVMF image. QEMU then creates a hash-chain (SHA384) of the pages, their contents, and their GPA (guest-physical address) to verify correctness[^2][^3]. Do I understand correctly that the idea is that the kernel, initrd and cmdline hashes are injected into the firmware memory when loading, such that the content measurements include the hashes of the other parts and fail if those parts differ? In other words, the PSP just considers the kernel and initrd hashes as blobs, and, given changes to both qemu and the premeasurement calculation, any hashing function could be used?
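The hash-chain described above can be sketched as follows. This is a simplified illustration modeled on my reading of the PAGE_INFO layout used in sev-snp-measure's gctx.py; the field order, the page-type values, and the helper names `extend` and `measure_normal` are assumptions to be checked against the ABI, and VMSA, CPUID and secrets pages are left out.

```python
import hashlib

PAGE_SIZE = 4096
PAGE_INFO_LEN = 0x70     # assumed size of the PAGE_INFO structure (112 bytes)
PAGE_TYPE_NORMAL = 0x01  # assumed page-type encoding
PAGE_TYPE_ZERO = 0x03    # assumed page-type encoding


def extend(ld: bytes, page_type: int, gpa: int, contents: bytes) -> bytes:
    """Fold one page into the running launch digest (SHA-384)."""
    info = ld + contents                         # current digest + 48-byte contents field
    info += PAGE_INFO_LEN.to_bytes(2, "little")  # structure length
    info += page_type.to_bytes(1, "little")
    info += bytes(1)                             # IMI page flag, 0 here
    info += bytes(3)                             # VMPL3/2/1 permissions, 0 here
    info += bytes(1)                             # reserved
    info += gpa.to_bytes(8, "little")            # guest-physical address
    assert len(info) == PAGE_INFO_LEN
    return hashlib.sha384(info).digest()


def measure_normal(ld: bytes, gpa: int, data: bytes) -> bytes:
    """Measure a region of normal pages, one 4 KiB page at a time."""
    for off in range(0, len(data), PAGE_SIZE):
        page = data[off:off + PAGE_SIZE]
        ld = extend(ld, PAGE_TYPE_NORMAL, gpa + off, hashlib.sha384(page).digest())
    return ld


if __name__ == "__main__":
    ld = bytes(48)  # the chain starts from an all-zero digest
    firmware = bytearray(2 * PAGE_SIZE)
    # The kernel hash is just 32 opaque bytes somewhere inside a measured page.
    firmware[100:132] = hashlib.sha256(b"pretend kernel image").digest()
    print(measure_normal(ld, 0xFFC00000, bytes(firmware)).hex())
```

In this model the chain never interprets the embedded SHA-256 value; it only sees page bytes, so any change to the kernel hash changes the page content hash and therefore the final digest.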

To summarize, am I correct that the problem is not related to any runtime interactions or potential bugs in the OVMF, but must be solely located in either the sev-snp-measure tool, qemu, or the "user problem" (me using the tools incorrectly)?

/EDIT: I was able to verify that the kernel, initrd, and cmdline hashes agree between qemu and sev-snp-measure.

[^1]: Feature request: Add numeric values to the status codes or add a lookup table to the ABI. Currently, it seems that they need to be retrieved from linux/psp-sev.h, as they are not listed in the SNP ABI specification document.

[^2]: sev-snp-measure calculates this here: https://github.com/virtee/sev-snp-measure/blob/main/sevsnpmeasure/gctx.py#L51

[^3]: QEMU here in sev_snp_launch_update: https://github.com/AMDESE/qemu/blob/snp-latest/target/i386/sev.c#L810, the actual kernel hashes (SHA256) here: https://github.com/AMDESE/qemu/blob/snp-latest/target/i386/sev.c#L1694

larrydewey commented 7 months ago

> I believe there might be an issue with me taking information from different sources because I have not found a comprehensive example walkthrough yet on SEV-SNP in particular.

We have a guide that is in the process of being published. That being said, there are two general workflows:

Standard Attestation

With standard attestation, a guest owner will use software within the guest to retrieve an attestation report from the AMD Secure Processor. Taking the evidence produced by that piece of software (something like snpguest), the guest owner may request the AMD certificate chain from the AMD Key Distribution Server (KDS). Then, on a trusted platform (not within the guest which generated the report), the guest owner may use additional software (something like snpguest) to attest that the guest meets the required security policies.

[Diagram: standard attestation]

Following is the attestation process being performed by the Guest owner (handled by snpguest):

[Diagram: SNP attestation workflow]
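For the KDS request in the standard workflow, the URLs can be assembled from fields of the attestation report. The sketch below only builds the URLs and performs no network access; the endpoint layout and query parameter names are my understanding of AMD's KDS interface and should be treated as assumptions, and `vcek_url` and `cert_chain_url` are hypothetical helpers (snpguest handles this for you).

```python
from urllib.parse import urlencode

# Assumed KDS base URL for VCEK requests.
KDS_BASE = "https://kdsintf.amd.com/vcek/v1"


def vcek_url(product: str, chip_id: bytes, bl: int, tee: int, snp: int, ucode: int) -> str:
    """Build the VCEK certificate URL for a chip ID and its reported TCB values."""
    query = urlencode({"blSPL": bl, "teeSPL": tee, "snpSPL": snp, "ucodeSPL": ucode})
    return f"{KDS_BASE}/{product}/{chip_id.hex()}?{query}"


def cert_chain_url(product: str) -> str:
    """Build the URL of the certificate chain (ASK and ARK) for a product line."""
    return f"{KDS_BASE}/{product}/cert_chain"


if __name__ == "__main__":
    # Placeholder values; real ones come from the attestation report.
    print(vcek_url("Milan", bytes(64), bl=0, tee=0, snp=0, ucode=0))
    print(cert_chain_url("Milan"))
```

Because each VCEK request is per chip and per TCB version, many guests fetching certificates independently is what runs into the KDS rate limit.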

Extended Attestation

While the standard attestation workflow works well for small deployments, it struggles to scale for larger deployments because the KDS implements rate-limiting. To mitigate this, we created the Extended Attestation workflow, which provides a mechanism for platform owners to store certificate chains in a cache. Because the certificates have been cryptographically signed by AMD, when the guest owner retrieves these certificates from the cache, the trustworthiness of the certificates may be verified.

In the extended attestation flow, the platform owner will use software to generate a Guest Hypervisor Communication Block compliant binary containing the certificates the platform owner would like to provide for the guest to request (something like snphost provides this capability) and forwards this binary to their VMM. The VMM will utilize an IOCTL to store these certificates. The guest owner will then utilize software to retrieve this certificate binary from the cache, instead of calling out to the KDS (again, snpguest provides this functionality). The rest of the attestation process would remain the same.

[Diagram: extended attestation]

Following is the attestation process being performed by the Guest owner (handled by snpguest):

[Diagram: SNP extended attestation workflow]

All of this being said, we have still noticed a discrepancy with the latest kernel version we released and the measurement being calculated by sev-snp-measure. I am digging into this more today, and will hopefully have more of an answer as to why the measurement isn't matching.

One additional question I had, though: are you looking to run legacy SEV or SEV-SNP? In your example, you mentioned you were generating a PDH, etc., which is part of the legacy SEV workflow and is not included with SEV-SNP.

nslvn commented 7 months ago

Hi

First of all, thanks for the very clear and simple workflow description!

I'm looking to run SEV-SNP - I was erroneously assuming (perhaps due to the similar naming, perhaps because of reading resources referring simply to "SEV") that SEV-SNP was an evolution of SEV-ES - i.e. SEV-ES+Integrity. However, from your description, I gather that SNP and SEV-ES share no part of the protocol, except perhaps the process of retrieving certificates from the KDS. Perhaps, out of curiosity: Was the previous protocol incompatible with required guarantees or are there other benefits in the new protocol that the old protocol could not provide?

Just to verify: This also means that there is no longer a way to add guest-owner provided secrets at launch, and the idea is instead to authenticate the VM first and then negotiate a key (e.g. by having snpguest include a public key in the signed attestation report)?

What is the role of providing a measurement at launch through the id-block in the attestation process? From the diagrams you provided, it seems that this process requires a booted and running VM, but the ID-block check seems to fail before calling VMRUN.

Thank you and let me know if I can help localizing the issue!

larrydewey commented 6 months ago

Hi

> First of all, thanks for the very clear and simple workflow description!

Of course, it was my pleasure :slightly_smiling_face: !

> I'm looking to run SEV-SNP - I was erroneously assuming (perhaps due to the similar naming, perhaps because of reading resources referring simply to "SEV") that SEV-SNP was an evolution of SEV-ES - i.e. SEV-ES+Integrity. However, from your description, I gather that SNP and SEV-ES share no part of the protocol, except perhaps the process of retrieving certificates from the KDS. Perhaps, out of curiosity: Was the previous protocol incompatible with required guarantees or are there other benefits in the new protocol that the old protocol could not provide?

There are definitely more improvements to security in the newer protocol. SEV-SNP utilizes SEV-ES in its underlying technology, so you get all of the benefits of SEV-ES without as much work to get it working.

> Just to verify: This also means that there is no longer a way to add guest-owner provided secrets at launch, and the idea is instead to authenticate the VM first and then negotiate a key (e.g. by having snpguest include a public key in the signed attestation report)?

Actually, the guest-owner may provide a nonce which will be included in the attestation report. I would recommend reading over section 6 which documents how to include 64 bytes of guest-owner-supplied data in your report.
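One way to combine the guest-owner nonce with the public key idea from the question is to hash both into the 64-byte report data field. This is a minimal sketch of that pattern under my own assumptions; the binding scheme and the names `make_report_data` and `check_report_data` are illustrative and not taken from the specification or from snpguest.

```python
import hashlib
import hmac
import os

REPORT_DATA_LEN = 64  # size of the guest-supplied data field mentioned above


def make_report_data(nonce: bytes, guest_pubkey: bytes) -> bytes:
    """Bind a guest-owner nonce and a guest public key into 64 bytes."""
    h = hashlib.sha512()
    h.update(len(nonce).to_bytes(4, "little"))  # length prefix avoids ambiguous splits
    h.update(nonce)
    h.update(guest_pubkey)
    return h.digest()


def check_report_data(report_data: bytes, nonce: bytes, guest_pubkey: bytes) -> bool:
    """Guest-owner side: recompute the value and compare in constant time."""
    return hmac.compare_digest(report_data, make_report_data(nonce, guest_pubkey))


if __name__ == "__main__":
    nonce = os.urandom(32)               # chosen by the guest owner per request
    pubkey = b"placeholder public key"   # stands in for the guest's encoded key
    data = make_report_data(nonce, pubkey)
    print(len(data), check_report_data(data, nonce, pubkey))
```

With this scheme the guest owner checks the signed report first, then confirms that the report data matches the fresh nonce and the key the guest presented before sending any secret encrypted to that key.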

> What is the role of providing a measurement at launch through the id-block in the attestation process? From the diagrams you provided, it seems that this process requires a booted and running VM, but the ID-block check seems to fail before calling VMRUN.

The diagrams above do not include the Identity Block workflow. Essentially, utilizing the ID-block will provide the pre-launch attestation you received through the legacy SEV/SEV-ES workflow.

> Thank you and let me know if I can help localizing the issue!

Will do! We actually have a PR opened for sev-snp-measure to add some floating-point registers which were recently adjusted in the VMSA of the latest version of the AMD kernel patches. We have tested the patch, and it seems to address the mis-matching measurement.

nslvn commented 6 months ago

Hi

> Actually, the guest-owner may provide a nonce which will be included in the attestation report.

If I understand this section correctly, this is from within the SNP-enabled guest?

I'm referring to mechanics as described in the non-SNP specification, where it is possible to provide the guest with secrets at launch using LAUNCH_SECRET[^1]:

[image]

If I understand correctly, with SNP lacking the PDH-mechanics, the new idea to pass secrets into a VM is to manage a (potentially DH) key exchange from within the VM itself and establish a secure channel to load secrets. I'm currently trying to implement SNP in a high-churn system, where VMs are rather short-lived, and I'm wondering whether there is a way to minimize the time from launch to attestation to bootstrap a larger system, e.g. to provide disk decryption keys only to attested instances at the earliest opportunity. Currently, it seems that the earliest possible point to do this in SNP would be within OVMF or, for convenience reasons, in the initramfs.

If I am reading the API correctly, the DH-exchange previously happened before guest boot, enabling secret injection into the efivars, therefore not requiring guest-level negotiation. The new model is much more flexible, of course, but the "secure disk" problem seems a bit more involved. I assume that the idea is to replace the "secret efivars" with the VMPL and vTPMs, which are unfortunately not a good solution in my use case. But maybe I just need to wrap my head around this first.

> Will do! We actually have a PR opened (https://github.com/virtee/sev-snp-measure/pull/48) for sev-snp-measure to add some floating-point registers which were recently adjusted in the VMSA of the latest version of the AMD kernel patches. We have tested the patch, and it seems to address the mis-matching measurement.

Thanks for finding this issue, I wouldn't even know where to start with this! I've applied your PR locally to a fresh checkout of sev-snp-measure. ~~Unfortunately, it does not seem to change the output in my case. Maybe I should try to reproduce your tests, it must be a stupid mistake on my side. Can you share how you are testing the launch? Are you also using the direct kernel launch in QEMU or are you testing with a launch from disk? Are you using the snp-latest or one of the newer wip branches?~~ It was, in fact, a stupid user error: I was using the debug_swap guest-feature flags (0x21/33) instead of the new-default-again (0x1/1). I'm happy to report that both id-block checks and attestation from within-guest work correctly again, thanks!
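For anyone hitting the same mismatch, the two guest-feature values can be compared bit by bit. This is a small sketch; the thread only confirms that 0x21 includes DEBUG_SWAP, and the bit names below (bit 0 as SNP active, bit 5 as debug swap) are my reading of the SEV_FEATURES field and should be treated as assumptions. `describe_guest_features` is a hypothetical helper.

```python
# Assumed SEV_FEATURES bit names; only the two bits relevant here are listed.
SEV_FEATURE_BITS = {
    0: "SNP_ACTIVE",
    5: "DEBUG_SWAP",
}


def describe_guest_features(value: int) -> list:
    """Return the names of all set bits, using 'bitN' for unnamed ones."""
    names = []
    for bit in range(64):
        if (value >> bit) & 1:
            names.append(SEV_FEATURE_BITS.get(bit, f"bit{bit}"))
    return names


if __name__ == "__main__":
    for value in (0x21, 0x1):
        print(hex(value), describe_guest_features(value))
```

Since guest_features is an input to the launch digest calculation in step 5, the value passed to the measurement tool has to match what the host kernel actually enables for the guest.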

[^1]: Secure Encrypted Virtualization API