confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

guest-components: Bump guest-components dependency #1865

Closed mkulke closed 3 months ago

mkulke commented 3 months ago

The build flags for attestation-agent have changed. cc_kbc is now always enabled as part of the coco-as and kbs features. A new ATTESTER Makefile flag has been introduced to pick the attesters that should be included in the attestation-agent build. By default all attesters are built, which won't work out of the box, since dependencies are missing (e.g. SGX libraries).

For peerpods only a limited set of attesters actually make sense and usually you'd want to define it at build time for a given TEE architecture (e.g. azure vtpm or ibm se attester modules), so we default to ATTESTER=sample in most cases.

The AA_KBC param is now only used for templating the aa-kbc-params value in the podvm's static kata-agent config.

mkulke commented 3 months ago

oh my. you cannot specify a sample attester. ~you have to pick one attester that will definitely compile~, otherwise the AA makefile will assign all-attesters 😭

edit: apparently ATTESTER=none will do that (sample is implicitly included)
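
To make the pitfall above concrete, here is a minimal Python sketch of a guard a build wrapper could use when assembling the `ATTESTER=...` make argument. It only encodes what is stated in this thread (`ATTESTER=none` yields a build with just the implicit sample attester, while `sample` is not a usable value). The function name `attester_make_arg` and the example attester name `se-attester` are assumptions for illustration, not taken from the actual Makefile.

```python
from typing import Optional


def attester_make_arg(tee_attester: Optional[str] = None) -> str:
    """Build the ATTESTER make argument for an attestation-agent build.

    Hypothetical helper: with no TEE-specific attester we pass "none",
    which (per this thread) still includes the sample attester implicitly.
    """
    if tee_attester == "sample":
        # Per the thread, "sample" cannot be selected explicitly; asking for
        # it ends up building all attesters, which fails on missing deps.
        raise ValueError("use ATTESTER=none; the sample attester is implicit")
    return f"ATTESTER={tee_attester or 'none'}"


if __name__ == "__main__":
    print(attester_make_arg())               # no TEE-specific attester
    print(attester_make_arg("se-attester"))  # hypothetical attester name
```

Rejecting `sample` early keeps the failure close to the misconfiguration instead of surfacing later as a missing-library build error.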

mkulke commented 3 months ago

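hrm. the libvirt e2e test suite is still sort of flaky, sadly. I think this has to do with large kbs images, which is being addressed atm. But if it does eventually execute there is also a real problem: The new ASR will fail with status code 500 on policy rejections, which will make the wget process that the e2e test uses fail (not tweakable for busybox wget). Have to decide on how to fix that:

  • Add support for asserting non-0 exit codes in the e2e framework (probably better than fishing out a substring from the response and hoping this doesn't change)
  • Disable/Remove the negative test with a deny-all policy (not sure if CAA should have any business testing the policy engine of AS, there's nothing on the CAA side that you can fix if a "deny-all" policy produces a secret)
  • Fix ASR (I hope it's just that) to not produce a 500 on policy rejections. 500s should not be a valid error code in any case (sort of contradicting my previous point that this particular e2e test might not be useful 😅 )

I'm leaning towards the last option. opinions?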

Edit: fixing the ASR response code is not trivial, since information is lost in the RPC with CDH. Trying curl instead, which tolerates a 500 response, and opened an issue on GC.
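
The difference between a client that aborts on a 500 and one that tolerates it can be sketched in Python. This is only an illustration of the idea, not the e2e test itself: the local server, the `/secret` path and the `policy rejected` body are hypothetical stand-ins, and `fetch_tolerant` is an assumed helper name. `urllib` raises on a 500 by default (similar in spirit to the failing wget call); catching `HTTPError` gives access to the status and body, closer to how curl behaves without `--fail`.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DenyAllHandler(BaseHTTPRequestHandler):
    """Stand-in endpoint that answers every GET with a 500 (hypothetical)."""

    def do_GET(self):
        body = b"policy rejected"  # made-up message, not real ASR output
        self.send_response(500)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_tolerant(url: str):
    """Return (status, body) even for error responses instead of raising."""
    # Bypass any proxy configured in the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # The error object still carries the status code and response body.
        return err.code, err.read().decode()


def demo():
    """Start the stand-in server on a free port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), DenyAllHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/secret"
        return fetch_tolerant(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

With a tolerant client the test can assert on the status code and body directly rather than on the client process failing.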

stevenhorsman commented 3 months ago

> hrm. the libvirt e2e test suite is still sort of flaky, sadly. I think this has to do with large kbs images, which is being addressed atm. But if it does eventually execute there is also a real problem: The new ASR will fail with status code 500 on policy rejections, which will make the wget process that the e2e test uses fail (not tweakable for busybox wget). Have to decide on how to fix that:
>
>   • Add support for asserting non-0 exit codes in the e2e framework (probably better than fishing out a substring from the response and hoping this doesn't change)
>   • Disable/Remove the negative test with a deny-all policy (not sure if CAA should have any business testing the policy engine of AS, there's nothing on the CAA side that you can fix if a "deny-all" policy produces a secret)
>   • Fix ASR (I hope it's just that) to not produce a 500 on policy rejections. 500s should not be a valid error code in any case (sort of contradicting my previous point that this particular e2e test might not be useful 😅 )
>
> I'm leaning towards the last option. opinions?

I'm not an expert on the attestation flow, but I guess my initial thoughts are both 1 & 3 - improving the e2e test to make it less brittle (or more brittle by combining error code and message checking!) sounds good, but improving the ASR behaviour sounds like a good improvement too.

Based on 3 being tricky we can start with 1 and then hope 3 happens later.
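
Option 1, possibly combined with message checking, could look roughly like the following Python sketch. The real e2e framework is not written in Python and its helpers are not shown in this thread, so `assert_command` and its parameters are hypothetical names used only to illustrate asserting on an exit code plus an optional output substring.

```python
import subprocess
import sys
from typing import Optional, Sequence


def assert_command(
    cmd: Sequence[str],
    expected_exit_code: int = 0,
    expected_substring: Optional[str] = None,
) -> subprocess.CompletedProcess:
    """Run cmd and check its exit code and, optionally, its output.

    Hypothetical helper: raises AssertionError when an expectation fails.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=30)
    if result.returncode != expected_exit_code:
        raise AssertionError(
            f"exit code {result.returncode}, expected {expected_exit_code}"
        )
    output = result.stdout + result.stderr
    if expected_substring is not None and expected_substring not in output:
        raise AssertionError(f"output does not contain {expected_substring!r}")
    return result


if __name__ == "__main__":
    # Simulate a client that fails with a non-zero exit code and an error message.
    failing = [
        sys.executable,
        "-c",
        "import sys; print('denied', file=sys.stderr); sys.exit(1)",
    ]
    assert_command(failing, expected_exit_code=1, expected_substring="denied")
    print("negative test passed")
```

Checking the exit code first and the message second keeps the test meaningful even if the exact wording of the error changes, at the cost of being stricter when both are specified.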

WRT 2 - I encouraged QiFeng to add the negative test, I'm probably just not trusting enough of other components and testing, and wanted to ensure that we hadn't misconfigured something such that we were getting false positives on the other test, but maybe that's something I need to work on personally and not project into CAA!