confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

attestation-agent-config: generate attestation-agent config when createVM instance #1868

Closed huoqifeng closed 5 days ago

huoqifeng commented 2 weeks ago

Fixes: #1852

huoqifeng commented 2 weeks ago

@mkulke @stevenhorsman @liudalibj @bpradipt This is still in draft, I need some time to verify the code. The idea is to generate the agent-config.toml similar to cdh.toml, as described in https://github.com/confidential-containers/cloud-api-adaptor/issues/1852

huoqifeng commented 1 week ago

looks good in general. however, we were discussing in slack that we probably don't need to template the agent-config file anymore, after we have a config file for AA, since all the fields but AAKBCParams (which we wouldn't need anymore) are static and can be set at buildtime. what do you think?

@mkulke because AAKBCParams is a dynamic endpoint that might change per cluster, baking it into the PeerPod VM image would mean one image per cluster. Shall we make AAKBCParams configurable in the configmap?

mkulke commented 1 week ago

@mkulke because AAKBCParams is a dynamic endpoint that might change per cluster, baking it into the PeerPod VM image would mean one image per cluster. Shall we make AAKBCParams configurable in the configmap?

AAKBCParams is already a parameter in configmap (and a cli param for the CAA binary), so we don't need to do anything here.

My reasoning was:

huoqifeng commented 1 week ago

@mkulke because AAKBCParams is a dynamic endpoint that might change per cluster, baking it into the PeerPod VM image would mean one image per cluster. Shall we make AAKBCParams configurable in the configmap?

AAKBCParams is already a parameter in configmap (and a cli param for the CAA binary), so we don't need to do anything here.

My reasoning was:

* If we have a CDH and an AA config file, we don't need AAKBCParams in the agent config anymore, because it's not used.

* If we don't have an AAKbcParam in the agent config file, it is completely static and the same in each deployment.

* If the file is static, we don't need a cloud-config directive for it but can include it at build time.

@mkulke this is reasonable to me, I like this approach. How do the CDH and AA use the AAKBCParams that is set in the k8s configmap? Are there changes happening in guest-components?

mkulke commented 1 week ago

@mkulke this is reasonable to me, I like this approach. How do the CDH and AA use the AAKBCParams that is set in the k8s configmap? Are there changes happening in guest-components?

we already do that for cdh.toml, for AA atm, we use agent-config.toml, which is a workaround for when there was no AA config file yet. now AA supports a config file, and we also want to use it because the ${kbc}::${uri} schema of AAKBCParams is not sufficient anymore, we also need to include a kbs certificate in the config.
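As an aside, the `${kbc}::${uri}` schema mentioned above can be split with plain shell parameter expansion. A minimal sketch (the endpoint value below is made up for illustration):

```shell
#!/bin/sh
# Split an aa_kbc_params value of the form "<kbc>::<uri>" into its two parts.
# The value below is an illustrative example, not a real endpoint.
aa_kbc_params="cc_kbc::https://kbs.example.com:8080"

kbc="${aa_kbc_params%%::*}"   # everything before the first "::"
uri="${aa_kbc_params#*::}"    # everything after the first "::"

echo "kbc=$kbc"
echo "uri=$uri"
```

As the comment notes, this schema has no room for extra material like a KBS certificate, which is why a structured config file is preferred.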

huoqifeng commented 1 week ago

@mkulke this is reasonable to me, I like this approach. How do the CDH and AA use the AAKBCParams that is set in the k8s configmap? Are there changes happening in guest-components?

we already do that for cdh.toml, for AA atm, we use agent-config.toml, which is a workaround for when there was no AA config file yet. now AA supports a config file, and we also want to use it because the ${kbc}::${uri} schema of AAKBCParams is not sufficient anymore, we also need to include a kbs certificate in the config.

OK, my understanding is we need to generate an attestation-agent.toml rather than a kata-agent.toml (named agent-config.toml today) on the cloud-api-adaptor side.

mkulke commented 1 week ago

OK, my understanding is we need to generate an attestation-agent.toml rather than a kata-agent.toml (named agent-config.toml today) on the cloud-api-adaptor side.

yup, exactly

note: there is a nuisance when using an aa.toml file for CAA currently: an AA config file is only valid if it has a token_config entry, which only makes sense for cc_kbc at the moment. for offline_fs_kbc you cannot specify --config, which means we need some machinery to do that:

/usr/bin/attestation-agent "${AA_CONFIG_FILE+--config $AA_CONFIG_FILE}"
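One caveat with the quoted one-word expansion above: it passes `--config <path>` as a single argument, and an empty-string argument when the variable is unset. A sketch of a more robust wrapper (only the attestation-agent binary path comes from the comment above; the rest is assumed machinery):

```shell
#!/bin/sh
# Launch attestation-agent with --config only when AA_CONFIG_FILE is set.
# AA_BIN is parameterized so the sketch can be exercised with a stub binary.
AA_BIN="${AA_BIN:-/usr/bin/attestation-agent}"

launch_aa() {
    if [ -n "${AA_CONFIG_FILE:-}" ]; then
        # two separate arguments, not one "--config <path>" word
        set -- --config "$AA_CONFIG_FILE"
    else
        # no config file: start AA with no extra arguments at all
        set --
    fi
    "$AA_BIN" "$@"
}
```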
huoqifeng commented 1 week ago

@mkulke I found we still need to set cc_kbc::http://IP:Port in /etc/agent-config.toml because confidential-data-hub is still using it. If aa_kbc_params = "offline_fs_kbc::null" is set in /etc/agent-config.toml, cdh reports an error like:

Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]: [2024-06-20T07:52:24Z ERROR ttrpc_cdh::ttrpc_server] [ttRPC CDH] GetResource :
Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]:     get resource failed
Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]:     Caused by: Kbs client error: offline-fs-kbc: resource not found one/two/key

While it's OK if /etc/agent-config.toml does not exist at all. Is there anything we haven't completed yet in the confidential-data-hub configuration?

huoqifeng commented 1 week ago

etc/agent-config

Sounds like CDH lost a behavior: it should never use /etc/agent-config.toml if a config file is provided via -c, as in /usr/local/bin/confidential-data-hub -c /run/confidential-containers/cdh.toml
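The expected precedence described here, as a small sketch (the helper name and the fallback variable are made up for illustration, not CDH's actual code):

```shell
#!/bin/sh
# Expected lookup order: an explicit -c/--config file should win; only fall
# back to /etc/agent-config.toml when no explicit config was given.
# resolve_cdh_config and CDH_FALLBACK_CONFIG are hypothetical names.
resolve_cdh_config() {
    explicit="$1"   # path passed via -c, may be empty
    fallback="${CDH_FALLBACK_CONFIG:-/etc/agent-config.toml}"
    if [ -n "$explicit" ] && [ -f "$explicit" ]; then
        echo "$explicit"
    elif [ -f "$fallback" ]; then
        echo "$fallback"
    fi
}
```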

mkulke commented 1 week ago

@mkulke I found we still need to set cc_kbc::http://IP:Port in /etc/agent-config.toml because confidential-data-hub is still using it. If aa_kbc_params = "offline_fs_kbc::null" is set in /etc/agent-config.toml, cdh reports an error like:

Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]: [2024-06-20T07:52:24Z ERROR ttrpc_cdh::ttrpc_server] [ttRPC CDH] GetResource :
Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]:     get resource failed
Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]:     Caused by: Kbs client error: offline-fs-kbc: resource not found one/two/key

While it's OK if /etc/agent-config.toml does not exist at all. Is there anything we haven't completed yet in the confidential-data-hub configuration?

does CDH use the config file? In CAA we've been using the config file for CDH for a while now and it shouldn't rely on the agent-config.toml any more. This is the respective logic in CDH.

mkulke commented 1 week ago

Sounds like CDH lost a behavior: it should never use /etc/agent-config.toml if a config file is provided via -c, as in /usr/local/bin/confidential-data-hub -c /run/confidential-containers/cdh.toml

where do you see that CDH is falling back to kata-agent-config?

huoqifeng commented 1 week ago

Sounds like CDH lost a behavior: it should never use /etc/agent-config.toml if a config file is provided via -c, as in /usr/local/bin/confidential-data-hub -c /run/confidential-containers/cdh.toml

where do you see that CDH is falling back to kata-agent-config?

I did not find the code yet; it's confusing because I saw CDH logs like the following, and the function does not work if /etc/agent-config.toml exists.

[2024-06-20T09:08:13Z INFO  ttrpc_cdh] Use configuration file /run/confidential-containers/cdh.toml
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] get aa_kbc_params from file
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] reading agent config from /etc/agent-config.toml
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] get aa_kbc_params from file
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] reading agent config from /etc/agent-config.toml
[2024-06-20T09:08:13Z INFO  ttrpc_cdh] [ttRPC] Confidential Data Hub starts to listen to request: unix:///run/confidential-containers/cdh.sock
huoqifeng commented 1 week ago

Sounds like CDH lost a behavior: it should never use /etc/agent-config.toml if a config file is provided via -c, as in /usr/local/bin/confidential-data-hub -c /run/confidential-containers/cdh.toml

where do you see that CDH is falling back to kata-agent-config?

I did not find the code yet; it's confusing because I saw CDH logs like the following, and the function does not work if /etc/agent-config.toml exists.

[2024-06-20T09:08:13Z INFO  ttrpc_cdh] Use configuration file /run/confidential-containers/cdh.toml
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] get aa_kbc_params from file
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] reading agent config from /etc/agent-config.toml
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] get aa_kbc_params from file
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] reading agent config from /etc/agent-config.toml
[2024-06-20T09:08:13Z INFO  ttrpc_cdh] [ttRPC] Confidential Data Hub starts to listen to request: unix:///run/confidential-containers/cdh.sock

Sounds like https://github.com/confidential-containers/guest-components/blob/main/confidential-data-hub/hub/src/bin/config/mod.rs#L130 https://github.com/confidential-containers/guest-components/blob/main/attestation-agent/attestation-agent/src/config/aa_kbc_params.rs#L57

mkulke commented 1 week ago

Sounds like https://github.com/confidential-containers/guest-components/blob/main/confidential-data-hub/hub/src/bin/config/mod.rs#L130 https://github.com/confidential-containers/guest-components/blob/main/attestation-agent/attestation-agent/src/config/aa_kbc_params.rs#L57

Thanks for searching, I didn't spot it myself. I suspect this function is dead code, but I'm not 100% sure.

this fn sets AA_KBC_PARAMS + KBS_CERT envs as a side-effect and will do nothing else:

pub fn set_configuration_envs(&self) {...}

I was suspecting that image-rs is using it for encrypted images, but it doesn't seem to be used in guest-components:

/dev/guest-components (main)$ git grep AA_KBC_PARAMS
attestation-agent/attestation-agent/src/config/aa_kbc_params.rs:    if let Ok(params) = env::var("AA_KBC_PARAMS") {
confidential-data-hub/README.md:* **AA_KBC_PARAMS** environment variable
confidential-data-hub/hub/src/bin/config/mod.rs:                "AA_KBC_PARAMS",
mkulke commented 1 week ago

Sounds like https://github.com/confidential-containers/guest-components/blob/main/confidential-data-hub/hub/src/bin/config/mod.rs#L130 https://github.com/confidential-containers/guest-components/blob/main/attestation-agent/attestation-agent/src/config/aa_kbc_params.rs#L57

Thanks for searching, I didn't spot it myself. I suspect this function is dead code, but I'm not 100% sure.

this fn sets AA_KBC_PARAMS + KBS_CERT envs as a side-effect and will do nothing else:

pub fn set_configuration_envs(&self) {...}

I was suspecting that image-rs is using it for encrypted images, but it doesn't seem to be used in guest-components:

/dev/guest-components (main)$ git grep AA_KBC_PARAMS
attestation-agent/attestation-agent/src/config/aa_kbc_params.rs:    if let Ok(params) = env::var("AA_KBC_PARAMS") {
confidential-data-hub/README.md:* **AA_KBC_PARAMS** environment variable
confidential-data-hub/hub/src/bin/config/mod.rs:                "AA_KBC_PARAMS",

Note: AA_KBC_PARAMS env does seem to be required in the constructor of CDH's kbs kms plugin.

huoqifeng commented 1 week ago

Yeah, I think it comes from the CDH config if agent-config.toml does not provide it, https://github.com/confidential-containers/guest-components/blob/main/confidential-data-hub/hub/src/bin/config/mod.rs#L133-L136, that's why I dropped it in this commit https://github.com/confidential-containers/cloud-api-adaptor/pull/1868/commits/fc0a04ef34bf93cee1fcd06a5625cbe684174967. @mkulke

huoqifeng commented 1 week ago

I tried it on s390x by building the images as below:

  1. Build caa image
    cd src/cloud-api-adaptor
    ARCHES=linux/s390x make image
  2. Build PodVM image
    cd podvm-mkosi
    make fedora-binaries-builder
    ATTESTER=se-attester make binaries
    make image-debug
  3. Deploy a KBS service following https://github.com/confidential-containers/trustee/blob/main/attestation-service/verifier/src/se/README.md
  4. Try it in a k8s cluster by using the caa and PodVM image
  5. Create a PeerPod instance and retrieve a secret
    # kubectl exec -it nginx-nydus-54c6f8c6cc-xzxzg -- /bin/sh
    # curl http://127.0.0.1:8006/cdh/resource/one/two/key
    some random characters....
    Sat Jun 15 03:06:01 UTC 2024
    # exit
huoqifeng commented 1 week ago

@mkulke @stevenhorsman I think this PR can be reviewed now as I have tried it. May you please have a look?

huoqifeng commented 1 week ago

CDH's kbs kms plugin.

oh, I think this is indeed a case not covered: CDH's kbs kms plugin. We need a fix on the CDH side.

huoqifeng commented 1 week ago

I created an issue in cdh here https://github.com/confidential-containers/guest-components/issues/593 to track this. Sounds like it's not a big problem so far if we don't use KBS/KMS separately.

liudalibj commented 1 week ago

I tried it on s390x by building the images as below:

  1. Build caa image
cd src/cloud-api-adaptor
ARCHES=linux/s390x make image
  2. Build PodVM image
cd podvm-mkosi
make fedora-binaries-builder
ATTESTER=se-attester make binaries
make image-debug

We should run export HOST_KEYS_DIR=<the_hkd_folder> before running make image, or, before running make image-debug, update the Makefile to call ../hack/build-s390x-se-image.sh for image-debug, so that we get the SE-enabled podvm image.

huoqifeng commented 1 week ago
process-user-data.service.d

My fault, update-agent-config is not used any more because agent-config.toml is static, and aa.toml and cdh.toml can be handled by provision-files. Pushed a fix.

huoqifeng commented 1 week ago

this is what we provision when we specify --aa-kbc-params offline_fs_kbc::null

cat /run/peerpod/aa.toml
[token_configs]
[token_configs.coco_as]
url = ''

[token_configs.kbs]
url = 'null'

not sure if that makes sense, it probably doesn't, but it might also not hurt at the moment. I think what we could do is provision that file from the CAA side only if aa-kbc-params is cc_kbc::*, because the config file doesn't cover anything else yet.

Indeed, for aa.toml. Not sure whether cdh.toml should follow the same pattern?
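The cc_kbc::* gating suggested above could look roughly like this on the provisioning side (paths, variable names, and the helper are illustrative, not the actual CAA code; the TOML shape mirrors the aa.toml shown earlier in the thread):

```shell
#!/bin/sh
# Only provision aa.toml when aa-kbc-params selects cc_kbc; for other KBCs
# (e.g. offline_fs_kbc) no AA config file is written, so AA starts without
# --config. AA_KBC_PARAMS and AA_CONFIG defaults are illustrative.
AA_KBC_PARAMS="${AA_KBC_PARAMS:-offline_fs_kbc::null}"
AA_CONFIG="${AA_CONFIG:-/run/peerpod/aa.toml}"

provision_aa_config() {
    case "$AA_KBC_PARAMS" in
    cc_kbc::*)
        kbs_url="${AA_KBC_PARAMS#cc_kbc::}"
        cat >"$AA_CONFIG" <<EOF
[token_configs]
[token_configs.kbs]
url = '$kbs_url'
EOF
        ;;
    *)
        # nothing to provision; AA runs with its built-in defaults
        ;;
    esac
}
```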

huoqifeng commented 6 days ago

cc_kbc::*

With the fix in https://github.com/confidential-containers/guest-components/pull/599, I think we can leave the code as is. I will add a log for s.aaKBCParams and pick the new guest-components commit df60725afe0ba452a25a740cf460c2855442c49a.

liudalibj commented 6 days ago

I tried to build the podvm ubuntu amd64 image from this PR.

The related logs from journalctl:

...
Jun 25 08:40:28 ubuntu systemd[1]: Installed transient /etc/machine-id file.
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found ordering cycle on attestation-agent.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Job attestation-agent.service/start deleted to break ordering cycle starting with multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found ordering cycle on kata-agent.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Job kata-agent.service/start deleted to break ordering cycle starting with multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: Created slice system-modprobe.slice.
...
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd-timesyncd[561]: Network configuration changed, trying to establish connection.
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Found ordering cycle on kata-agent.service/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Job kata-agent.service/start deleted to break ordering cycle starting with multi-user.target/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: agent-protocol-forwarder.service: Scheduled restart job, restart counter is at 1.
...

@huoqifeng it seems there is a cyclic dependency among multi-user.target, attestation-agent.service and kata-agent.service; maybe we need a fix. The e2e test case TestLibvirtKbsKeyRelease fails because of this.

mkulke commented 6 days ago

oh, I only tested mkosi on x86_64 (i.e. without cloud-config)

So assuming you tested mkosi + cloud-config, and that worked, we probably should look into the delta of packer vs mkosi + cloud-config

liudalibj commented 6 days ago

@huoqifeng it seems there is a cyclic dependency among multi-user.target, attestation-agent.service and kata-agent.service; maybe we need a fix. The e2e test case TestLibvirtKbsKeyRelease fails because of this.

Verified: replacing cloud-final.service with cloud-init.service fixes the cyclic dependency issue. @huoqifeng @mkulke FYI.

With this replacement fix, the packer ubuntu amd64 peerpod + cloud-init + libvirt + sample kbc setup works fine.
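A runnable sketch of that substitution (the unit name, file path, and stub unit content are illustrative; since After= dependencies are cumulative in systemd, the existing line has to be edited in place rather than overridden by a drop-in):

```shell
#!/bin/sh
# Rewrite the unit ordering: cloud-final.service -> cloud-init.service,
# breaking the multi-user.target ordering cycle reported above.
# UNIT_FILE is parameterized; on a real podvm image it would be something
# like /etc/systemd/system/kata-agent.service.
UNIT_FILE="${UNIT_FILE:-./kata-agent.service}"

# create an illustrative unit file if none exists, so the sketch is runnable
[ -f "$UNIT_FILE" ] || cat >"$UNIT_FILE" <<'EOF'
[Unit]
Description=Kata Agent (illustrative stub)
After=cloud-final.service

[Service]
ExecStart=/usr/local/bin/kata-agent
EOF

# swap the ordering target
sed -i 's/^After=cloud-final\.service$/After=cloud-init.service/' "$UNIT_FILE"
```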

huoqifeng commented 6 days ago

@mkulke not sure whether it's worth having another try with the latest fix (hopefully the last :-) ) here https://github.com/confidential-containers/cloud-api-adaptor/pull/1868/commits/1fbc63646559834565be06db6c098f7270dcdf73.

mkulke commented 6 days ago

@mkulke not sure whether it's worth having another try with the latest fix (hopefully the last :-) ) here 1fbc636.

I think that shouldn't impact mkosi on x86_64; let's wait for the libvirt test.