edgelesssys / contrast

Deploy and manage confidential containers on Kubernetes
https://docs.edgeless.systems/contrast
GNU Affero General Public License v3.0

Cannot get the bare metal setup working on my Epyc 7313p #944

Open blenessy opened 1 month ago

blenessy commented 1 month ago

I could follow the Bare metal setup without any issues on a newly installed Ubuntu 24.10 system (with Linux 6.11, which contains the SNP host patches).

AFAIK I enabled SNP correctly and upgraded to the latest SEV firmware correctly:

root@epyc-7313p:~# dmesg | grep -i 'SEV\|ccp'
[   13.774631] ccp 0000:44:00.1: enabling device (0000 -> 0002)
[   13.778684] ccp 0000:44:00.1: no command queues available
[   13.780132] ccp 0000:44:00.1: sev enabled
[   13.780141] ccp 0000:44:00.1: psp enabled
[   13.841416] ccp 0000:44:00.1: SEV firmware update successful
[   14.807197] ccp 0000:44:00.1: SEV API:1.55 build:21
[   14.807204] ccp 0000:44:00.1: SEV-SNP API:1.55 build:21
[   14.816262] kvm_amd: SEV enabled (ASIDs 509 - 509)
[   14.816265] kvm_amd: SEV-ES enabled (ASIDs 1 - 508)
[   14.816267] kvm_amd: SEV-SNP enabled (ASIDs 1 - 508)
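
For anyone who wants to script the same check, here is a minimal sketch that scans dmesg text for the `kvm_amd` lines shown above. The function name `sev_features` is made up for illustration, and the regular expression assumes the message format of this particular kernel log; other kernel versions may word these lines differently.

```python
import re

def sev_features(dmesg_text):
    """Return the set of SEV features that kvm_amd reports as enabled.

    Assumes lines of the form 'kvm_amd: SEV-SNP enabled (ASIDs ...)',
    as seen in the dmesg output quoted in this issue.
    """
    features = set()
    for line in dmesg_text.splitlines():
        match = re.search(r"kvm_amd: (SEV(?:-ES|-SNP)?) enabled", line)
        if match:
            features.add(match.group(1))
    return features

# Sample input copied from the dmesg output in this issue.
sample = """\
[   14.816262] kvm_amd: SEV enabled (ASIDs 509 - 509)
[   14.816265] kvm_amd: SEV-ES enabled (ASIDs 1 - 508)
[   14.816267] kvm_amd: SEV-SNP enabled (ASIDs 1 - 508)
"""

print("SEV-SNP" in sev_features(sample))
```

On a live host the text could come from `dmesg` output piped into the script instead of the hard-coded sample.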

From the Emojivoto guide, I managed to:

  1. Install the runtime with the fix in #943.
  2. Install the Coordinator without problems
  3. Generate metadata.json and friends with contrast generate
  4. Set the correct minimum TCB for my system:
       "MinimumTCB": {
          "BootloaderVersion": 3,
          "TEEVersion": 0,
          "SNPVersion": 22,
          "MicrocodeVersion": 211
        }

My problem is that running the following command times out: contrast set -c "${coordinator}:1313" --coordinator-policy-hash c36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f deployment/

Here are the interesting parts of the coordinator-0 log from kubectl:

time=2024-10-20T19:21:52.864Z level=INFO msg="Logger initialized" level=INFO
time=2024-10-20T19:21:52.865Z level=INFO msg="Coordinator started"
time=2024-10-20T19:21:52.874Z level=INFO msg="csi device not identified, assuming first start, formatting"
time=2024-10-20T19:21:53.065Z level=INFO msg="csi device mounted to state disk mount point" dev=/dev/csi0 mountPoint=/mnt/state
time=2024-10-20T19:21:53.067Z level=INFO msg="Coordinator user API listening"
time=2024-10-20T19:21:53.067Z level=INFO msg="Coordinator mesh API listening"
time=2024-10-20T19:22:36.438Z level=INFO msg="Issue called" issuer.tee-type=snp
time=2024-10-20T19:22:36.444Z level=INFO msg="Retrieved report" issuer.tee-type=snp issuer.reportRaw=020000000000000000000300000000000000000000000000000000000000000000000000000000000000000000000000000000000100000003000000000016d301000000000000000000000000000000d3ff55398259bc3b0551013278c170ef3db2db1c64e872f6fb3bee319e895e2c7c4a35a4d692bc284e06c9b031ddb5391bffdc605272042a69b005bed0caec18cde33fb25b0af5f9ae88485ec6c0e86bd1c67d2fe722b38084084b12118a0b1b85d2dd0fd7645b5257ec773f85aba61cc36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000fd197969dcf013aeb3391edbe5a19c73c3f755daaf6a55d625026782e543df1affffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff03000000000016d30000000000000000000000000000000000000000000000000f8f087080b26bbb05e94cd452344a55cb34c86ad96d31b04e4b9756222b1c40891051636cd27110babb5ca8fa2fc7efaed6d301676daa7834b149222d95ba1e03000000000016d3153701001537010003000000000016d30000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006a9bed2db23542b11d93627e6ab108d3c23850e2ff4abbe57a62ff1b67995722ba657127a3f5946a9a04cc09298777c50000000000000000000000000000000000000000000000007acf46cd5638ee0fc50dce35b8043bbb0519ba8ad5785d0ef4c0b91a606ff9d54d70e76b3b12987e7c12c0e4f080ee88000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ecae0c0f950243b1afa20ae2e0d565b6300000000800000000000000000000000000000000000000000000000000000008000000010fa000
time=2024-10-20T19:23:06.439Z level=INFO msg="Issue called" issuer.tee-type=snp
...

Freax13 commented 1 month ago

The TCB and the policy hash match the values in the attestation report. Your setup looks good to me.
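
To make the TCB comparison reproducible, here is a small sketch that decodes the 8-byte TCB field `03000000000016d3`, which appears in the raw report above, and compares it with the MinimumTCB from the issue. The byte layout (byte 0 bootloader, byte 1 TEE, bytes 2 to 5 reserved, byte 6 SNP, byte 7 microcode) is an assumption based on the SEV-SNP report format for this CPU generation; the names `Tcb`, `decode_tcb`, and `meets_minimum` are illustrative and not part of Contrast.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tcb:
    bootloader: int
    tee: int
    snp: int
    microcode: int

def decode_tcb(field_hex):
    """Decode an 8-byte TCB version field given as a hex string.

    Assumed layout: byte 0 = bootloader, byte 1 = TEE,
    bytes 2-5 = reserved, byte 6 = SNP, byte 7 = microcode.
    """
    raw = bytes.fromhex(field_hex)
    if len(raw) != 8:
        raise ValueError("TCB version field must be exactly 8 bytes")
    return Tcb(bootloader=raw[0], tee=raw[1], snp=raw[6], microcode=raw[7])

def meets_minimum(reported, minimum):
    """True if every reported component is at least the required minimum."""
    return (
        reported.bootloader >= minimum.bootloader
        and reported.tee >= minimum.tee
        and reported.snp >= minimum.snp
        and reported.microcode >= minimum.microcode
    )

# TCB field as it appears in the raw report hex in the coordinator log.
reported = decode_tcb("03000000000016d3")
# MinimumTCB values from the issue description.
minimum = Tcb(bootloader=3, tee=0, snp=22, microcode=211)

print(reported)
print(meets_minimum(reported, minimum))
```

The policy hash can be checked in a similar spirit: the value passed to `--coordinator-policy-hash` shows up verbatim in the raw report hex.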

My problem is that running the following command times out: contrast set -c "${coordinator}:1313" --coordinator-policy-hash c36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f deployment/

Can you set --log-level debug and try again?

Can you double-check that you can reach the port at ${coordinator}:1313 from the machine where you're executing contrast set?
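
One quick way to test reachability is a plain TCP connect, sketched below. This only shows that something accepts connections on the port; it says nothing about TLS or attestation. The helper name `can_connect` is hypothetical, and reading the host from an exported `coordinator` environment variable is an assumption that mirrors the `${coordinator}` shell variable used in the command.

```python
import os
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False

# Assumes the coordinator address was exported, e.g. `export coordinator=...`.
host = os.environ.get("coordinator")
if host:
    print(f"{host}:1313 reachable: {can_connect(host, 1313)}")
```

If this reports the port as unreachable, the timeout is more likely a networking problem (service exposure, firewall) than an attestation problem.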

blenessy commented 1 week ago

  1. Having been away for a while, I had shut down my dev Epyc 7313. After a fresh start, the problem in question disappeared; in other words, the following command succeeded:
    contrast set -c "${coordinator}:1313" \
      --coordinator-policy-hash c36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f deployment/ \
      --log-level debug
  2. Then I ran the same command from step 1 again, and it failed.
  3. Then I ran systemctl restart k3s.
  4. Then I ran the same command from step 1 once more, and it succeeded.

There does seem to be some idempotency problem here. I'm attaching the logs as requested: contrast-cli.log, contrast-0.log
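
The succeed-then-fail pattern above could be reproduced with a small script like the following sketch. It runs the same `contrast set` invocation several times with a timeout and records the outcome of each attempt. The flags are the ones used in this thread; the helper names, the 60-second timeout, and the use of an exported `coordinator` environment variable are assumptions made for illustration.

```python
import os
import shutil
import subprocess

def build_set_command(coordinator, policy_hash, manifest_dir):
    """Build the `contrast set` command line used in this issue."""
    return [
        "contrast", "set",
        "-c", f"{coordinator}:1313",
        "--coordinator-policy-hash", policy_hash,
        manifest_dir,
        "--log-level", "debug",
    ]

def run_attempts(cmd, attempts=2, timeout_s=60.0, runner=subprocess.run):
    """Run cmd repeatedly and return 'ok', 'failed', or 'timeout' per attempt."""
    outcomes = []
    for _ in range(attempts):
        try:
            result = runner(cmd, timeout=timeout_s)
            outcomes.append("ok" if result.returncode == 0 else "failed")
        except subprocess.TimeoutExpired:
            outcomes.append("timeout")
    return outcomes

# Only runs when the CLI is installed and the coordinator address is exported.
coordinator = os.environ.get("coordinator")
if coordinator and shutil.which("contrast"):
    cmd = build_set_command(
        coordinator,
        "c36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f",
        "deployment/",
    )
    print(run_attempts(cmd))
```

An outcome list where the first attempt is "ok" and the second is not would match the behaviour described in steps 1 and 2.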