Open akutz opened 1 year ago
@akutz , thanks for the proposal. You have some interesting ideas.
In your scenario, when you speak of bootstrap data, are you referring to the cloud-init user data? Also, if there was a scenario where the TPM is available pre-boot, why do we need a separate shared secret? Why wouldn't we be able to use the public EK directly to encrypt the bootstrap data?
We have explored the idea of working with TPMs in the past and how cloud-init could support confidential compute. In exploring this, we have run into a roadblock. On the public clouds we have tried, the vTPM of the instance isn't available until after the instance is started, but by the time the instance has started, the userdata has already been fed to the cloud. Thus we have no way of pre-encrypting the data. Do things work differently on VMWare?
That said, if this is a viable use case for any cloud currently, I think it'd be valuable to get the infrastructure in place.
> Also, if there was a scenario where the TPM is available pre-boot, why do we need a separate shared secret? Why wouldn't we be able to use the public EK directly to encrypt the bootstrap data?
Because of MAX_SYM_DATA, the maximum number of octets that may be in a sealed blob. The definition of MAX_SYM_DATA may be found in Part 4: Supporting routines, Table 7 as well as Part 2: Structures, section 11.1.13, which states _For interoperability, MAX_SYM_DATA should be 128_. So while it is possible to seal data with a length that exceeds MAX_SYM_DATA, it is not recommended to do so. In fact, when I tried on vSphere, anything larger than 128 causes the unseal operation to fail.
Given the size limitation, it makes sense to leverage the EK to send in an encrypted, shared secret that is then used to decrypt the potentially much larger payload, the metadata / userdata / vendordata.
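The size check behind that decision can be sketched in a few lines of Python. This is only an illustration: `fits_in_sealed_blob` and `choose_seal_input` are hypothetical names, the 32-byte key length is an assumption, and the symmetric encryption of the larger payload is deliberately left out.

```python
import secrets

# Interoperable limit on sealed data, as quoted above from the TPM 2.0 spec.
MAX_SYM_DATA = 128


def fits_in_sealed_blob(data: bytes) -> bool:
    """Return True when data is small enough to seal directly."""
    return len(data) <= MAX_SYM_DATA


def choose_seal_input(payload: bytes, key_len: int = 32) -> tuple[str, bytes]:
    """Decide what gets sealed to the EK (hypothetical helper).

    Small payloads are sealed as-is. Larger payloads get a fresh random
    shared secret instead; the caller would encrypt the payload with that
    secret using a symmetric cipher of its choice (not shown here).
    """
    if fits_in_sealed_blob(payload):
        return "direct", payload
    return "envelope", secrets.token_bytes(key_len)
```

A typical cloud-init user-data document is well over 128 bytes, so in practice the "envelope" branch is the one that would be taken.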
> We have explored the idea of working with TPMs in the past and how cloud-init could support confidential compute. In exploring this, we have run into a roadblock. On the public clouds we have tried, the vTPM of the instance isn't available until after the instance is started, but by the time the instance has started, the userdata has already been fed to the cloud. Thus we have no way of pre-encrypting the data. Do things work differently on VMWare?
I'm not sure why the vTPM would not yet be available if the normal init image is used. I can give this a go myself on vSphere to find out. I'll let you know what I find.
Hi @TheRealFalcon,
> > We have explored the idea of working with TPMs in the past and how cloud-init could support confidential compute. In exploring this, we have run into a roadblock. On the public clouds we have tried, the vTPM of the instance isn't available until after the instance is started, but by the time the instance has started, the userdata has already been fed to the cloud. Thus we have no way of pre-encrypting the data. Do things work differently on VMWare?
>
> I'm not sure why the vTPM would not yet be available if the normal init image is used. I can give this a go myself on vSphere to find out. I'll let you know what I find.
I just deployed a VM Service VM using the following data:
---
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: my-vm-9
  namespace: my-namespace
spec:
  className: best-effort-small-with-vtpm
  imageName: vmi-46dde29f52fb5afcb
  storageClass: wcpglobal-storage-profile
  vmMetadata:
    transport: CloudInit
    secretName: my-vm-9-bootstrap-data
---
apiVersion: v1
kind: Secret
metadata:
  name: my-vm-9-bootstrap-data
  namespace: my-namespace
stringData:
  user-data: |
    #cloud-config
    users:
    - default
    - name: akutz
      primary_group: akutz
      sudo: ALL=(ALL) NOPASSWD:ALL
      groups: users
      lock_passwd: false
      ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDE0c5FczvcGSh/tG4iw+Fhfi/O5/EvUM/96js65tly4++YTXK1d9jcznPS5ruDlbIZ30oveCBd3kT8LLVFwzh6hepYTf0YmCTpF4eDunyqmpCXDvVscQYRXyasEm5olGmVe05RrCJSeSShAeptv4ueIn40kZKOghinGWLDSZG4+FFfgrmcMCpx5YSCtX2gvnEYZJr0czt4rxOZuuP7PkJKgC/mt2PcPjooeX00vAj81jjU2f3XKrjjz2u2+KIt9eba+vOQ6HiC8c2IzRkUAJ5i1atLy8RIbejo23+0P4N2jjk17QySFOVHwPBDTYb0/0M/4ideeU74EN/CgVsvO6JrLsPBR4dojkV5qNbMNxIVv5cUwIy2ThlLgqpNCeFIDLCWNZEFKlEuNeSQ2mPtIO7ETxEL2Cz5y/7AIuildzYMc6wi2bofRC8HmQ7rMXRWdwLKWsR0L7SKjHblIwarxOGqLnUI+k2E71YoP7SZSlxaKi17pqkr0OMCF+kKqvcvHAQuwGqyumTEWOlH6TCx1dSPrW+pVCZSHSJtSTfDW2uzL6y8k10MT06+pVunSrWo5LHAXcS91htHV1M1UrH/tZKSpjYtjMb5+RonfhaFRNzvj7cCE1f3Kp8UVqAdcGBTtReoE8eRUT63qIxjw03a7VwAyB2w+9cu1R9/vAo8SBeRqw== sakutz@gmail.com
    runcmd:
    - tdnf update --assumeno
    - tdnf install -y build-essential jq python3 python3-devel python3-pip python3-pyyaml sudo tpm2-tools tpm2-tss tpm2-tss-devel
    - pip3 install pkgconfig tpm2-pytss
    - tpm2_nvread '0x01C00002' | openssl x509 -noout -text
After the VM came online I looked at /var/log/cloud-init-output.log, and this was at the end:
WARN: Reading full size of the NV index
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
fb:1f:0e:35:b6:0f:d9:c5
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = CA, DC = vsphere, DC = local, C = US, ST = California, O = sc2-10-184-103-126.eng.vmware.com, OU = VMware Engineering
Validity
Not Before: Sep 8 22:15:54 2023 GMT
Not After : Aug 17 08:45:41 2033 GMT
Subject: 2.23.133.2.1 = id:564D5700, 2.23.133.2.2 = VMware TPM2, 2.23.133.2.3 = id:00020065
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:c7:4a:a9:e5:8d:a4:e0:64:38:34:29:83:b7:01:
2e:22:53:b0:20:09:52:61:80:2a:cd:c8:4f:a6:13:
30:37:e6:a0:db:ad:13:1f:91:47:4f:4e:d7:20:09:
8c:62:25:57:ce:22:78:3c:e2:7e:b2:9e:ad:f8:0a:
f3:41:d1:7d:92:b4:ed:78:aa:d1:3b:1a:eb:4a:bf:
e7:3c:55:2d:c1:95:3c:f1:76:64:d2:ea:8d:ff:6c:
bc:ba:17:90:bd:f5:cb:a1:5a:ef:d7:ec:55:34:55:
97:4d:59:c3:a3:19:79:be:08:94:2d:bb:9d:a3:72:
92:85:75:6d:3d:04:29:a6:9c:d7:29:a1:94:6b:24:
ac:ad:f1:6e:8b:ac:72:cd:e4:4b:59:dd:94:a5:ea:
d4:79:e5:c5:5c:53:82:f4:d9:b9:dd:29:c2:3d:64:
f4:e4:1f:1b:a3:c8:a1:e2:4b:57:50:85:d1:a1:fc:
24:12:21:4d:2d:7c:2b:1e:3d:19:79:fe:98:60:cb:
50:69:c4:7b:8d:87:45:45:59:82:cc:cb:6e:9b:37:
d8:c2:df:32:32:a2:75:08:3f:11:07:46:40:dc:c7:
71:ed:00:c7:fd:c7:6f:a7:52:3a:a1:db:93:87:2c:
bf:71:97:30:9f:0f:8a:fe:3c:be:e9:12:37:4d:9c:
6f:2f
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Key Encipherment
X509v3 Subject Alternative Name: critical
DirName:/2.23.133.2.1=id:564D5700/2.23.133.2.2=VMware TPM2/2.23.133.2.3=id:00020065
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Extended Key Usage:
2.23.133.8.1
X509v3 Subject Directory Attributes:
0...2.0.....t 0.0...g....1
X509v3 Subject Key Identifier:
40:8F:D0:E6:A9:1E:28:D5:3F:CA:12:96:1B:8E:25:7F:69:89:3E:52
X509v3 Authority Key Identifier:
A6:C2:AF:A2:C3:CE:68:BC:5C:B1:8B:83:11:9A:79:FA:C8:96:C1:8B
Authority Information Access:
CA Issuers - URI:https://sc2-10-184-103-126.eng.vmware.com/afd/vecs/ca
Signature Algorithm: sha256WithRSAEncryption
Signature Value:
24:d3:91:b2:ed:e3:c6:1e:22:4e:66:c1:97:66:cd:c0:fb:a7:
ac:7e:6c:10:e7:0b:87:d9:27:18:fd:11:cb:ac:b8:3d:b9:bc:
57:f8:b7:ab:8c:a1:76:57:a2:dd:d6:86:7f:f9:9f:70:52:c6:
7e:a4:6c:bd:94:8c:3f:0d:6f:cd:dc:36:ca:dd:85:d7:a9:42:
ed:f7:74:20:ba:f3:af:82:0a:98:23:29:80:2a:2e:ed:96:36:
8a:af:f7:84:0b:16:49:a4:3c:a9:f5:30:6c:8a:7b:69:8e:bd:
1a:bd:92:b4:16:38:c5:64:4e:18:c9:2c:08:7e:27:16:1c:4e:
4b:68:b7:f0:dc:42:d7:3b:b8:76:fb:77:49:07:8e:06:d0:57:
af:4d:38:be:7a:a3:06:ae:74:fd:d0:64:b1:a0:a3:04:33:d3:
b6:64:93:18:db:46:e7:7d:d6:d9:58:02:4a:f0:57:a0:a3:4f:
a6:dd:a5:91:25:e9:0d:d3:29:fe:50:c0:88:3d:d3:7a:f3:1e:
8e:d3:a8:93:1c:bf:f5:34:6b:aa:cc:df:84:1a:07:00:44:70:
ea:2b:61:23:6f:e9:d7:7d:12:67:c4:3d:da:fe:7b:09:41:3a:
00:24:42:ee:ee:3c:d5:6d:fd:30:9f:a4:63:22:e5:f9:db:2b:
ec:51:18:60:54:57:0f:b1:5c:02:98:8a:7d:47:3b:9c:43:40:
82:80:aa:be:5f:3b:c4:79:7a:8b:76:d9:76:fc:25:9f:e8:e9:
cb:fb:d4:6e:be:99:07:ae:b6:3d:f1:35:94:87:e2:7a:3c:cc:
8e:55:bd:06:3b:39:fe:e5:d1:c4:85:22:22:d5:bc:2e:84:46:
c9:8b:54:b6:1a:ed:aa:d5:39:f6:52:fb:6a:ca:60:90:4a:b0:
fa:13:27:55:24:d7:09:63:b7:48:a5:75:cb:85:f3:98:d1:bc:
23:34:ff:1b:4c:2c:46:b6:ad:18:58:6e:d1:92:33:5b:12:ed:
53:fa:f3:45:e6:23
Cloud-init v. 23.1.1 finished at Fri, 08 Sep 2023 22:18:20 +0000. Datasource DataSourceVMware [seed=guestinfo]. Up 96.17 seconds
So yeah, it looks like we're able to interact with the TPM without an issue, at least during the runcmd module.
Hi @TheRealFalcon,
> So yeah, it looks like we're able to interact with the TPM without an issue, at least during the runcmd module.
I just tried the following to validate it works end-to-end:
⚠️ Please note if you saw an earlier revision of this comment about it not working, it was because I had transposed two lines in the list of commands under runcmd. Once I fixed that, it worked fine.
Deploy a VM without powering it on:
cat <<EOF | kubectl apply -f -
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: my-vm-10
  namespace: my-namespace
spec:
  className: best-effort-small-with-vtpm
  imageName: vmi-46dde29f52fb5afcb
  storageClass: wcpglobal-storage-profile
  powerState: poweredOff
  vmMetadata:
    transport: CloudInit
    secretName: my-vm-10-bootstrap-data
---
apiVersion: v1
kind: Secret
metadata:
  name: my-vm-10-bootstrap-data
  namespace: my-namespace
stringData:
  user-data: |
    #cloud-config
EOF
Wait for it to be created...
Encrypt a secret to it using govc vm.tpm2.seal from https://github.com/vmware/govmomi/pull/3222:
echo "Hello, James." | govc vm.tpm2.seal -vm my-vm-10 -json
Which emitted the following output:
{
"public": "AE4ACAALAAAEAAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAgAjJUvmxZSepeIuLFOyKwvBdoIAhQwqAxjlaSw0CcznM=",
"private": "AFoAIH52GhSnDP3e/gj9IP8RhjPURaJHM8JGpPjDUgEuwbxpLkajp5ZASjalLkRdm3cW5pgU3GJ0iA5EEi9JglxylaspUUczwjguW/jXuG8d1felZQTGRDQYHzE=",
"seed": "AQAiK93UfeaWIO+ZMT6PujPoo29FIzNVrj2DtLVRgsDMBeYqMmJY1yfNUrUuGmQtt/VDgprYdiVRMGOY0C0gz2e5lOJMG/z0Ic1MMGvJinNAik8T6IP0jeqcdgIXm7zCPVz5ig+x0AnN8ehGKs232oYxUzUjEeuGAGH4V7SoqNCZccotWE1tpMiBwRzrbPoWhZoVTV1Yqt0l6e2NThlV4WEgwtUJ6opN08+qC2Z5cAIOnoHKAfxyHKur8eUj4sxbth4au2juF7VFQXW6jSmduvV/yqD9JCmT6G5E01olJXFP5HYci2zZhIcBQaSZEOEqmq4rezoD8YA0yH4lCDTYW15V"
}
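Each of the three fields is a base64-encoded blob. As a rough sketch of how they could be turned into the files consumed later by `tpm2_import`, the Python below decodes them to disk. The function name `write_sealed_parts` and the `prefix` argument are hypothetical; the `.pub`, `.priv`, and `.seed` suffixes simply mirror the file names used in the bootstrap data in this thread.

```python
import base64
import json
from pathlib import Path

# JSON field -> file suffix, mirroring /tmp/enc.bin.{pub,priv,seed} in this thread.
FIELD_TO_SUFFIX = {"public": "pub", "private": "priv", "seed": "seed"}


def write_sealed_parts(govc_json: str, prefix: str) -> dict[str, Path]:
    """Decode the sealed blobs from the JSON document and write one file per field."""
    sealed = json.loads(govc_json)
    paths = {}
    for field, suffix in FIELD_TO_SUFFIX.items():
        path = Path(f"{prefix}.{suffix}")
        # validate=True rejects input containing non-base64 characters
        # instead of silently discarding them.
        path.write_bytes(base64.b64decode(sealed[field], validate=True))
        paths[field] = path
    return paths
```

Called with the JSON above and a prefix of `/tmp/enc.bin`, this would produce the same three files that the `echo ... | base64 -d` commands create inside the guest.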
Update the VM's bootstrap data to decrypt the above secret at boot:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: my-vm-10-bootstrap-data
  namespace: my-namespace
stringData:
  user-data: |
    #cloud-config
    users:
    - default
    - name: akutz
      primary_group: akutz
      sudo: ALL=(ALL) NOPASSWD:ALL
      groups: users
      lock_passwd: false
      ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDE0c5FczvcGSh/tG4iw+Fhfi/O5/EvUM/96js65tly4++YTXK1d9jcznPS5ruDlbIZ30oveCBd3kT8LLVFwzh6hepYTf0YmCTpF4eDunyqmpCXDvVscQYRXyasEm5olGmVe05RrCJSeSShAeptv4ueIn40kZKOghinGWLDSZG4+FFfgrmcMCpx5YSCtX2gvnEYZJr0czt4rxOZuuP7PkJKgC/mt2PcPjooeX00vAj81jjU2f3XKrjjz2u2+KIt9eba+vOQ6HiC8c2IzRkUAJ5i1atLy8RIbejo23+0P4N2jjk17QySFOVHwPBDTYb0/0M/4ideeU74EN/CgVsvO6JrLsPBR4dojkV5qNbMNxIVv5cUwIy2ThlLgqpNCeFIDLCWNZEFKlEuNeSQ2mPtIO7ETxEL2Cz5y/7AIuildzYMc6wi2bofRC8HmQ7rMXRWdwLKWsR0L7SKjHblIwarxOGqLnUI+k2E71YoP7SZSlxaKi17pqkr0OMCF+kKqvcvHAQuwGqyumTEWOlH6TCx1dSPrW+pVCZSHSJtSTfDW2uzL6y8k10MT06+pVunSrWo5LHAXcS91htHV1M1UrH/tZKSpjYtjMb5+RonfhaFRNzvj7cCE1f3Kp8UVqAdcGBTtReoE8eRUT63qIxjw03a7VwAyB2w+9cu1R9/vAo8SBeRqw== sakutz@gmail.com
    runcmd:
    - tdnf update --assumeno
    - tdnf install -y build-essential jq python3 python3-devel python3-pip python3-pyyaml sudo tpm2-tools tpm2-tss tpm2-tss-devel
    - pip3 install pkgconfig tpm2-pytss
    - echo 'AE4ACAALAAAEAAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAgAjJUvmxZSepeIuLFOyKwvBdoIAhQwqAxjlaSw0CcznM=' | base64 -d >/tmp/enc.bin.pub
    - echo 'AFoAIH52GhSnDP3e/gj9IP8RhjPURaJHM8JGpPjDUgEuwbxpLkajp5ZASjalLkRdm3cW5pgU3GJ0iA5EEi9JglxylaspUUczwjguW/jXuG8d1felZQTGRDQYHzE=' | base64 -d >/tmp/enc.bin.priv
    - echo 'AQAiK93UfeaWIO+ZMT6PujPoo29FIzNVrj2DtLVRgsDMBeYqMmJY1yfNUrUuGmQtt/VDgprYdiVRMGOY0C0gz2e5lOJMG/z0Ic1MMGvJinNAik8T6IP0jeqcdgIXm7zCPVz5ig+x0AnN8ehGKs232oYxUzUjEeuGAGH4V7SoqNCZccotWE1tpMiBwRzrbPoWhZoVTV1Yqt0l6e2NThlV4WEgwtUJ6opN08+qC2Z5cAIOnoHKAfxyHKur8eUj4sxbth4au2juF7VFQXW6jSmduvV/yqD9JCmT6G5E01olJXFP5HYci2zZhIcBQaSZEOEqmq4rezoD8YA0yH4lCDTYW15V' | base64 -d >/tmp/enc.bin.seed
    - tpm2_createek -c /tmp/ek.ctx
    - tpm2_startauthsession --policy-session -S /tmp/session.dat
    - tpm2_policysecret -c 0x4000000B -S /tmp/session.dat
    - tpm2_import -C /tmp/ek.ctx -P "session:/tmp/session.dat" -u /tmp/enc.bin.pub -i /tmp/enc.bin.priv -s /tmp/enc.bin.seed -r /tmp/enc.bin.key
    - tpm2_flushcontext /tmp/session.dat && rm -f /tmp/session.dat
    - tpm2_startauthsession --policy-session -S /tmp/session.dat
    - tpm2_policysecret -c 0x4000000B -S /tmp/session.dat
    - tpm2_load -C /tmp/ek.ctx -P "session:/tmp/session.dat" -u /tmp/enc.bin.pub -r /tmp/enc.bin.key -c /tmp/enc.bin.ctx
    - tpm2_flushcontext /tmp/session.dat && rm -f /tmp/session.dat
    - tpm2_startauthsession --policy-session -S /tmp/session.dat
    - tpm2_unseal -p "session:/tmp/session.dat" -c /tmp/enc.bin.ctx
    - tpm2_flushcontext /tmp/session.dat && rm -f /tmp/session.dat
EOF
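Since the TPM-related runcmd entries above differ from one sealed secret to the next only in the three base64 strings, they could be generated rather than pasted by hand. The sketch below rebuilds the same command sequence from a parsed `govc vm.tpm2.seal -json` document. `build_unseal_runcmd` and its `workdir` parameter are hypothetical names; the commands themselves are copied from the list above.

```python
import shlex


def build_unseal_runcmd(sealed: dict, workdir: str = "/tmp") -> list[str]:
    """Return the runcmd entries that import, load, and unseal a sealed secret."""
    enc = f"{workdir}/enc.bin"
    ek = f"{workdir}/ek.ctx"
    session = f"{workdir}/session.dat"
    auth = f'"session:{session}"'
    start = f"tpm2_startauthsession --policy-session -S {session}"
    policy = f"tpm2_policysecret -c 0x4000000B -S {session}"
    flush = f"tpm2_flushcontext {session} && rm -f {session}"

    # Write the three blobs to disk. shlex.quote only adds quotes when the
    # value contains shell-unsafe characters, which base64 text does not.
    cmds = [
        f"echo {shlex.quote(sealed[field])} | base64 -d >{enc}.{suffix}"
        for field, suffix in (("public", "pub"), ("private", "priv"), ("seed", "seed"))
    ]
    cmds += [
        f"tpm2_createek -c {ek}",
        start,
        policy,
        f"tpm2_import -C {ek} -P {auth} -u {enc}.pub -i {enc}.priv -s {enc}.seed -r {enc}.key",
        flush,
        start,
        policy,
        f"tpm2_load -C {ek} -P {auth} -u {enc}.pub -r {enc}.key -c {enc}.ctx",
        flush,
        start,
        f"tpm2_unseal -p {auth} -c {enc}.ctx",
        flush,
    ]
    return cmds
```

The returned list can be placed under `runcmd:` in the user-data, after the package installation steps.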
Power on the VM:
cat <<EOF | kubectl apply -f -
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: my-vm-10
  namespace: my-namespace
spec:
  className: best-effort-small-with-vtpm
  imageName: vmi-46dde29f52fb5afcb
  storageClass: wcpglobal-storage-profile
  powerState: poweredOn
  vmMetadata:
    transport: CloudInit
    secretName: my-vm-10-bootstrap-data
EOF
SSH into the VM...
Look at /var/log/cloud-init-output.log:
837197674484b3f81a90cc8d46a5d724fd52d76e06520b64f2a1da1b331469aa
837197674484b3f81a90cc8d46a5d724fd52d76e06520b64f2a1da1b331469aa
name: 000bef2a25937b6d46979f35a4cad9eb462f2faf12f106bfbdab5f945656c867b8c4
Hello, James.
Cloud-init v. 23.1.1 finished at Fri, 08 Sep 2023 23:27:04 +0000. Datasource DataSourceVMware [seed=guestinfo]. Up 93.79 seconds
Maybe my colleague @jessepool can help us understand why it works on vSphere and not other hyperscalers?
Also, cc @randomvariable on the above example since we've been discussing secure Cloud-Init.
@akutz , That's exciting. I'm glad you were able to make it work. We're definitely interested in getting something like this working in cloud-init.
> Maybe my colleague @jessepool can help us understand why it works on vSphere and not other hyperscalers?
I think it's because you were able to create the machine, then run your TPM seal between creation and power-on. The public clouds we have tried have no option of creating an instance without also simultaneously starting it. There is no TPM interaction possible before feeding the userdata to the instance.
Do you have any thoughts moving forward? Realistically, the upstream devs probably won't get to anything like this in the short term, but in the next few months we can start to pick this up. We would also likely need a lot of help with testing using the proper vSphere resources. Would you be able to help with that? You showed some stubs in the OP. Were you working on a cloud-init POC there or just spitballing?
@akutz , circling back here...are you able to answer the questions from my previous comment? Is there any POC code you have that is shareable? We're interested in helping drive something like this into cloud-init, but I'm hoping we can start with something a little more concrete as a starting point.
Enhancement
There is often a need to bootstrap a guest with information that may be classified as sensitive, e.g. a public/private key pair written out as files. While it is true that every datasource could develop its own means of handling this problem, it occurred to me recently that perhaps they do not have to do so.
I've been busy of late digging into how to leverage TPMs as a means to securely bootstrap machines (see https://github.com/google/go-tpm/pull/343 and https://github.com/vmware/govmomi/pull/3222), and developed a workflow for bootstrapping guests that leverages the TPM's existing endorsement key (EK):
MAX_SYM_DATA for portability with respect to TPMs.
A secure bootstrap model has been demonstrated before, which eventually became Keylime. I think Keylime is great, and there is nothing in the above workflow that precludes leveraging Keylime for on-going, secure communication with a guest, either from/to Cloud-Init or other actors.
However, the bootstrap problem is one that I think can be addressed without requiring key exchange, either explicit attestation or storage, and instead rely on a simplified version (there may or may not be an auth policy that relies on a specific combination of PCRs) of implicit attestation via the EK, something every TPM already has or has the ability to deterministically generate. To that end, if an EK-based bootstrap model were adopted, it could be something that Cloud-Init directly utilized as opposed to leaving it up to every datasource implementation. For example, the Datasource interface could be augmented with:
With something like the above, it could be up to Cloud-Init core to handle interfacing with the TPM to decrypt the shared secret via the example outlined in tpm2-ekunseal.sh. I imagine shelling out to tpm2-tools is a better option than requiring Cloud-Init to depend on tpm2-pytss directly. Plus, this way the use of encryption stays optional without adding another dependency to Cloud-Init's Python modules.
Anyway, I am quite happy to discuss this further if there is interest. Thanks!
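To illustrate what "optional, via tpm2-tools" might look like from Python, here is a minimal sketch. It is not cloud-init code: `unseal_shared_secret` is a hypothetical helper, and it assumes the sealed object has already been imported and loaded and that a policy session file exists, as in the tpm2-tools command sequence discussed in this thread. The `which` and `run` parameters exist only so the function can be exercised without a TPM.

```python
import shutil
import subprocess
from typing import Optional


def unseal_shared_secret(
    ctx_path: str,
    session_path: str,
    which=shutil.which,
    run=subprocess.run,
) -> Optional[bytes]:
    """Unseal a loaded object with tpm2_unseal, or return None if tpm2-tools is absent.

    Returning None rather than raising keeps encryption optional: a caller
    can fall back to treating the bootstrap data as plaintext.
    """
    if which("tpm2_unseal") is None:
        return None
    result = run(
        ["tpm2_unseal", "-p", f"session:{session_path}", "-c", ctx_path],
        check=True,           # raise CalledProcessError if the TPM rejects the request
        capture_output=True,  # the unsealed secret is written to stdout
    )
    return result.stdout
```

Because the binary is looked up at call time, nothing here adds an import-time dependency; a system without tpm2-tools simply takes the `None` path.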