containers / libkrun

A dynamic library providing Virtualization-based process isolation capabilities
Apache License 2.0
901 stars 74 forks source link

Introduce the krun_set_data_disk API. #102

Closed blenessy closed 1 year ago

blenessy commented 1 year ago

This API adds the possibility to introduce a second block device, to a TEE.

It is assumed that the root disk contains a symmetric key (secret), and code to encrypt/decrypt the data disk before use. The recommended setup is to included cryptsetup on the root disk and use that to safely access the data disk.

blenessy commented 1 year ago

It would be nicer to assume that all data disks are LUKS encrypted as well and have initrd unlock, mount it implicitly on top of the root disk. In which case we need to introduce Mounts or Volumes in .krun_config.json to define the mountpoints.

slp commented 1 year ago

Thanks for the PR! This could be a good opportunity to think about the the different storage scenarios we'd like to support. For instance, I remember a proposal in the past about the possibility of having a read-only root disk and a layered ephemeral data disk.

Now, to express such diversity of scenarios, it'll probably be better to rely exclusively on .krun_config.json and leave the related APIs as a legacy method. This also means we need a way to expose .krun_config.json to the guest, but this is something that was already planned, most likely by putting it in the first partition of the root disk.

@blenessy @tylerfanelli What do you think?

blenessy commented 1 year ago

Thanks for the PR! This could be a good opportunity to think about the the different storage scenarios we'd like to support. For instance, I remember a proposal in the past about the possibility of having a read-only root disk and a layered ephemeral data disk.

Now, to express such diversity of scenarios, it'll probably be better to rely exclusively on .krun_config.json and leave the related APIs as a legacy method. This also means we need a way to expose .krun_config.json to the guest, but this is something that was already planned, most likely by putting it in the first partition of the root disk.

@blenessy @tylerfanelli What do you think?

FWIW. the scenario I'm wrestling with involves running a Avalanche Node in a libkrun-sev TEE. Apart from the 1000+ TCP connections it wants to use, the key challenge is to maintain the huge (200GB) data space (~/.avalanchego) between different avalanchego releases. That is the reason I opened this PR - it becomes way more manageable to separate the data from code.

(1) In regards to .krun_config.json, we can definitely continue building on that as it is a relatively easy and safe solution. The only downside is that you have to rebuild your LUKS rootfs image every time you want to change something. However, as long as you only keep relatively static info there (Env., Entrypoint, etc.) it will not become a practical problem I think.

(2) An alternative to .krun_config.json is to pass the .krun_config.json key/values through the /resource/{repository}/{type}/{tag} KBS API instead. It would make the infra a bit more complicated (from a consistency perspective) as the info from each docker image you want to convert to krun image, would need to be pushed to both krun (rootfs) image registry and attestation server (assuming these are different databases).

(3) A more portable and less TEE-specific alternative to (2) is to move the .krun_config.json processing down from initrd to a new simple init process in rootfs. Think tini on steroids. Given an API token is found in the rootfs, it could GET (HTTPS) the .krun_config.json key/values from even a docker registry :) - maybe from from the krun image registry ? This idea is inspired by EnvKey.

Hope this helps and I just popped (2) and (3) so they might be a bit naive :).

@slp

slp commented 1 year ago

Thanks for the PR! This could be a good opportunity to think about the the different storage scenarios we'd like to support. For instance, I remember a proposal in the past about the possibility of having a read-only root disk and a layered ephemeral data disk. Now, to express such diversity of scenarios, it'll probably be better to rely exclusively on .krun_config.json and leave the related APIs as a legacy method. This also means we need a way to expose .krun_config.json to the guest, but this is something that was already planned, most likely by putting it in the first partition of the root disk. @blenessy @tylerfanelli What do you think?

FWIW. the scenario I'm wrestling with involves running a Avalanche Node in a libkrun-sev TEE. Apart from the 1000+ TCP connections it wants to use, the key challenge is to maintain the huge (200GB) data space (~/.avalanchego) between different avalanchego releases. That is the reason I opened this PR - it becomes way more manageable to separate the data from code.

Sounds like a interesting use case. Since extending the configuration file may take a while, I think we can merge this PR to cover the most straightforward use and eventually phase it own once we have a more expressive way to define the storage options.

Could you please rebase your changes and a "Signed-off-by" line to the commit? We may also need to make some other minor changes, but I'd like to see the CI tests passing first.

(1) In regards to .krun_config.json, we can definitely continue building on that as it is a relatively easy and safe solution. The only downside is that you have to rebuild your LUKS rootfs image every time you want to change something. However, as long as you only keep relatively static info there (Env., Entrypoint, etc.) it will not become a practical problem I think.

(2) An alternative to .krun_config.json is to pass the .krun_config.json key/values through the /resource/{repository}/{type}/{tag} KBS API instead. It would make the infra a bit more complicated (from a consistency perspective) as the info from each docker image you want to convert to krun image, would need to be pushed to both krun (rootfs) image registry and attestation server (assuming these are different databases).

(3) A more portable and less TEE-specific alternative to (2) is to move the .krun_config.json processing down from initrd to a new simple init process in rootfs. Think tini on steroids. Given an API token is found in the rootfs, it could GET (HTTPS) the .krun_config.json key/values from even a docker registry :) - maybe from from the krun image registry ? This idea is inspired by EnvKey.

Now I've noticed I was talking about .krun_config.json when I should've been talking about tee-config-sev.json, which maps to TeeConfig: https://github.com/containers/libkrun/blob/c60fb0479fe1a1377ed89d98758d684be3129af4/src/vmm/src/resources.rs#L57

While .krun_config.json is expected to contain private information and thus is required to reside in the LUKS volume, TeeConfig contains information that is not confidential. And we know we're going to need to put it somewhere accessible by the guest (most likely a second disk partition) for SNP, as the attestation is initiated by init.c instead of the VMM. But, as I said above, this change will probably take a while, so I think it's reasonable to move forward with this PR.

blenessy commented 1 year ago

Done @slp . Please let me know if I missed something.