confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

podvm-mkosi: need to allocate more space for "/run" to store large image data #1921

Open genjuro214 opened 1 month ago

genjuro214 commented 1 month ago

I ran the e2e tests on a Fedora s390x Pod VM image built using mkosi and found that the test case TestCreatePeerPodWithLargeImage fails on the Fedora image but passes on an Ubuntu image built using Packer.

The root cause is that there is not enough space to store the image data.

The large test image exceeds 2GB in size and is pulled into the directory "/run/kata-containers/image/layers". However, the mount point "/run" is only allocated 1.6GB of space via tmpfs:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb2        98G  7.2G   86G   8% /
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           4.0G     0  4.0G   0% /dev/shm
tmpfs           1.6G  162M  1.5G  11% /run
tmpfs           4.0G  4.0K  4.0G   1% /tmp
/dev/vdb1       223M   98M  108M  48% /boot
tmpfs           802M  4.0K  802M   1% /run/user/0

So it runs out of space when pulling an image larger than 1.6GB.
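A quick way to confirm this on a running pod VM is to compare the free space under "/run" with the size of the image to be pulled. The following is a minimal sketch, not code from cloud-api-adaptor; the helper names `has_room` and `check_mount` and the 10% headroom are assumptions made for illustration.

```python
import shutil

GIB = 1024 ** 3


def has_room(free_bytes: int, required_bytes: int, headroom: float = 0.10) -> bool:
    """Return True if required_bytes plus a safety margin fits in free_bytes."""
    return free_bytes >= required_bytes * (1 + headroom)


def check_mount(path: str, required_bytes: int) -> bool:
    """Check the filesystem that backs `path` (for example "/run")."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return has_room(usage.free, required_bytes)


if __name__ == "__main__":
    # Figures from the df output above: about 1.5G free in /run, image > 2GB.
    print(has_room(int(1.5 * GIB), 2 * GIB))  # prints False
```

On the pod VM itself, `check_mount("/run", image_size_in_bytes)` would perform the same comparison against the live tmpfs.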

The Ubuntu image built using Packer does not have this problem, because the mount unit "run-kata\x2dcontainers.mount" mounts "/run/kata-containers" on the disk-backed directory "/kata-containers".

However, this mount unit was removed for mkosi. I found the related PR and commit: https://github.com/confidential-containers/cloud-api-adaptor/pull/1606 and https://github.com/confidential-containers/cloud-api-adaptor/commit/c6ac908799244fb7b7874047e71a61eba48d8a29

Since "/" should be read-only, should we allocate more space to "/run" for runtime data?

Based on my investigation, Fedora typically allocates 20% of physical RAM to "/run" via tmpfs by default. We can increase this allocation to 50% or more in /etc/fstab:

tmpfs           /run            tmpfs   rw,nosuid,nodev,size=50%   0 0
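To put numbers on this: the df output above shows "/dev/shm" at 4.0G, and tmpfs defaults to half of RAM, so the VM appears to have roughly 8 GiB of memory (which also matches 1.6G being 20%). The sketch below assumes that 8 GiB figure, computes the resulting "/run" size for a few percentages, and renders the matching fstab entry. The function names are hypothetical.

```python
GIB = 1024 ** 3


def tmpfs_size(ram_bytes: int, percent: int) -> int:
    """Size in bytes of a tmpfs mounted with size=<percent>%."""
    return ram_bytes * percent // 100


def fstab_entry(percent: int, mount_point: str = "/run") -> str:
    """Render an fstab line with the same options as the one proposed above."""
    return f"tmpfs {mount_point} tmpfs rw,nosuid,nodev,size={percent}% 0 0"


if __name__ == "__main__":
    ram = 8 * GIB  # assumption inferred from the df output, not measured
    for pct in (20, 50, 75):
        print(f"size={pct}% -> {tmpfs_size(ram, pct) / GIB:.1f} GiB")
    print(fstab_entry(50))
```

Under that assumption, 20% is too small for an image over 2GB, while 50% and 75% give about 4 GiB and 6 GiB. The size is only an upper limit: tmpfs pages consume VM memory as they fill up.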

Moreover, we can allocate an even larger size than physical RAM by enabling swap. However, we must prepare the swap file on the hard disk in advance.

I hope we can discuss the following options:

  1. increase size percentage to 50% or 75%
  2. enable swap and allocate a larger size.
  3. keep the default size for "/run" but reduce the size of the large test image.

bpradipt commented 1 month ago

@genjuro214 mkosi-generated images are closer to what we eventually want, i.e.:

  1. verity based ro rootfs
  2. All writeable stuff in memory (since memory is encrypted)
  3. Mount encrypted storage if needed.

Now for large images we would want the VM memory to be sufficient to hold the image content and correspondingly the fs size should be adjusted as needed.

I think increasing the size of /run to 75% of memory is a good option; we can run the tests to see if there are any issues. Enabling swap risks leaking sensitive data without additional protection in place and is better avoided. Reducing the size of the large test image doesn't solve the real problem :-)
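The sizing rule in this comment can be written as simple arithmetic: if /run gets a fraction f of memory and must hold an image of size S, the VM needs at least S / f of RAM, and in any case enough RAM for the image data plus the workload. A rough sketch, where the function name and the example numbers are illustrative assumptions and data already stored in /run is ignored:

```python
import math

GIB = 1024 ** 3


def min_vm_memory(image_bytes: int, run_fraction: float, workload_bytes: int = 0) -> int:
    """Smallest RAM size (bytes) satisfying both constraints:
    - the image fits in /run:           ram * run_fraction >= image_bytes
    - image data and workload coexist:  ram >= image_bytes + workload_bytes
    """
    fits_tmpfs = math.ceil(image_bytes / run_fraction)
    fits_total = image_bytes + workload_bytes
    return max(fits_tmpfs, fits_total)


if __name__ == "__main__":
    # Hypothetical 3 GiB image with /run sized at 75% of memory.
    print(min_vm_memory(3 * GIB, 0.75) / GIB)
```

For example, a 3 GiB image with /run at 75% needs at least 4 GiB of RAM, and more once the workload's own memory is counted.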

stevenhorsman commented 1 month ago

> All writeable stuff in memory (since memory is encrypted)

I'm not necessarily against this. But I remember that we had a discussion about the security implication before adding the /run/kata-containers -> /kata-containers and Azure and SE were using encrypted disks, so effectively gave us encrypted storage for free, which is why we were comfortable with this change. Has this position changed?

bpradipt commented 1 month ago

> > All writeable stuff in memory (since memory is encrypted)
>
> I'm not necessarily against this. But I remember that we had a discussion about the security implication before adding the /run/kata-containers -> /kata-containers and Azure and SE were using encrypted disks, so effectively gave us encrypted storage for free, which is why we were comfortable with this change. Has this position changed?

The position remains the same. If the disk is encrypted, then we put the sensitive data there. Since an encrypted root disk with an end-user key needs quite a few manual steps (afaik), it's not safe to mount /run/kata-containers onto disk storage by default for mkosi-based builds imho.