coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Ephemeral rootfs sizing in live environment doesn't account for swap space #1688

Open robbycuenot opened 6 months ago

robbycuenot commented 6 months ago

Describe the bug

I am PXE booting FCOS 39 with UEFI and Secure Boot using shim.efi -> grubx64.efi on bare-metal (both virtualized and on real hardware, a Dell Optiplex 7060 Micro with 8GB DDR4). In both scenarios, a 256GB drive is added as swap space. The goal is to have stateless machines running containers, configured with Ignition.

Regardless of memory available to the system, running containers in such an environment results in inode exhaustion over time:

systemUser@example-system:~$ df -i
Filesystem      Inodes IUsed   IFree IUse% Mounted on
/dev/loop1       36511 36511       0  100% /sysroot
/dev/loop0       69368 69245     123  100% /etc
devtmpfs        888181   564  887617    1% /dev
tmpfs           990085     3  990082    1% /dev/shm
efivarfs             0     0       0     - /sys/firmware/efi/efivars
tmpfs           819200  1151  818049    1% /run
tmpfs           990085     2  990083    1% /run/ephemeral_base
tmpfs          1048576    20 1048556    1% /tmp
tmpfs           198017    27  197990    1% /run/user/1001

By default, podman stores containers under /var, which resides on the /dev/loop0 filesystem:

systemUser@example-system:~$ df -h /var/lib
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      3.8G  3.8G   28K 100% /var
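
To catch this condition before containers start failing, the `df -i` output can be filtered for filesystems that are nearly out of inodes. The following is a minimal sketch; the function name `inode_hogs` and the 90% default threshold are illustrative assumptions, not part of FCOS or podman.

```shell
# Print "<mount point> <IUse%>" for every filesystem whose inode usage is
# at or above a threshold (default 90, an arbitrary illustrative value).
# Reads `df -i`-style text on stdin so it can also be run on saved output.
inode_hogs() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR == 1 { next }                  # skip the header line
        $5 ~ /^[0-9]+%$/ {                # skip "-" (no inode accounting)
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= t) print $6, $5
        }'
}
```

On a running system this would be invoked as `df -iP | inode_hogs 95`; the `-P` flag keeps each filesystem on a single line so the field positions stay stable.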

The solutions I am considering in the meantime involve creating a filesystem on the swap drive specifically for /var; however, I am curious whether there is a way to fix this for systems that do not have physical storage attached but do have ample memory.

For reference, I am creating 3 containers at startup -- a Terraform Cloud agent, a GitHub Actions runner, and a fan controller for another machine.

Reproduction steps

  1. Boot a Fedora CoreOS live image
  2. Start multiple containers until inode exhaustion occurs

Expected behavior

Containers are able to be created until memory exhaustion is reached.

Actual behavior

Containers fail prematurely, due to an inability to write new files to the overlayfs, despite several GB of free memory on the system.

System details

Butane or Ignition config

{
  "ignition": { "version": "3.4.0" },
  "passwd": {
    "users": [
      {
        "name": "systemUser",
        "sshAuthorizedKeys": [
          "redacted"
        ],
        "groups": [ "sudo" ]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "path": "/var/home/systemUser/create_tpm.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/create_tpm.sh" }
      },
      {
        "path": "/var/home/systemUser/remove_tpm.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/remove_tpm.sh" }
      },
      {
        "path": "/var/home/systemUser/decrypt_with_tpm.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/decrypt_with_tpm.sh" }
      },
      {
        "path": "/var/home/systemUser/encrypt_with_tpm.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/encrypt_with_tpm.sh" }
      },
      {
        "path": "/var/home/systemUser/unattended_decryption.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/unattended_decryption.sh" }
      },
      {
        "path": "/usr/local/bin/tfcagent_service.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/tfcagent_service.sh" }
      },
      {
        "path": "/usr/local/bin/ghaction_service.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/ghaction_service.sh" }
      },
      {
        "path": "/usr/local/bin/idrac_fan_service.sh",
        "mode": 493,
        "contents": { "source": "http://example.com/scripts/idrac_fan_service.sh" }
      },
      {
        "path": "/var/home/systemUser/encrypted_tfcagent_token.txt",
        "mode": 493,
        "contents": { "source": "http://example.com/data/00_00_00_00_00_00/encrypted_tfcagent_token.txt" }
      },
      {
        "path": "/var/home/systemUser/encrypted_ghaction_token.txt",
        "mode": 493,
        "contents": { "source": "http://example.com/data/00_00_00_00_00_00/encrypted_ghaction_token.txt" }
      },
      {
        "path": "/var/home/systemUser/encrypted_idrac_token.txt",
        "mode": 493,
        "contents": { "source": "http://example.com/data/00_00_00_00_00_00/encrypted_idrac_token.txt" }
      },
      {
        "path": "/var/home/systemUser/github_token_organization.txt",
        "mode": 493,
        "contents": { "source": "http://example.com/data/00_00_00_00_00_00/github_token_organization.txt" }
      },
      {
        "path": "/var/home/systemUser/idrac_fan_settings.json",
        "mode": 493,
        "contents": { "source": "http://example.com/data/00_00_00_00_00_00/idrac_fan_settings.json" }
      }
    ],
    "disks": [
      {
        "device": "/dev/sda",
        "wipeTable": true,
        "partitions": [
          {
            "label": "swap",
            "sizeMiB": 0,
            "startMiB": 0,
            "typeGuid": "0657FD6D-A4AB-43C4-84E5-0933C84B4F4F"
          }
        ]
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "name": "format-swap.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=Format /dev/sda1 as swap\nAfter=local-fs.target\n[Service]\nType=oneshot\nExecStart=/sbin/mkswap /dev/sda1\nExecStart=/sbin/swapon /dev/sda1\nRemainAfterExit=true\n[Install]\nWantedBy=multi-user.target\n"
      },
      {
        "name": "tfcagent.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=Terraform Cloud Agent Container\nAfter=network.target\n\n[Service]\nType=simple\nExecStartPre=/bin/sleep 30\nExecStart=/usr/local/bin/tfcagent_service.sh\nRestart=always\n\n[Install]\nWantedBy=multi-user.target\n"
      },
      {
        "name": "ghaction.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=GitHub Action Runner Container\nAfter=network.target\n\n[Service]\nType=simple\nExecStartPre=/bin/sleep 45\nExecStart=/usr/local/bin/ghaction_service.sh\nRestart=always\n\n[Install]\nWantedBy=multi-user.target\n"
      },
      {
        "name": "idracfan.service",
        "enabled": true,
        "contents": "[Unit]\nDescription=iDrac Fan Controller Container\nAfter=network.target\n\n[Service]\nType=simple\nExecStartPre=/bin/sleep 50\nExecStart=/usr/local/bin/idrac_fan_service.sh\nRestart=always\n\n[Install]\nWantedBy=multi-user.target\n"
      }
    ]
  }
}

Additional information

No response

jlebon commented 6 months ago

I think the issue here is probably that the live setup isn't "swap-aware". It sizes the ephemeral rootfs based on physical RAM (more specifically, the size=50% tmpfs mount option is documented as being relative to available physical RAM). Also, this code executes well before Ignition would create the swap device. (Note, BTW, that nowadays you can use .storage.filesystems[] to format the swap device too.)

> The solutions I am considering in the meantime involve creating a filesystem on the swap drive specifically for /var

That's what I'd suggest as well. If you want no persistence/caching at all, you can always have Ignition wipe and re-mkfs each time.
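
To make the sizing point concrete, the sketch below computes half of physical RAM from `/proc/meminfo`-style input, which is the basis jlebon describes for `size=50%`. For comparison it also prints half of RAM plus swap. That second figure is a purely hypothetical "swap-aware" policy, not something the live environment implements, and the function name `half_ram_mib` is likewise an assumption.

```shell
# Reads /proc/meminfo-style text on stdin (values are in kB) and prints:
#   ram_only_mib  = 50% of MemTotal (what a size=50% tmpfs is relative to)
#   with_swap_mib = 50% of MemTotal + SwapTotal (hypothetical policy)
half_ram_mib() {
    awk '
        $1 == "MemTotal:"  { mem = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        END {
            printf "ram_only_mib=%d\n", mem / 2 / 1024
            printf "with_swap_mib=%d\n", (mem + swap) / 2 / 1024
        }'
}
```

Running `half_ram_mib < /proc/meminfo` on the 8GB machine from the report should give a RAM-only figure close to the 3.8G shown by `df -h` above, which is consistent with the 256GB swap drive having no effect on the rootfs size.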

jlebon commented 6 months ago

Another approach is to manually do the same XFS loopback on tmpfs trick on top of /var/lib/containers.

robbycuenot commented 6 months ago

> Another approach is to manually do the same XFS loopback on tmpfs trick on top of /var/lib/containers.

I need to familiarize myself with how this process works, but it sounds like a potential solution :)
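
For readers unfamiliar with the loopback approach mentioned above, here is a rough sketch of what it could look like: a tmpfs holds a sparse image file, the image is formatted as XFS, and the result is loop-mounted on /var/lib/containers. The backing directory, the 8G size, and the helper names are all assumptions made for illustration. This is not the code the live environment itself uses. It would need to run as root, after swap is active and before podman starts any containers.

```shell
# ASSUMED values, chosen for illustration only.
BACKING_DIR="${BACKING_DIR:-/run/containers-backing}"  # hypothetical tmpfs mount point
IMAGE="$BACKING_DIR/containers.img"
SIZE="${SIZE:-8G}"          # sizing beyond RAM is only safe once swap is active
TARGET="${TARGET:-/var/lib/containers}"

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_containers_store() {
    run mkdir -p "$BACKING_DIR" "$TARGET" &&
    # tmpfs pages can be swapped out, so a swap device can absorb the data.
    run mount -t tmpfs -o "size=$SIZE" tmpfs "$BACKING_DIR" &&
    # Sparse file: memory is consumed only as blocks are actually written.
    run truncate -s "$SIZE" "$IMAGE" &&
    # XFS allocates inodes on demand from free space.
    run mkfs.xfs -q "$IMAGE" &&
    run mount -o loop "$IMAGE" "$TARGET"
}
```

Running `DRY_RUN=1 setup_containers_store` prints the five commands without touching the system. The sketch does not handle SELinux labeling of the new filesystem or unit ordering, both of which a real deployment would likely need.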