confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

podvm mkosi: Default policy not working #2041

Open stevenhorsman opened 2 days ago

stevenhorsman commented 2 days ago

A few weeks ago, https://github.com/confidential-containers/cloud-api-adaptor/pull/1998 started redirecting the default policy file (/etc/kata-opa/default-policy.rego, which is hardcoded in kata-containers) to /run/peerpod/policy.rego, so that the policy can be set through init-data. It also added the src/cloud-api-adaptor/podvm/files/etc/tmpfiles.d/policy.conf directive, which copies /etc/kata-opa/allow-all.rego to /run/peerpod/policy.rego so that there is a default when no policy is supplied through init-data. However, this doesn't seem to be working.
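
The intended fallback can be sketched in shell as follows. This is not the actual contents of policy.conf, which isn't reproduced in this issue; it assumes the directive behaves like the tmpfiles.d `C` (copy) type, which copies a source to a destination only if the destination does not exist yet. The function name and its path arguments are hypothetical, parameterized so the logic can be tried outside a podvm.

```shell
#!/bin/sh
# Hypothetical emulation of the default-policy fallback: copy a default
# policy into place only when no policy has been provisioned yet
# (e.g. via init-data). Assumes tmpfiles.d "C" (copy-if-missing) semantics.
provision_default_policy() {
    src=$1    # e.g. /etc/kata-opa/allow-all.rego
    dest=$2   # e.g. /run/peerpod/policy.rego

    # A policy already supplied through init-data must not be overwritten.
    if [ -e "$dest" ]; then
        return 0
    fi

    # /run is a tmpfs, so the parent directory may not exist yet at boot.
    mkdir -p "$(dirname "$dest")" || return 1
    cp "$src" "$dest"
}

# Usage on a podvm (hypothetical):
#   provision_default_policy /etc/kata-opa/allow-all.rego /run/peerpod/policy.rego
```

Because /etc/kata-opa/default-policy.rego is a symlink to /run/peerpod/policy.rego, the agent only sees a policy once something like this has run before kata-agent starts.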

I built a new docker podvm image following the instructions (it is available as quay.io/stevenhorsman/podvm-docker-image:latest) and created a peer pod from it. Container creation failed, so I removed the instance-deletion code and exec'd into the docker container to debug.

The journal log isn't very helpful:

# journalctl -u kata-agent
Sep 14 00:39:26 5ecb23a3fb12 systemd[1]: Starting kata-agent.service - Kata Agent...
Sep 14 00:39:26 5ecb23a3fb12 kata-agent[81]: umount: /sys/fs/cgroup/misc: no mount point specified.
Sep 14 00:39:26 5ecb23a3fb12 systemd[1]: Started kata-agent.service - Kata Agent.
Sep 14 00:39:26 5ecb23a3fb12 kata-agent[82]: {"msg":"announce","level":"INFO","ts":"2024-09-14T00:39:26.358895899Z","subsystem":"root","version":"0.1.0","pid":"82","source":"agent","name":"kata-agent","extra-features":"[\"agent-policy\", \"guest-pull\", \"seccomp\"]","api-version":"0.0.1","config":"AgentConfig { debug_console: false, dev_mode: false, log_level: Info, hotplug_timeout: 3s, debug_console_vport: 0, log_vport: 0, container_pipe_size: 0, server_addr: \"unix:///run/kata-containers/agent.sock\", passfd_listener_port: 0, unified_cgroup_hierarchy: false, tracing: false, supports_seccomp: true, https_proxy: \"\", no_proxy: \"\", guest_components_rest_api: Resource, guest_components_procs: None, image_registry_auth: \"\", secure_storage_integrity: false }","agent-version":"3.8.0","agent-commit":"3.8.0"}
Sep 14 00:39:26 5ecb23a3fb12 kata-agent[82]: {"msg":"https_proxy is not set (environment variable not found)","level":"INFO","ts":"2024-09-14T00:39:26.359260689Z","subsystem":"image","source":"agent","version":"0.1.0","name":"kata-agent","pid":"82"}
Sep 14 00:39:26 5ecb23a3fb12 kata-agent[82]: {"msg":"no_proxy is not set (environment variable not found)","level":"INFO","ts":"2024-09-14T00:39:26.359459638Z","subsystem":"image","source":"agent","pid":"82","version":"0.1.0","name":"kata-agent"}
Sep 14 00:39:26 5ecb23a3fb12 systemd[1]: kata-agent.service: Main process exited, code=dumped, status=6/ABRT
Sep 14 00:39:26 5ecb23a3fb12 kata-agent[128]: Can't remove  as it doesn't exist
Sep 14 00:39:26 5ecb23a3fb12 systemd[1]: kata-agent.service: Failed with result 'core-dump'.

Luckily, Paul hit the same problem on the same day and used strace to find an error showing that the policy file didn't exist. Sure enough, when I checked, I found:

# ls -al /etc/kata-opa/
total 12
drwxr-xr-x 2 root root  139 Sep 11 16:38 .
drwxr-xr-x 1 root root   76 Sep 16 10:02 ..
-rw-r--r-- 1 root root 1361 Sep 11 16:38 allow-all-except-exec-process.rego
-rw-r--r-- 1 root root 1359 Sep 11 16:38 allow-all.rego
lrwxrwxrwx 1 root root   24 Sep 11 16:38 default-policy.rego -> /run/peerpod/policy.rego
-rw-r--r-- 1 root root 1395 Sep 11 16:38 disallow-all-except-setpolicy.rego
# ls -al /run/peerpod/policy.rego
ls: cannot access '/run/peerpod/policy.rego': No such file or directory

So it looks like the src/cloud-api-adaptor/podvm/files/etc/tmpfiles.d/policy.conf directive isn't working?
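
The listing above shows a dangling symlink: default-policy.rego exists as a link, but its target does not. A minimal shell check for that condition might look like this. The function name is hypothetical; it relies on `test -L` (true for any symlink, dangling or not) and `test -e` (follows symlinks, so it is false for a dangling link).

```shell
#!/bin/sh
# Hypothetical check: succeed only if LINK resolves to an existing file.
check_policy_link() {
    link=$1   # e.g. /etc/kata-opa/default-policy.rego

    # A symlink whose target is missing: -L is true, -e is false.
    if [ -L "$link" ] && [ ! -e "$link" ]; then
        echo "dangling symlink: $link -> $(readlink "$link")" >&2
        return 1
    fi
    if [ ! -e "$link" ]; then
        echo "missing: $link" >&2
        return 1
    fi
    echo "ok: $link"
}

# Usage inside the podvm (hypothetical):
#   check_policy_link /etc/kata-opa/default-policy.rego
```

Run before kata-agent starts, a check like this would surface the missing file directly, rather than leaving it to strace to find after the agent aborts.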

stevenhorsman commented 2 days ago

I've done some quick debugging to try to understand the systemd tmpfiles service. If I start it manually, it correctly creates the file:

# ls -al /run/peerpod/
total 8
drwxr-xr-x  2 root root   80 Sep 16 12:11 .
drwxr-xr-x 12 root root  260 Sep 16 12:11 ..
-rw-r--r--  1 root root   87 Sep 16 12:11 agent-config.toml
-rw-r--r--  1 root root 2460 Sep 16 12:11 daemon.json
# systemctl status systemd-tmpfiles-setup.service
○ systemd-tmpfiles-setup.service - Create System Files and Directories
     Loaded: loaded (/usr/lib/systemd/system/systemd-tmpfiles-setup.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: inactive (dead)
       Docs: man:tmpfiles.d(5)
             man:systemd-tmpfiles(8)
[root@2e9b957ef8b4 /]# systemctl start systemd-tmpfiles-setup.service
[root@2e9b957ef8b4 /]# systemctl status systemd-tmpfiles-setup.service
● systemd-tmpfiles-setup.service - Create System Files and Directories
     Loaded: loaded (/usr/lib/systemd/system/systemd-tmpfiles-setup.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (exited) since Mon 2024-09-16 12:13:39 UTC; 2s ago
       Docs: man:tmpfiles.d(5)
             man:systemd-tmpfiles(8)
    Process: 177 ExecStart=systemd-tmpfiles --create --remove --boot --exclude-prefix=/dev (code=exited, status=0/SUCCESS)
   Main PID: 177 (code=exited, status=0/SUCCESS)
        CPU: 26ms

Sep 16 12:13:39 2e9b957ef8b4 systemd[1]: Starting systemd-tmpfiles-setup.service - Create System Files and Directories...
Sep 16 12:13:39 2e9b957ef8b4 systemd[1]: Finished systemd-tmpfiles-setup.service - Create System Files and Directories.
[root@2e9b957ef8b4 /]# ls -al /run/peerpod/
total 12
drwxr-xr-x  2 root root  100 Sep 16 12:13 .
drwxr-xr-x 22 root root  520 Sep 16 12:13 ..
-rw-r--r--  1 root root   87 Sep 16 12:11 agent-config.toml
-rw-r--r--  1 root root 2460 Sep 16 12:11 daemon.json
-rw-r--r--  1 root root 1359 Sep 11 16:38 policy.rego

So, for some reason, it seems not to be triggered at boot (as I assume it is supposed to be)?
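
Since starting the unit by hand fixes things, a possible stop-gap while debugging is to check the unit's state and, if it has not run, apply only the policy directive. This is a sketch, not a fix. It uses `systemctl is-active --quiet` and `systemd-tmpfiles --create CONFIGFILE` (which applies only the named configuration file), and it assumes that the repository's podvm/files/etc/tmpfiles.d/policy.conf is installed at /etc/tmpfiles.d/policy.conf on the image and that the script runs as root before kata-agent starts. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical stop-gap: if systemd-tmpfiles-setup.service has not run,
# apply only the policy directive instead of the whole tmpfiles setup.
apply_policy_tmpfiles() {
    conf=${1:-/etc/tmpfiles.d/policy.conf}   # assumed install path
    unit=systemd-tmpfiles-setup.service

    # The unit stays "active (exited)" after a successful run, as in the
    # status output above, so is-active distinguishes "ran" from "never ran".
    if systemctl is-active --quiet "$unit"; then
        echo "$unit already ran"
        return 0
    fi

    echo "$unit has not run; applying $conf" >&2
    systemd-tmpfiles --create "$conf"
}
```

This only treats the symptom; it does not explain why the unit did not run at boot in the first place.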

mkulke commented 2 days ago

That's really curious. I just built an mkosi image from main (x86_64), and the policy file is provisioned:

[root@fedora ~]# ls -l /run/peerpod/
total 12
-rw-r--r-- 1 root root   87 Sep 16 15:07 agent-config.toml
-rw-r--r-- 1 root root 2500 Sep 16 15:07 daemon.json
-rw-r--r-- 1 root root 1359 Jan  1  1970 policy.rego
[root@fedora ~]# md5sum /run/peerpod/policy.rego /etc/kata-opa/allow-all.rego
1ea0434ea12c9bb86ecfc06d44fe16bb  /run/peerpod/policy.rego
1ea0434ea12c9bb86ecfc06d44fe16bb  /etc/kata-opa/allow-all.rego
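
The comparison above uses md5sum. An equivalent check that compares the bytes directly with `cmp -s` (exit status 0 only when the files are identical) can be wrapped as a small helper. The function name is hypothetical; the default paths are the ones from the listings in this issue.

```shell
#!/bin/sh
# Hypothetical check: is the provisioned policy identical to the default
# allow-all policy? cmp -s prints nothing and exits 0 only on a match;
# a missing file yields a non-zero status as well.
policy_is_default() {
    provisioned=${1:-/run/peerpod/policy.rego}
    expected=${2:-/etc/kata-opa/allow-all.rego}
    cmp -s "$provisioned" "$expected"
}

# Usage inside the podvm (hypothetical):
#   policy_is_default && echo "default allow-all policy in place"
```
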

stevenhorsman commented 2 days ago

That is interesting. Maybe there is something different about the docker image environment, though from what I understand @katexochen wasn't testing the docker provider when he hit it?

mkulke commented 1 day ago

We had further discussion on Slack about this. The issue is specific to the docker cloud provider: in this setup the tmpfiles setup service unit does not run at boot, which means the default policy isn't provisioned to /run/peerpod/policy.rego in the docker container.

https://github.com/confidential-containers/cloud-api-adaptor/blob/7f07f9aa8271b5a309d836fbc117c50758d7dc1f/src/cloud-api-adaptor/podvm-mkosi/Dockerfile.podvm#L13
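
Since systemd-tmpfiles-setup.service is a static unit (see the status output earlier), it only runs at boot if something pulls it in, normally a symlink in a sysinit.target.wants/ directory. A minimal sketch for checking whether such a link survives in the docker image follows. The function name is hypothetical and the directories in the usage line are assumptions about the image layout; on a running system, `systemctl list-dependencies sysinit.target` is the more authoritative check.

```shell
#!/bin/sh
# Hypothetical check: is UNIT linked into sysinit.target.wants in any of
# the given directories? test -L also matches a link whose target is missing.
unit_wanted_at_boot() {
    unit=$1
    shift
    for dir in "$@"; do
        if [ -L "$dir/$unit" ] || [ -e "$dir/$unit" ]; then
            echo "found: $dir/$unit"
            return 0
        fi
    done
    echo "not wanted by sysinit.target: $unit" >&2
    return 1
}

# Usage (directories are assumptions about the image layout):
#   unit_wanted_at_boot systemd-tmpfiles-setup.service \
#       /etc/systemd/system/sysinit.target.wants \
#       /usr/lib/systemd/system/sysinit.target.wants
```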