stevenhorsman opened 2 days ago
I've done some quick debugging to try to understand the systemd tmpfiles service. If I manually start it, then it correctly creates the file:
```console
# ls -al /run/peerpod/
total 8
drwxr-xr-x  2 root root   80 Sep 16 12:11 .
drwxr-xr-x 12 root root  260 Sep 16 12:11 ..
-rw-r--r--  1 root root   87 Sep 16 12:11 agent-config.toml
-rw-r--r--  1 root root 2460 Sep 16 12:11 daemon.json
# systemctl status systemd-tmpfiles-setup.service
○ systemd-tmpfiles-setup.service - Create System Files and Directories
     Loaded: loaded (/usr/lib/systemd/system/systemd-tmpfiles-setup.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: inactive (dead)
       Docs: man:tmpfiles.d(5)
             man:systemd-tmpfiles(8)
[root@2e9b957ef8b4 /]# systemctl start systemd-tmpfiles-setup.service
[root@2e9b957ef8b4 /]# systemctl status systemd-tmpfiles-setup.service
● systemd-tmpfiles-setup.service - Create System Files and Directories
     Loaded: loaded (/usr/lib/systemd/system/systemd-tmpfiles-setup.service; static)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (exited) since Mon 2024-09-16 12:13:39 UTC; 2s ago
       Docs: man:tmpfiles.d(5)
             man:systemd-tmpfiles(8)
    Process: 177 ExecStart=systemd-tmpfiles --create --remove --boot --exclude-prefix=/dev (code=exited, status=0/SUCCESS)
   Main PID: 177 (code=exited, status=0/SUCCESS)
        CPU: 26ms

Sep 16 12:13:39 2e9b957ef8b4 systemd[1]: Starting systemd-tmpfiles-setup.service - Create System Files and Directories...
Sep 16 12:13:39 2e9b957ef8b4 systemd[1]: Finished systemd-tmpfiles-setup.service - Create System Files and Directories.
[root@2e9b957ef8b4 /]# ls -al /run/peerpod/
total 12
drwxr-xr-x  2 root root  100 Sep 16 12:13 .
drwxr-xr-x 22 root root  520 Sep 16 12:13 ..
-rw-r--r--  1 root root   87 Sep 16 12:11 agent-config.toml
-rw-r--r--  1 root root 2460 Sep 16 12:11 daemon.json
-rw-r--r--  1 root root 1359 Sep 11 16:38 policy.rego
```
So I don't know if it's failing to trigger on boot (as I assume it is supposed to?) for some reason.
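One way to probe whether the unit is being pulled in at boot at all is a sketch like the following. It only uses standard `systemctl` verbs and is guarded so it degrades to a message in environments where systemd isn't running (the unit name comes from the transcript above; everything else is an assumption):

```shell
#!/bin/sh
# Inspect the tmpfiles setup unit's state and what pulls it in at boot.
if [ -d /run/systemd/system ] && command -v systemctl >/dev/null 2>&1; then
    # Current state and last result of the unit:
    systemctl show systemd-tmpfiles-setup.service -p ActiveState -p Result
    # Which units want/require it; this should normally include sysinit.target:
    systemctl list-dependencies --reverse systemd-tmpfiles-setup.service
else
    echo "systemd not available in this environment"
fi
```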
That's really curious. I just built an mkosi image from main (x86_64), and the policy file is provisioned:
```console
[root@fedora ~]# ls -l /run/peerpod/
total 12
-rw-r--r-- 1 root root   87 Sep 16 15:07 agent-config.toml
-rw-r--r-- 1 root root 2500 Sep 16 15:07 daemon.json
-rw-r--r-- 1 root root 1359 Jan  1  1970 policy.rego
[root@fedora ~]# md5sum /run/peerpod/policy.rego /etc/kata-opa/allow-all.rego
1ea0434ea12c9bb86ecfc06d44fe16bb  /run/peerpod/policy.rego
1ea0434ea12c9bb86ecfc06d44fe16bb  /etc/kata-opa/allow-all.rego
```
That is interesting. Maybe there is something different about the docker image environment, though @katexochen wasn't testing the docker provider when he hit it, from what I understand?
We had further discussion on Slack about this. The issue is specific to the docker cloud provider: the tmpfiles setup service unit is not being run in this setup, which means the default policy file isn't provisioned to `/run/peerpod/policy.rego` in the docker container.
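If the unit simply isn't running in the docker podvm, a manual invocation should provision the file, mirroring the manual `systemctl start` in the debug session above. This is a hedged workaround sketch, not a confirmed fix; `systemd-tmpfiles --create --prefix=` is a real invocation, but whether it suffices in this container setup is an assumption:

```shell
#!/bin/sh
# Re-apply tmpfiles.d entries for the peerpod paths only, guarded for
# environments where the systemd-tmpfiles binary is absent.
if command -v systemd-tmpfiles >/dev/null 2>&1; then
    systemd-tmpfiles --create --prefix=/run/peerpod
else
    echo "systemd-tmpfiles not available"
fi
```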
A few weeks ago https://github.com/confidential-containers/cloud-api-adaptor/pull/1998 introduced redirecting the default policy file (`/etc/kata-opa/default-policy.rego`, which is hardcoded in kata-containers) to `/run/peerpod/policy.rego`, in order to support the policy being set through init-data. It also added the `src/cloud-api-adaptor/podvm/files/etc/tmpfiles.d/policy.conf` directive copying `/etc/kata-opa/allow-all.rego` to `/run/peerpod/policy.rego`, so that if the policy wasn't supplied through init-data there would be a default. However, this doesn't seem to be working.

I created a new docker podvm image via the instructions (available as quay.io/stevenhorsman/podvm-docker-image:latest) and created a peer pod from it. The container creation failed, so after removing the delete-instance code I exec'd into the docker container to debug.
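For reference, a tmpfiles.d copy directive of the kind described above would look roughly like this. This is a sketch of the format, not the actual contents of `policy.conf`; the mode/ownership columns are assumptions:

```
# tmpfiles.d(5) "C" (copy) line: copy the argument to the path
# if the path does not already exist.
# Type  Path                      Mode  UID   GID   Age  Argument
C       /run/peerpod/policy.rego  0644  root  root  -    /etc/kata-opa/allow-all.rego
```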
The journal log isn't very helpful:
But luckily Paul hit the same problem on the same day as me and used `strace` to discover an error showing that the policy file didn't exist. Sure enough, when I checked, I found the file was missing, so it looks like the `src/cloud-api-adaptor/podvm/files/etc/tmpfiles.d/policy.conf` directive isn't working?
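A minimal way to reproduce that check without `strace` is to test for the file directly (the path comes from the thread; the wrapper script itself is just an illustrative sketch):

```shell
#!/bin/sh
# Check whether the redirected default policy was provisioned; its
# absence matches the ENOENT that strace surfaced.
if [ -f /run/peerpod/policy.rego ]; then
    echo "policy.rego present"
else
    echo "policy.rego missing"
fi
```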