Closed EmilienM closed 5 years ago
@rhatdan PTAL - Seems to be an SELinux issue. The code in Podman seems fine (we relabel the path in question with the mount label, nothing special) - could be the go-selinux Relabel code?
For reference:
:ro,z
Our call out to go-selinux to relabel volume mounts: https://github.com/containers/libpod/blob/master/libpod/container_internal_linux.go#L159-L161
FTR we also track it in OpenStack: https://bugs.launchpad.net/tripleo/+bug/1800737/
It seems pretty clear that the /var/lib/config-data directory does not exist. Podman differs from Docker in that it does NOT create SRC volumes when they don't exist. If you were relying on this BUG in Docker, then with Podman you will need to do a mkdir -p /var/lib/config-data; podman ... to make sure the directory exists before the relabel is attempted.
I believe this is a BUG in Docker, because it can lead to users creating content with typos in their commands. For example, imagine I typo'd the above command as -v /var/lib/configdata:/var/lib/config-data:rw,Z. Docker would create the typo'd directory, and you could end up with unexpected errors when other tools looked for /var/lib/config-data on the host.
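As a rough illustration of the `mkdir -p` step suggested above, here is a minimal Python sketch that creates the bind-mount source before the container is started. The helper name `ensure_volume_source` is hypothetical and not part of Podman or any OpenStack tooling.

```python
import os
import tempfile


def ensure_volume_source(host_path: str) -> bool:
    """Create host_path (like `mkdir -p`) if it is missing.

    Returns True if the directory had to be created, False if it
    already existed. Hypothetical helper, for illustration only.
    """
    if os.path.isdir(host_path):
        return False
    os.makedirs(host_path, exist_ok=True)
    return True


if __name__ == "__main__":
    # Use a throwaway root so the sketch never touches the real /var/lib.
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "var", "lib", "config-data")
        print(ensure_volume_source(src))  # directory was missing
        print(ensure_volume_source(src))  # directory now exists
```

Creating the directory explicitly keeps the choice of path in the caller's hands, which is the point made above about typos: a mistyped path is created only if the caller asked for it.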
@rhatdan I'm 99% sure that the directory does exist. It's actually managed by Ansible and you can see its creation here:
2018-10-30 21:08:15.606 17849 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] TASK [Create /var/lib/config-data directory] ***********************************
2018-10-30 21:08:15.803 17849 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ] changed: [undercloud]
2018-10-30 21:08:15.842 17849 WARNING tripleoclient.v1.tripleo_deploy.Deploy [ ]
Or here:
Invoked with directory_mode=None force=False remote_src=None path=/var/lib/config-data owner=None follow=True group=None unsafe_writes=None state=directory content=NOT_LOGGING_PARAMETER serole=None diff_peek=None setype=svirt_sandbox_file_t selevel=s0 original_basename=None regexp=None validate=None src=None seuser=None recurse=False delimiter=None mode=None attributes=None backup=None
And here in the code: https://github.com/openstack/tripleo-heat-templates/blob/3b68405f5a94f18df989522526150bf0f53809e2/common/deploy-steps-tasks.yaml#L246-L251
I'm going to verify the remaining 1% of uncertainty today, but please note that it usually fails on one container while the others get deployed with the same bind mounts (including /var/lib/config-data). Also, please note that it worked fine with podman 0.9 and seems broken for us in 0.10.
Thanks
@rhatdan also, to demonstrate that the error message isn't the same when the directory doesn't exist:
[root@undercloud ~]# podman run --rm -ti -v /foo:/bar busybox bash
Trying to pull docker.io/busybox:latest...Getting image source signatures
Copying blob sha256:90e01955edcd85dac7985b72a8374545eac617ccdddcc992b732e43cd42534af
710.92 KB / 710.92 KB [====================================================] 0s
Copying config sha256:59788edf1f3e78cd0ebe6ce1446e9d10788225db3dedcfd1a59f764bad2b2690
1.46 KB / 1.46 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
error checking path "/foo": stat /foo: no such file or directory
[root@undercloud ~]# podman run --rm -ti -v /foo:/bar:z,rw busybox bash
error checking path "/foo": stat /foo: no such file or directory
Compare error checking path "/foo": stat /foo: no such file or directory with relabel failed "/var/lib/config-data": no such file or directory. Again, let me confirm all of that today, but if you have any clue, let us know.
In addition, I just found out that we make sure /var/lib/config-data really exists:
https://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/docker-puppet.py#n63
So with that, it's pretty clear that the directory is there and that the error message produced during relabelling is probably misleading.
Also, why is podman trying to relabel the directory when we run the container with --security-opt label=disable?
That's a good question - @rhatdan Should we skip the :z/:Z relabel if we are running with SELinux labeling disabled?
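To make the question concrete, here is a small Python sketch of the policy being asked about: relabel only when the volume spec carries `z` or `Z` and labeling has not been disabled. This is a hypothetical illustration, not how Podman behaves or parses volumes; `parse_volume_spec` and `should_relabel` are made-up names and only the simple `SRC:DST[:OPTIONS]` form is handled.

```python
from typing import Set, Tuple


def parse_volume_spec(spec: str) -> Tuple[str, str, Set[str]]:
    """Split 'SRC:DST[:OPTIONS]' into (src, dst, set of options)."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unsupported volume spec: {spec!r}")
    src, dst = parts[0], parts[1]
    options = set(parts[2].split(",")) if len(parts) == 3 and parts[2] else set()
    return src, dst, options


def should_relabel(spec: str, label_disabled: bool) -> bool:
    """Hypothetical policy: honour z/Z only when labeling is enabled."""
    _, _, options = parse_volume_spec(spec)
    wants_relabel = bool(options & {"z", "Z"})
    return wants_relabel and not label_disabled


if __name__ == "__main__":
    spec = "/var/lib/config-data:/var/lib/config-data:ro,z"
    print(should_relabel(spec, label_disabled=False))
    print(should_relabel(spec, label_disabled=True))
```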
Meanwhile, I've been tracing down what's going on here. I'm fairly certain that the no such file or directory is an ENOENT coming out of an lsetxattr() call that sets the SELinux label, which seems to indicate it is coming out of the kernel?
Per the official docs on setxattr() and related calls, ENOENT is the standard "a component of the path does not exist", with no twists. So, per the kernel, the file in question does not exist, when it pretty clearly does?
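For anyone who wants to see that errno directly, here is a minimal Linux-only Python sketch. It calls `os.setxattr` with `follow_symlinks=False` (the `lsetxattr()` path) on a file that does not exist and reports the errno. The `user.demo` attribute name is arbitrary and chosen so that no SELinux privileges are needed. It is not the code path go-selinux uses.

```python
import errno
import os
import tempfile


def lsetxattr_errno(path: str) -> int:
    """Try to set an xattr without following symlinks.

    Returns the errno of the failure, or 0 if the call succeeded.
    """
    try:
        os.setxattr(path, "user.demo", b"x", follow_symlinks=False)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        missing = os.path.join(root, "gone")  # never created
        code = lsetxattr_errno(missing)
        print(errno.errorcode.get(code, code))
```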
To prove that /var/lib/config-data REALLY exists:
1) In our CI scripts, we collect logs and we copy /var/lib/config-data/puppet-generated into /var/log/config-data. The code is here: https://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/get_docker_logs.sh#n47
2) In a failing job, you can see that the directory was successfully collected: http://logs.openstack.org/40/613640/1/gate/tripleo-ci-centos-7-containers-multinode/e212bb8/logs/undercloud/var/log/config-data/
Which means /var/lib/config-data/puppet-generated does exist, and therefore /var/lib/config-data is there.
We did make a change to always have a mount label even when SELinux Labeling is disabled, which is probably what the difference is here.
What kind of file system is mounted on /var/log? Just a normal ext4 or xfs?
partitions => {"vda1"=>{"uuid"=>"d56e4695-de15-46eb-8259-25a16ed8f6ce", "size"=>"335542239", "mount"=>"/", "label"=>"cloudimg-rootfs", "filesystem"=>"ext4"}}
so / is ext4
I proposed this workaround for now: https://review.openstack.org/614825 (remove -z from /var/lib/config-data mount)
If you are running a confined domain, this will not work unless you pre-label the content as container_file_t:s0.
This is a TOCTTOU issue. I think it comes from the Go binding for Chcon not being atomic: /var/lib/config-data is probably read/written by other processes, and by the time we walk the directory some files have been deleted or moved, so the lsetxattr fails.
I think the solution is to modify Chcon to not give up on an ENOENT (or to avoid relabelling /var/lib/config-data).
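A rough sketch of that idea, written in Python rather than Go and not taken from go-selinux: walk the tree, apply a label to each entry, and treat a file that vanished mid-walk (ENOENT) as skipped instead of aborting. The `relabel_tree` name and the injected `set_label` callable are assumptions for illustration; a real implementation would call `lsetxattr` with the SELinux label there.

```python
import os
import tempfile
from typing import Callable, List


def relabel_tree(root: str, set_label: Callable[[str], None]) -> List[str]:
    """Apply set_label to root and everything below it.

    Entries that disappear during the walk (FileNotFoundError, i.e.
    ENOENT) are recorded and skipped. A missing root still raises.
    """
    skipped: List[str] = []
    set_label(root)  # the root itself must exist
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                set_label(path)
            except FileNotFoundError:
                skipped.append(path)  # lost the race, carry on
    return skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.conf", "b.conf"):
            open(os.path.join(root, name), "w").close()
        victim = os.path.join(root, "b.conf")

        def fake_label(path: str) -> None:
            # Simulate another process deleting the file just before
            # the label is applied; os.stat then raises ENOENT.
            if path == victim and os.path.exists(victim):
                os.remove(victim)
            os.stat(path)

        print(relabel_tree(root, fake_label))
```

Whether silently skipping vanished files is acceptable depends on the caller; returning the skipped paths at least lets the caller log them.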
Is this a BUG REPORT or FEATURE REQUEST?:
kind bug
Description: Since we have podman-0.10.1.3-2.git6e1aeb0.el7.x86_64 in OpenStack CI, more than 65% of our jobs fail, and they fail for the same reason: relabelling one directory (always the same one).
Steps to reproduce the issue:
Running podman command:
It fails randomly, and never on the same container, but always on /var/lib/config-data relabelling.
Describe the results you received:
Describe the results you expected: Container should be run and relabelling should work.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Output of podman info:
Additional environment details (AWS, VirtualBox, physical, etc.): The CI jobs run in VMs with 8 vCPUs, 8GB of RAM and 8GB of swap.