containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0
23.77k stars 2.42k forks source link

race: system df: failed to get read/write size of container: /proc/something: ENOENT or EINVAL #23909

Closed edsantiago closed 1 month ago

edsantiago commented 2 months ago

In window 1:

$ sudo bash -c 'while :;do bin/podman system df --format "{{\"\"}}" || break;done';beep  

In window 2:

$ while :;do hack/bats --root 252:rootfs || break;done

Doesn't take too long for window 1 to fail:

Error: failed to get read/write size of container xxx: readdirent /proc/1347012/net: invalid argument

Also seen:

Error: failed to get read/write size of container xxx: lstat /proc/2688266/fd/5: no such file or directory

Found in parallel testing. It should be possible to minimize the reproducer if necessary.

Luap99 commented 1 month ago

Why do we even read /proc when computing the size? That doesn't make sense as these are not real files.

Luap99 commented 1 month ago

Oh I see why, the quadlet is using a overlyfs rootfs over / and podman of course in order to know the size has to walk over everything and because well /proc is a normal mount point on the host it walks /proc there too. The actual containers /proc is mounted in the container mount namespace only so we would not walk that by default. Only because we see the /proc on the host here it happens...

Luap99 commented 1 month ago

So I think found the cause for the ENOENT errors (https://github.com/containers/storage/pull/2097) which seems like a real bug that can always happen, however walking /proc in general seem to generate a bunch of weird errors, i.e. EINVAL and that is certainly not an error that we like to ignore by default when walking+stating each file to get the size.

I am not sure how important this is, I guess it blocks you from running this rootfs test in parallel? We can try something like this:

diff --git a/test/system/252-quadlet.bats b/test/system/252-quadlet.bats
index 3ef297b869..566a512ab5 100644
--- a/test/system/252-quadlet.bats
+++ b/test/system/252-quadlet.bats
@@ -726,7 +726,12 @@ EOF
     local quadlet_file=$PODMAN_TMPDIR/basic_$(safename).container
     cat > $quadlet_file <<EOF
 [Container]
-Rootfs=/:O
+Rootfs=$PODMAN_TMPDIR:O
+Volume=/usr:/usr
+Volume=/lib:/lib
+Volume=/lib64:/lib64
+Volume=/etc:/etc
+Volume=/var:/var
 Exec=sh -c "echo STARTED CONTAINER; echo "READY=1" | socat -u STDIN unix-sendto:\$NOTIFY_SOCKET; top -b"
 Notify=yes
 EOF

Or like we do in other tests use an actual mounted container image as source for rootfs which I guess is more robust then only mounting certain host directories.

ygalblum commented 1 month ago

@Luap99 but then what are you really testing? If we want to avoid this walk, I think the test will need to prepare the rootfs directory and use it and not mount all these directories.

edsantiago commented 1 month ago

See https://github.com/containers/podman/pull/23275/commits/522c718bf5b502554921976beacf9d6226b207a7

ygalblum commented 1 month ago

See 522c718

Yes, this is what I meant. Thanks