containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

pull EXTEND_TIMEOUT_USEC: dial unixgram $NOTIFY_SOCKET: connect: ENOENT #23798

Status: Closed (edsantiago closed this 1 month ago)

edsantiago commented 2 months ago

Failure in the USEC test:

[+0401s] not ok 200 [260] podman pull - EXTEND_TIMEOUT_USEC in 1058ms
         ...
         #
<+     > # # podman --storage-driver vfs --root /tmp/CI_Wbi1/bats-run-9BZCtY/suite/podman-bats-registry/root --runroot /tmp/CI_Wbi1/bats-run-9BZCtY/suite/podman-bats-registry/runroot --tmpdir /tmp/CI_Wbi1/bats-run-9BZCtY/suite/podman-bats-registry/tmpdir network reload --all
<+213ms> # 326a26406461fb291f701bac8691416115b1c2983cfc4c056ef06d7697a914f3
         #
<+019ms> # # podman push --tls-verify=false --creds userqhud:pw3uManIPnWDSnDBy quay.io/libpod/testimage:20240123 localhost:42915/name:tag
<+122ms> # Getting image source signatures
         # Copying blob sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
         # Copying blob sha256:25a964f60e84b2d24f8098376c985a5c057bea74d699ee50c93e417c26516cc3
         # Copying config sha256:1f6acd4c4a1d4d39395870e1c12dbb68da27dd432afecf86e7cb6e23cf4b75d0
         # Writing manifest to image destination
         #
<+011ms> # # podman pull --tls-verify=false --creds userqhud:pw3uManIPnWDSnDBy localhost:42915/name:tag
<+050ms> # Trying to pull localhost:42915/name:tag...
         # Error: dial unixgram /tmp/CI_Wbi1/podman_bats.dM0zjZ/notify.sock: connect: no such file or directory
<+005ms> # [ rc=125 (** EXPECTED 0 **) ]

This happens only in #23275, but this is NOT A PARALLEL TEST: the failure occurs in pass 1, the serial tests, which run before the parallel tests.

This is almost certainly my problem to deal with. I'm just filing because all these flakes are getting hard to track.

Flake occurrences (6): sys(6), podman(6), debian-13(5) / fedora-39(1), root(6), host(6), sqlite(5) / boltdb(1)

Luap99 commented 2 months ago

ENOENT possibly means that the socat process was never started, or that it exited before we tried to connect. (socat also exits and deletes the socket on something like SIGTERM; if it got SIGKILL, the socket file would be left behind and we should get ECONNREFUSED instead.) We do not do any error checking on the socat process, so this is a possibility, but if I read the setup correctly we should still see its stderr. Lastly, something might be deleting the socket file, but with nothing running in parallel that seems really strange.

I would recommend adding a simple file check before the pull command to see whether the socket is there.

github-actions[bot] commented 1 month ago

A friendly reminder that this issue had no activity for 30 days.