Closed: lsm5 closed this issue 1 year ago
Can you check systemctl --user status podman.service? Maybe the podman system service process itself is broken and not accepting new connections.
It's always been active whenever I've checked. Or, if I have to wildly speculate, it became
active after running that command and started accepting jobs, which, so far, is the only explanation I have.
$ systemctl --user status podman.socket
● podman.socket - Podman API Socket
Loaded: loaded (/usr/lib/systemd/user/podman.socket; enabled; preset: disabled)
Active: active (listening) since Fri 2022-09-30 14:42:47 UTC; 21min ago
Until: Fri 2022-09-30 14:42:47 UTC; 21min ago
Triggers: ● podman.service
Docs: man:podman-system-service(1)
Listen: /run/user/1000/podman/podman.sock (Stream)
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/podman.socket
Sep 30 14:42:47 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal systemd[1704]: Listening on podman.socket - Podman API Socket.
See the continuously failing jobs in the pipeline list at: https://gitlab.com/rhcontainerbot/pkg-builder/-/pipelines . The 2nd one from top which succeeded was a result of a manual retry, and the most recent one was automatically run only a few minutes after the manual rerun, so I guess the socket stayed active in that time interval.
Here's runner config info if it helps:
$ sudo cat /etc/gitlab-runner/config.toml
concurrent = 50
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "lmandvek-fedora-gitlab-runner"
url = "https://gitlab.com"
id = 17760853
token = $TOKEN_REDACTED
token_obtained_at = 2022-09-27T18:26:21Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker"
environment = ["FF_NETWORK_PER_BUILD=0"]
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "registry.gitlab.com/rhcontainerbot/pkg-builder:fedora"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
host = "unix:///run/user/1000/podman/podman.sock"
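One quick sanity check is whether the socket path named in the runner config actually exists as a socket at the moment a job starts. A minimal sketch, assuming the config contains a single host = "unix://..." line as above and is readable by the current user (the thread reads it with sudo). The helper name runner_socket_path and the RUNNER_CONFIG variable are made up for this example.

```shell
# Hypothetical helper: print the unix socket path from a runner config.
# Assumes a line of the form: host = "unix:///path/to/podman.sock"
runner_socket_path() {
    sed -n 's|^[[:space:]]*host[[:space:]]*=[[:space:]]*"unix://\(.*\)"[[:space:]]*$|\1|p' "$1"
}

# Assumption: default config location as shown in the thread; override with RUNNER_CONFIG.
CONFIG="${RUNNER_CONFIG:-/etc/gitlab-runner/config.toml}"
if [ -r "$CONFIG" ]; then
    path="$(runner_socket_path "$CONFIG")"
    # -S is true only if the path exists and is a socket.
    if [ -S "$path" ]; then
        echo "socket present: $path"
    else
        echo "socket missing: $path"
    fi
fi
```

If the config is not readable, the check is skipped silently; run it as a user with read access to the file.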
Not the socket, the service: systemctl --user status podman.service. As long as the system service process is still running, systemd will not spawn a new service process. Maybe the podman system service process gets stuck.
ah whoops, i read your previous comment wrong, my bad.
$ systemctl --user status podman.service
○ podman.service - Podman API Service
Loaded: loaded (/usr/lib/systemd/user/podman.service; disabled; preset: disabled)
Active: inactive (dead) since Fri 2022-09-30 15:02:31 UTC; 11min ago
Duration: 1min 6.588s
TriggeredBy: ● podman.socket
Docs: man:podman-system-service(1)
Process: 12768 ExecStart=/usr/bin/podman $LOGGING system service (code=exited, status=0/SUCCESS)
Main PID: 12768 (code=exited, status=0/SUCCESS)
CPU: 23.629s
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: @ - - [30/Sep/2022:15:02:26 +0000] "DELETE /v1.41/containers/754fdda1700be63ff3f8d77689b5a2c5832d09a33c2f15d862509dca6e00153e?force=1&v=1 HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: time="2022-09-30T15:02:26Z" level=info msg="Request Failed(Internal Server Error): container 754fdda1700be63ff3f8d77689b5a2c5832d09a33c2f15d862509dca6e00153e does not exist in database: no such container"
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: @ - - [30/Sep/2022:15:02:26 +0000] "GET /v1.41/networks HTTP/1.1" 500 178 "" "Go-http-client/1.1"
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: 2022-09-30 15:02:26.376422149 +0000 UTC m=+61.273672352 container remove 177d5ba021688d30e59c21afdae84a7cca6314f31515f9397ba070b55fdd3e33 (image=registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-43b2dc3d, name>
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: @ - - [30/Sep/2022:15:02:26 +0000] "DELETE /v1.41/containers/177d5ba021688d30e59c21afdae84a7cca6314f31515f9397ba070b55fdd3e33?force=1&v=1 HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: 2022-09-30 15:02:26.582305812 +0000 UTC m=+61.479556035 container remove 1be38a82e621e1b53c1faf098bb5ee2b2283d6a0e0e113c7c09e647a75686dbe (image=registry.gitlab.com/rhcontainerbot/pkg-builder:fedora, name=runner-uf1gckrg-project-132>
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: @ - - [30/Sep/2022:15:02:26 +0000] "DELETE /v1.41/containers/1be38a82e621e1b53c1faf098bb5ee2b2283d6a0e0e113c7c09e647a75686dbe?force=1&v=1 HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: 2022-09-30 15:02:26.593532671 +0000 UTC m=+61.490782904 container remove e381c291e8490744c830688870eff6f8062b6c1f9b8f1cda09875d61dc61b9c5 (image=registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-43b2dc3d, name>
Sep 30 15:02:26 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal podman[12768]: @ - - [30/Sep/2022:15:02:26 +0000] "DELETE /v1.41/containers/e381c291e8490744c830688870eff6f8062b6c1f9b8f1cda09875d61dc61b9c5?force=1&v=1 HTTP/1.1" 204 0 "" "Go-http-client/1.1"
Sep 30 15:02:31 lmandvek-fedora-gitlab-runner.c.libpod-218412.internal systemd[1704]: podman.service: Consumed 23.629s CPU time.
do i need to enable the service and keep it enabled explicitly? The current docs don't mention it https://docs.gitlab.com/runner/executors/docker.html so maybe that needs to change?
No, the socket should start the service once a connection happens. The podman service process will then exit after 5 seconds without active connections. For the next connection systemd should start it again. It looks like it exited in your output, so I would expect the systemd socket to start it again.
Can you try to curl the socket manually and see if this works? Or just use podman-remote. If this works, the gitlab runner is doing something weird.
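A minimal sketch of the manual check suggested above, assuming the rootless socket path from the runner config (derived here from id -u) and that curl is installed. The helper name ping_podman and the PODMAN_SOCK variable are made up for this example. /_ping is the Docker-compatible health endpoint, which the podman API service also answers.

```shell
# Assumption: rootless socket path as in the runner config; override with PODMAN_SOCK.
SOCK="${PODMAN_SOCK:-/run/user/$(id -u)/podman/podman.sock}"

# Hypothetical helper: send one request to the Docker-compatible /_ping endpoint.
# The "d" host name is a placeholder; the connection itself goes through the socket.
ping_podman() {
    curl --silent --show-error --max-time 10 --unix-socket "$1" http://d/_ping
}

if [ -S "$SOCK" ]; then
    # A healthy service answers "OK"; systemd should start podman.service on demand.
    ping_podman "$SOCK" && echo
    # Right after the request the service unit should be reported as active.
    systemctl --user is-active podman.service || true
else
    echo "no socket at $SOCK" >&2
fi
```

The podman-remote variant of the same check would be something like podman --remote --url "unix://$SOCK" info. If both succeed while the CI jobs still fail, that points at the runner rather than at socket activation.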
Things seem to work better after enabling a system connection. I'll keep checking how runs go over the next few days. Thanks @Luap99
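For readers unfamiliar with system connections, here is a sketch of how one might be registered, assuming the same rootless socket as in the runner config. The connection name ci-runner and the helper register_connection are made up for illustration; the thread does not say which name or destination was actually used.

```shell
# Assumption: same rootless socket as in the runner config.
SOCK_URI="unix:///run/user/$(id -u)/podman/podman.sock"
CONN_NAME="ci-runner"   # hypothetical name, not from the thread

# Hypothetical helper: register a unix socket as the default named connection,
# then list the registered connections.
register_connection() {
    podman system connection add --default "$1" "$2" &&
        podman system connection list
}

# Usage (not run automatically, since it changes the user's podman configuration):
#   register_connection "$CONN_NAME" "$SOCK_URI"
#   podman --remote info
```

After registering, podman --remote commands go through the named connection, which gives a quick way to confirm that the API service answers outside of the runner.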
A friendly reminder that this issue had no activity for 30 days.
Closing this one as I'll likely be using @cevich's podman-in-podman method which is waiting on https://github.com/containers/podman/issues/16576
FWIW: "Failed to remove network for build"
I initially hit this and found that, for gitlab-runner inside a container, you pretty much have to use host-mode networking. Podman's networking differs too much from Docker's for the runner to handle.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I have followed the steps for enabling podman as a gitlab runner on gitlab.com. podman.socket is enabled and active, yet new jobs consistently fail. Now, if I run systemctl --user status podman.socket and then try the CI jobs again, they pass. But the error comes back for the next hourly run.
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Latest available podman on Fedora 37 or CentOS 9 Stream
Additional environment details (AWS, VirtualBox, physical, etc.):
The runner instance is on GCE.