Closed goochjj closed 3 years ago
I just assumed it is not merged because of #8508's state; I did no further investigation.
For me Type=notify works with podman's --sdnotify=container, but only for the READY message, because all others don't get through after the runtime goes away. --sdnotify=conmon works too, but there is no NOTIFY_SOCKET set in the container (I guess that is what it should do).
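For context on "the READY message" versus "all others": an sd_notify payload is a newline-separated list of KEY=value assignments such as READY=1, STATUS=... or WATCHDOG=1. The following is only a minimal sketch of how such payloads could be assembled; `build_notify_state` is a hypothetical helper, not part of podman, conmon, or the sdnotify package.

```python
def build_notify_state(**fields):
    """Join KEY=value assignments into one newline-separated sd_notify payload.

    Hypothetical helper for illustration only.
    """
    lines = []
    for key, value in fields.items():
        text = str(value)
        if "\n" in text:
            # Each assignment must stay on a single line.
            raise ValueError("values must not contain newlines")
        lines.append(f"{key.upper()}={text}")
    return "\n".join(lines).encode("utf-8")


# The message that reportedly gets through with --sdnotify=container.
ready = build_notify_state(ready=1, status="initialization finished")
# An example of a later message, sent after startup has completed.
watchdog = build_notify_state(watchdog=1)
```

Messages like the second one are sent long after startup, which is why they depend on something still listening on the notify socket at that point.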
I just updated and pushed it. This PR has been floundering for a while; I guess I will need to pay more attention to it.
Jul 16 08:57:12 localhost.localdomain conmon[186897]: File "/app/sdnotify_py.py", line 15, in
Jul 16 08:57:12 localhost.localdomain conmon[186897]: n = sdnotify.SystemdNotifier(debug=True)
Jul 16 08:57:12 localhost.localdomain conmon[186897]: File "/usr/local/lib/python3.9/site-packages/sdnotify/__init__.py", line 37, in __init__
Jul 16 08:57:12 localhost.localdomain conmon[186897]: if addr[0] == '@':
Jul 16 08:57:12 localhost.localdomain conmon[186897]: TypeError: 'NoneType' object is not subscriptable
@rhatdan this is what you get if the env var NOTIFY_SOCKET is not set in the container, which is the case with --sdnotify=conmon.
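The TypeError in the traceback comes from indexing `addr[0]` when the environment lookup returned None. As a rough sketch of how a client could guard against an unset NOTIFY_SOCKET, assuming a hypothetical helper `resolve_notify_address` (this is not the sdnotify package's code):

```python
import os


def resolve_notify_address(environ=os.environ):
    """Return the notify socket address, or None if NOTIFY_SOCKET is unset.

    Hypothetical helper for illustration only.
    """
    addr = environ.get("NOTIFY_SOCKET")
    if not addr:
        # Unset (as with --sdnotify=conmon) or empty: nothing to notify.
        return None
    if addr[0] == "@":
        # A leading '@' marks an abstract-namespace socket, which is
        # addressed with a leading NUL byte instead.
        return "\0" + addr[1:]
    return addr
```

A caller would then skip notification when the result is None instead of crashing during startup.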
Wait - #8508 isn't merged yet?
@vrothberg Didn't we just swap generate systemd to use notify by default? If so, and #8508 is not merged, I think we need an immediate revert of that. The behavior of sdnotify when the OCI runtime is used is not sane.
By default, it's setting --sdnotify=conmon where everything's handled on the Podman-side of things.
OK Good. We're set then.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Creating a service using Type=notify freezes everything until the container is ready.
Setting a service as Type=notify sets the NOTIFY_SOCKET, which gets passed through podman properly. runc and crun then proxy that NOTIFY_SOCKET through to the container, indicating the container will signal when ready. The whole idea is that starting a container does not equal "a container is ready": this initialization could take seconds, tens of seconds, or minutes. And it shouldn't matter how long it takes.
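To make the readiness signal concrete: the process inside the container sends a datagram such as READY=1 to the Unix socket named by NOTIFY_SOCKET. Below is a minimal sketch of that exchange, using a temporary socket as a stand-in for systemd's end; `sd_notify` and `demo` are hypothetical names, and nothing here reflects how podman, runc, or crun implement the proxying.

```python
import os
import socket
import tempfile


def sd_notify(state, address):
    """Send one notification datagram to the given notify socket address."""
    if address.startswith("@"):
        # Abstract-namespace sockets use a leading NUL byte.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), address)


def demo():
    """Stand in for the service manager: bind a socket, then notify it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notify.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as listener:
            listener.bind(path)
            listener.settimeout(5)
            # In a real service, the address would come from NOTIFY_SOCKET.
            sd_notify("READY=1", path)
            return listener.recv(1024)
```

In a real unit the listener is systemd (or whatever proxies the socket), and the service stays in the "starting" state until that READY=1 datagram arrives.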
The problem is that while it's in "starting" status, podman is frozen completely. podman ps doesn't return, podman exec doesn't work, even podman info won't return. One partially initialized container shouldn't freeze everything, and the lack of exec makes it hard to diagnose what's going on inside the container to resolve the sd-notify issue. podman stop and podman kill appear to work, but the container is still stuck.
In addition, the MAINPID isn't set right - but we'll come back to that.
Steps to reproduce the issue:
[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
SyslogIdentifier=%N
ExecStartPre=-/usr/bin/podman stop %N
ExecStartPre=-/usr/bin/podman rm %N
LogExtraFields=CONTAINER_NAME=%N
ExecStart=/usr/bin/podman --log-level=debug run \
  -d --log-driver=journald \
  --init \
  --cgroups no-conmon \
  --net=host \
  --name %N \
  alpine sleep infinity
ExecStop=/usr/bin/podman stop -t 20 %N
Type=notify
NotifyAccess=all
Restart=on-failure
#Restart=always
RestartSec=30s
TimeoutStartSec=20
TimeoutStopSec=25
#KillMode=none
#Type=forking
#PIDFile=/run/podman-pid-%n
Delegate=yes
Slice=machine.slice

[Install]
WantedBy=multi-user.target default.target
notifytest[8595]: time="2020-06-19T13:22:38Z" level=debug msg="Starting container e6043f58bcd610d1e448739f2120447f2880c9b498c65fc3c181e1f453a48ef7 with command
Main PID: 12003 (podman)
Tasks: 22 (limit: 4915)
Memory: 27.9M
CGroup: /machine.slice/notifytest.service
        ├─12003 /usr/share/gocode/src/github.com/containers/libpod/bin/podman --log-level=debug run -d --log-driver=journald --init --cgroups no-conmon --net=host --name notifytest alpine sleep infinity
        ├─12065 /usr/libexec/podman/conmon --api-version 1 -c a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -u a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -r /usr/bin/runc -b /va>
        └─12084 /usr/bin/runc start a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0
Version: 2.0.0-dev
API Version: 1
Go Version: go1.13.3
Git Commit: b27df834c18b08bb68172fa5bd5fd12a5cd54633
Built: Thu Jun 18 12:19:01 2020
OS/Arch: linux/amd64
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18-dev, commit: 954b05a7908c0aeeff007ebd19ff662e20e5f46f'
  cpus: 4
  distribution:
    distribution: photon
    version: "3.0"
  eventLogger: file
  hostname: photon-machine
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.19.115-6.ph3-esx
  linkmode: dynamic
  memFree: 5536968704
  memTotal: 8359960576
  ociRuntime:
    name: runc
    package: runc-1.0.0.rc9-2.ph3.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: 2a0466958d9af23af2ad12bd79d06ed0af4091e2
      spec: 1.0.2-dev
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 25h 9m 27.25s (Approximately 1.04 days)
registries:
  search:
Package info (e.g. output of rpm -q podman or apt list podman):
Compiled from source.
Additional environment details (AWS, VirtualBox, physical, etc.):