containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman child process left running after simple podman commands #8759

Closed: jskov-jyskebank-dk closed this issue 3 years ago

jskov-jyskebank-dk commented 3 years ago

/kind bug

Description

When I run a simple podman command inside a container (i.e. no systemd), a podman sub-process is left running (sleeping).

(this is probably not a bug as such, but just surprising behavior)

Steps to reproduce the issue:

  1. I run a container on OpenShift 4.4 which launches tini and a python server (just to keep the Pod running).
  2. Run ps auxww to show live processes.
  3. Run podman images (process 191 in the below output)
  4. Run ps auxww (shows podman process 204 in the below output)
  5. If a new podman images command is run, process 204 remains (does not spawn a new lingering process)

Describe the results you received:

It looks like this in the console:

sh-5.0$ ps auxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
podman         1  0.0  0.0   2344   736 ?        Ss   08:25   0:00 /tini -- python3 -m http.server
podman         7  0.0  0.0  21136 15752 ?        S    08:25   0:00 python3 -m http.server
podman         8  0.0  0.0   4788  4172 pts/0    Ss   08:25   0:00 sh
podman       190  0.0  0.0   6736  3200 pts/0    R+   08:46   0:00 ps auxww
sh-5.0$ podman --events-backend none images &
[1] 191
sh-5.0$ REPOSITORY  TAG     IMAGE ID  CREATED  SIZE

[1]+  Done                    podman --events-backend none images
sh-5.0$ ps auxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
podman         1  0.0  0.0   2344   736 ?        Ss   08:25   0:00 /tini -- python3 -m http.server
podman         7  0.0  0.0  21136 15752 ?        S    08:25   0:00 python3 -m http.server
podman         8  0.0  0.0   4788  4172 pts/0    Ss   08:25   0:00 sh
podman       204  0.0  0.0  48868 23316 ?        S    08:46   0:00 podman
podman       217  0.0  0.0   6736  3100 pts/0    R+   08:47   0:00 ps auxww
sh-5.0$ podman --events-backend none images &
[1] 218
sh-5.0$ REPOSITORY  TAG     IMAGE ID  CREATED  SIZE

[1]+  Done                    podman --events-backend none images
sh-5.0$ ps auxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
podman         1  0.0  0.0   2344   736 ?        Ss   08:25   0:00 /tini -- python3 -m http.server
podman         7  0.0  0.0  21136 15752 ?        S    08:25   0:00 python3 -m http.server
podman         8  0.0  0.0   4788  4172 pts/0    Ss   08:25   0:00 sh
podman       204  0.0  0.0  48868 23316 ?        S    08:46   0:00 podman
podman       239  0.0  0.0   6736  3176 pts/0    R+   08:47   0:00 ps auxww

Describe the results you expected:

I would expect no lingering sub-processes from podman after it completes the listing.

I wonder if it is related to the "systemd lingering" support, or maybe the server mode? It does not appear related to the API service, as this needs to be started explicitly.

Since the second invocation does not create a new subprocess, I assume this is intentional. But then it should maybe be mentioned in the docs/troubleshooting guide, and/or be controllable with an option.

Output of podman version:

Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.15.5
Built:        Tue Dec  8 15:37:50 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.18.0
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.21-3.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21, commit: 0f53fb68333bdead5fe4dc5175703e22cf9882ab'
  cpus: 4
  distribution:
    distribution: fedora
    version: "33"
  eventLogger: journald
  hostname: podman-test-749877c57f-p9cfw
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
  kernel: 4.18.0-193.28.1.el8_2.x86_64
  linkmode: dynamic
  memFree: 2864472064
  memTotal: 33724624896
  ociRuntime:
    name: runc
    package: runc-1.0.0-279.dev.gitdedadbf.fc33.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc92+dev
      commit: c9a9ce0286785bef3f3c3c87cd1232e535a03e15
      spec: 1.0.2-dev
  os: linux
  remoteSocket:
    path: /tmp/podman-run-1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.fc33.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 0
  swapTotal: 0
  uptime: 730h 26m 11.3s (Approximately 30.42 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/podman/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/podman/.local/share/containers/storage
  graphStatus: {}
  imageStore:
    number: 0
  runRoot: /tmp/containers-user-1000/containers
  volumePath: /home/podman/.local/share/containers/storage/volumes
version:
  APIVersion: 2.1.0
  Built: 1607438270
  BuiltTime: Tue Dec  8 15:37:50 2020
  GitCommit: ""
  GoVersion: go1.15.5
  OsArch: linux/amd64
  Version: 2.2.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman-2.2.1-1.fc33.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

mheon commented 3 years ago

This is probably the rootless pause process. It keeps the rootless user namespace alive for future Podman commands to use.
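
One hedged way to see this with the output above (PID 204 is the lingering process; the namespace inode values are made up for illustration):

$ readlink /proc/204/ns/user                   # namespace held by the pause process
user:[4026532264]
$ podman unshare readlink /proc/self/ns/user   # a fresh rootless podman invocation joins the same namespace
user:[4026532264]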

rhatdan commented 3 years ago

This is expected; we leave a process running that just holds the namespaces open. This helps us avoid race conditions when you are running lots of containers at the same time.

jskov-jyskebank-dk commented 3 years ago

Thanks!

asottile commented 3 years ago

Is there a way to control this behaviour? I just encountered a bug in a systemd service which refused to restart because it thought this podman pause daemon thing was still alive.

rhatdan commented 3 years ago

podman system migrate will kill it, I believe.
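
A rough sketch of what that looks like (hedged; the PID is illustrative, and it is safest to run this while no containers are active):

$ ps -ef | grep '[p]odman'
someuser    4242       1  0 09:06 ?        00:00:00 podman
$ podman system migrate          # stops the rootless pause process for this user
$ ps -ef | grep '[p]odman'
$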

rhatdan commented 3 years ago

@giuseppe thoughts on allowing podman to automatically kill it when it exits?

giuseppe commented 3 years ago

Is there a way to control this behaviour? I just encountered a bug in a systemd service which refused to restart because it thought this podman pause daemon thing was still alive.

@asottile the podman pause process should be living in its own scope. Do you have a reproducer? What version are you using?

asottile commented 3 years ago

I can try and get you a more isolated / reproducible example in the morning -- here's an example from a nonprod host:

$ podman version
Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.15.2
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64
$ systemctl status pre-commit-ci-worker.service
● pre-commit-ci-worker.service - pre-commit-ci queue worker
     Loaded: loaded (/etc/systemd/system/pre-commit-ci-worker.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-01-06 21:34:50 UTC; 4 days ago
   Main PID: 8332 (python3)
      Tasks: 2 (limit: 1160)
     Memory: 69.8M
     CGroup: /system.slice/pre-commit-ci-worker.service
             ├─8332 /opt/pre-commit-ci/venv/bin/python3 -u -m runner.worker
             └─8447 podman

<< snipped output>>

$ sudo kill -INT 8332  # simulate a termination
$ systemctl status pre-commit-ci-worker.service
● pre-commit-ci-worker.service - pre-commit-ci queue worker
     Loaded: loaded (/etc/systemd/system/pre-commit-ci-worker.service; enabled; vendor preset: enabled)
     Active: deactivating (stop-sigterm) (Result: exit-code) since Mon 2021-01-11 09:45:06 UTC; 4s ago
    Process: 8332 ExecStart=/opt/pre-commit-ci/venv/bin/python3 -u -m runner.worker (code=exited, status=1/FAILURE)
   Main PID: 8332 (code=exited, status=1/FAILURE)
      Tasks: 1 (limit: 1160)
     Memory: 30.2M
     CGroup: /system.slice/pre-commit-ci-worker.service
             └─8447 podman
asottile commented 3 years ago

I guess the part I'm most confused about is that I thought podman was supposed to be daemonless?

rhatdan commented 3 years ago

Well, it is, but it needs a process to monitor the container. The daemonless part is that no two containers are related: you can start a container from scratch without talking to a daemon. When a container is running there are several processes running. For example, in rootless mode there is the conmon process monitoring the pid1 of the container, there is slirp4netns setting up its network, and fuse-overlayfs setting up its mounts. Then there are the processes within the container. Of course the podman executable is also running, unless you run the container in detached (--detach) mode.

When the container exits, all of these support processes go away.
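
A hedged illustration of those per-container helpers (image, PIDs, and the exact set of helpers depend on your storage and network configuration):

$ podman run -d --name demo docker.io/library/alpine sleep 300
$ ps -ef | grep -E '[c]onmon|[s]lirp4netns|[f]use-overlayfs'
someuser   12345 ... /usr/bin/conmon --api-version 1 ...
someuser   12348 ... /usr/bin/fuse-overlayfs -o ...
someuser   12350 ... /usr/bin/slirp4netns --disable-host-loopback ...
$ podman rm -f demo              # the per-container helpers exit with the container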

asottile commented 3 years ago

When the container exits, all of these support processes go away.

That doesn't appear to be the case; even a podman ps seems to spawn a lingering daemon:

$ ps -ef | grep '[p]odman'
$ podman ps -a
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES
$ ps -ef | grep '[p]odman'
asottile   16208    2007  1 09:06 ?        00:00:00 podman
mheon commented 3 years ago

That is the pause process. It's not really a daemon, because it does no processing (once launched, it immediately goes to sleep and takes no further action). The pause process keeps the rootless user namespace alive for other rootless Podman processes - subsequent invocations of rootless Podman by that user will join that process's namespaces.

The pause process can be safely killed once all containers are stopped (if it is killed while containers are still running, there is a risk that pods and other shared-namespace containers will not work properly), but Podman can't easily do it itself - there is a serious race condition, which we have not figured out how to address, if a fresh Podman process launches while we are stopping the pause process.
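
A minimal sketch of a safe manual shutdown under those constraints (assuming nothing else invokes podman concurrently):

$ podman stop --all              # make sure no containers or pods are still running
$ podman system migrate          # stops the rootless pause process for this user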

asottile commented 3 years ago

Is there a consistent way to kill that process? Can I configure podman so that it is killed on exit, if I can guarantee that I don't spawn containers in parallel?

mheon commented 3 years ago

I don't believe we expose the PID anywhere that's easy to access - @giuseppe @rhatdan Should we consider including it in podman info?
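
For what it's worth, on some setups the PID can be found in a pause.pid file under the rootless runtime directory (a hedged sketch; the exact path depends on the Podman version and on how the runtime directory is configured):

$ cat ${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/libpod/tmp/pause.pid
4242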

mheon commented 3 years ago

We should also discuss methods for automatically shutting the process down at the cabal meeting - there has to be a locking solution we can use to guarantee we don't race.

asottile commented 3 years ago

Here's the hacky workaround I'm using so that the lingering process is not part of my main service:

[Unit]
Description = hack for https://github.com/containers/podman/issues/8759

[Service]
User = whateveruser
Group = whateveruser

Type = notify
NotifyAccess = all
ExecStart = bash -c 'podman ps && systemd-notify --ready && exec sleep infinity'

[Install]
WantedBy=multi-user.target

and then my other service uses:

[Unit]
Description = ...
Wants = dummy-podman.service
After = dummy-podman.service

# ...
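
For completeness, wiring this up looks roughly like the following (a sketch; it assumes the first unit was saved as /etc/systemd/system/dummy-podman.service):

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now dummy-podman.service
$ systemctl status dummy-podman.service   # the lingering podman pause process is now accounted to this unit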

Should this issue be reopened until there's a better solution? (Or is there somewhere I can track this that's better than this issue?)

giuseppe commented 3 years ago

I think we always move the pause process into its own cgroup.

Can you run with debug messages and check whether you have messages like "Failed to add pause process to systemd sandbox cgroup"?
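
One hedged way to check where the pause process ended up (PID 8447 taken from the systemctl output earlier in the thread):

$ cat /proc/8447/cgroup
# if the move succeeded, this should show a dedicated user scope
# rather than /system.slice/pre-commit-ci-worker.service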

asottile commented 3 years ago

I'm not super experienced with the tech involved here -- is that an option to podman, or to the systemd service?

giuseppe commented 3 years ago

it is an option to podman:

podman --log-level debug ...

asottile commented 3 years ago

Here are the last few lines:

Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=info msg="[graphdriver] using prior storage driver: overlay"
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=debug msg="Initializing event backend journald"
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=debug msg="using runtime \"/usr/lib/cri-o-runc/sbin/runc\""
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=debug msg="using runtime \"/usr/bin/crun\""
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=warning msg="Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument"
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=info msg="Setting parallel job count to 4"
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=warning msg="Failed to add podman to systemd sandbox cgroup: exec: \"dbus-launch\": executable file not found in $PATH"
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES
Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=debug msg="Called ps.PersistentPostRunE(podman --log-level debug ps)"
Feb 03 07:47:16 ip-172-31-67-194 bash[1607]: time="2021-02-03T07:47:16Z" level=debug msg="Failed to add pause process to systemd sandbox cgroup: <nil>"
Feb 03 07:47:16 ip-172-31-67-194 systemd[1]: Started hack for https://github.com/containers/podman/issues/8759.
giuseppe commented 3 years ago

Feb 03 07:47:16 ip-172-31-67-194 bash[1623]: time="2021-02-03T07:47:16Z" level=warning msg="Failed to add podman to systemd sandbox cgroup: exec: \"dbus-launch\": executable file not found in $PATH"

The dbus-launch program is missing on your system. I think that is the reason why the pause process is not moved to its own scope.

asottile commented 3 years ago

dbus-launch is provided by dbus-x11 on Ubuntu -- I don't think I should need X11 to run podman.
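
For reference, installing it would look roughly like this (hedged; package names can differ across releases):

$ sudo apt-get install dbus-x11
$ command -v dbus-launch
/usr/bin/dbus-launch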

asottile commented 3 years ago

Even with that installed, it still does not work:

Feb 03 08:06:01 ip-172-31-67-194 bash[2227]: time="2021-02-03T08:06:01Z" level=warning msg="Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument"
Feb 03 08:06:01 ip-172-31-67-194 bash[2227]: time="2021-02-03T08:06:01Z" level=info msg="Setting parallel job count to 4"
Feb 03 08:06:01 ip-172-31-67-194 dbus-daemon[2243]: [session uid=0 pid=2241] AppArmor D-Bus mediation is enabled
Feb 03 08:06:01 ip-172-31-67-194 dbus-daemon[2243]: Cannot setup inotify for '/root/.local/share/dbus-1/services'; error 'Permission denied'
Feb 03 08:06:01 ip-172-31-67-194 bash[2227]: time="2021-02-03T08:06:01Z" level=warning msg="Failed to add podman to systemd sandbox cgroup: dbus: authentication failed"
Feb 03 08:06:01 ip-172-31-67-194 bash[2227]: CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES
Feb 03 08:06:01 ip-172-31-67-194 bash[2227]: time="2021-02-03T08:06:01Z" level=debug msg="Called ps.PersistentPostRunE(podman --log-level debug ps)"
Feb 03 08:06:01 ip-172-31-67-194 dbus-daemon[2248]: [session uid=2000 pid=2246] AppArmor D-Bus mediation is enabled
Feb 03 08:06:01 ip-172-31-67-194 dbus-daemon[2248]: [session uid=2000 pid=2246] Activating service name='org.freedesktop.systemd1' requested by ':1.0' (uid=2000 pid=2221 comm="podman --log-level debug ps " label="unconfined")
Feb 03 08:06:01 ip-172-31-67-194 dbus-daemon[2248]: [session uid=2000 pid=2246] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Feb 03 08:06:01 ip-172-31-67-194 dbus-daemon[2248]: [session uid=2000 pid=2246] Activating service name='org.freedesktop.systemd1' requested by ':1.0' (uid=2000 pid=2221 comm="podman --log-level debug ps " label="unconfined")
Feb 03 08:06:01 ip-172-31-67-194 dbus-daemon[2248]: [session uid=2000 pid=2246] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Feb 03 08:06:01 ip-172-31-67-194 bash[2221]: time="2021-02-03T08:06:01Z" level=debug msg="Failed to add pause process to systemd sandbox cgroup: <nil>"
Feb 03 08:06:01 ip-172-31-67-194 systemd[1]: Started hack for https://github.com/containers/podman/issues/8759.