Can you try the commands from the systemd unit (most notably the ExecStartPre and ExecStart ones) manually, with --log-level=debug added, and provide the logs? That should help us identify what is going on.
@vrothberg PTAL
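For reference, this is roughly what that looks like for the user service (a sketch; the paths assume UID 1000 and the unit name used later in this thread). Note that --log-level is a global podman option, so it needs to come before the image name; anything after the image is passed to the container's command instead:

/usr/bin/rm -f /run/user/1000/redhat_ex.service-pid /run/user/1000/redhat_ex.service-cid
/usr/bin/podman --log-level=debug run --conmon-pidfile /run/user/1000/redhat_ex.service-pid --cidfile /run/user/1000/redhat_ex.service-cid -d alpine:latest top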
Thanks for the fast reply. The ExecStartPre and ExecStart commands on their own work fine in both cases.
[fedora@fedora-31-cloud-podman user]$ /usr/bin/rm -f //run/user/1000/redhat_ex.service-pid //run/user/1000/redhat_ex.service-cid
[fedora@fedora-31-cloud-podman user]$ /usr/bin/podman run --conmon-pidfile //run/user/1000/redhat_ex.service-pid --cidfile //run/user/1000/redhat_ex.service-cid -d alpine:latest top --log-level=debug
6ee53e0507479f608f9c6fe4a6200b558a94dd90866db5240ef081454a89dd78
[fedora@fedora-31-cloud-podman user]$
Interestingly, the same .service file is running fine as systemd system service.
[fedora@fedora-31-cloud-podman systemd]$ sudo cp /home/fedora/.config/systemd/user/redhat_ex.service /etc/systemd/system/
[fedora@fedora-31-cloud-podman systemd]$ sudo systemctl daemon-reload
[fedora@fedora-31-cloud-podman systemd]$ sudo systemctl start redhat_ex.service
[fedora@fedora-31-cloud-podman systemd]$ sudo systemctl status redhat_ex.service
● redhat_ex.service - Podman in Systemd
Loaded: loaded (/etc/systemd/system/redhat_ex.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2020-01-10 16:12:58 UTC; 8s ago
Process: 1329 ExecStartPre=/usr/bin/rm -f //run/redhat_ex.service-pid //run/redhat_ex.service-cid (code=exited, status=0/SUCCESS)
Process: 1330 ExecStart=/usr/bin/podman run --conmon-pidfile //run/redhat_ex.service-pid --cidfile //run/redhat_ex.service-cid -d alpine:latest top (code=exited, status=0/SUCCESS)
Main PID: 1462 (conmon)
Tasks: 0 (limit: 4678)
Memory: 33.0M
CPU: 892ms
CGroup: /system.slice/redhat_ex.service
‣ 1462 /usr/bin/conmon --api-version 1 -s -c 3b7eb5fd0547f8ac67b8106a3ac51cd2c4a9be8ced6bafd9850408c47399bce6 -u 3b7eb5fd0547f8ac67b8106a3ac51cd2c4a9be8ced6bafd9850408c47399bce6 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/3b7eb5fd0547f8ac67b8106a3ac51cd2c4a9be8ced6bafd9850408c47399b>
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Child 1330 belongs to redhat_ex.service.
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Control process exited, code=exited, status=0/SUCCESS
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Got final SIGCHLD for state start.
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: New main PID 1462 does not belong to service, but we'll accept it since PID file is owned by root.
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Main PID loaded: 1462
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Changed start -> running
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Job 604 redhat_ex.service/start finished, result=done
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: Started Podman in Systemd.
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Failed to send unit change signal for redhat_ex.service: Connection reset by peer
Jan 10 16:12:58 fedora-31-cloud-podman.novalocal systemd[1]: redhat_ex.service: Control group is empty.
@choeffer, can you share the service file or provide one as a reproducer?
This is the one I am using:
[Unit]
Description=Podman in Systemd
[Service]
Restart=no
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid
[Install]
WantedBy=multi-user.target
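As a usage sketch (assuming the file is saved as redhat_ex.service; for a user service, %t expands to /run/user/<uid> and %n to the unit name redhat_ex.service):

mkdir -p ~/.config/systemd/user
cp redhat_ex.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user start redhat_ex.service
systemctl --user status redhat_ex.service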
I can't reproduce. It works on my machine. I am on cgroups v1 though.
It is related to cgroups v2. I can reproduce the issue locally, see below:
systemd[1426]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
systemd[1426]: alpine.service: Found left-over process 5227 (fuse-overlayfs) in control group while starting unit. Ignoring.
systemd[1426]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
systemd[1426]: alpine.service: Found left-over process 5230 (slirp4netns) in control group while starting unit. Ignoring.
systemd[1426]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
... AND ...
systemd[1426]: alpine.service: New main PID 5233 does not belong to service, and PID file is not owned by root. Refusing.
It works when setting --cgroups=disabled in podman run. @giuseppe, do you know what's going on?
Can confirm, with --cgroups=disabled it also works with podman 1.7.0 again.
[fedora@fedora-31-cloud-podman user]$ cat redhat_ex.service
[Unit]
Description=Podman in Systemd
[Service]
Restart=no
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --cgroups=disabled --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid
[Install]
WantedBy=multi-user.target
[fedora@fedora-31-cloud-podman user]$ systemctl --user status redhat_ex.service
● redhat_ex.service - Podman in Systemd
Loaded: loaded (/home/fedora/.config/systemd/user/redhat_ex.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2020-01-10 16:46:00 UTC; 4min 21s ago
Process: 1573 ExecStartPre=/usr/bin/rm -f //run/user/1000/redhat_ex.service-pid //run/user/1000/redhat_ex.service-cid (code=exited, status=0/SUCCESS)
Process: 1574 ExecStart=/usr/bin/podman run --cgroups=disabled --conmon-pidfile //run/user/1000/redhat_ex.service-pid --cidfile //run/user/1000/redhat_ex.service-cid -d alpine:latest top (code=exited, status=0/SUCCESS)
Main PID: 1592 (conmon)
Tasks: 8 (limit: 4678)
Memory: 77.4M
CPU: 406ms
CGroup: /user.slice/user-1000.slice/user@1000.service/redhat_ex.service
├─ 720 /usr/bin/podman
├─1263 /usr/bin/fuse-overlayfs -o lowerdir=/home/fedora/.local/share/containers/storage/overlay/l/RELFBA7TL6P5UYS2YRPZYUHHQV,upperdir=/home/fedora/.local/share/containers/storage/overlay/dad442dadfc9880c972793590dbec2420c50e1192bc8c038ebf23ec6fad17e13/diff,workdir=/home/fedora/.local/share/containers/stor>
├─1266 /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-0f5ef442-e2e6-af7d-723f-6d950eea572a tap0
├─1587 /usr/bin/fuse-overlayfs -o lowerdir=/home/fedora/.local/share/containers/storage/overlay/l/RELFBA7TL6P5UYS2YRPZYUHHQV,upperdir=/home/fedora/.local/share/containers/storage/overlay/43130682adb50ba6d984dd14e06c656ac9617f27adff49b17562e96ced1a4d59/diff,workdir=/home/fedora/.local/share/containers/stor>
├─1588 /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-f6f876b7-76c3-ea21-a122-ed6ab14bd9d2 tap0
├─1592 /usr/bin/conmon --api-version 1 -c 31f38e9bac3798edf6f9c6c4500d04940d48ab1736c78606086c4eb49eb0ccc2 -u 31f38e9bac3798edf6f9c6c4500d04940d48ab1736c78606086c4eb49eb0ccc2 -r /usr/bin/crun -b /home/fedora/.local/share/containers/storage/overlay-containers/31f38e9bac3798edf6f9c6c4500d04940d48ab1736c7860>
└─1598 top
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal systemd[658]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal systemd[658]: redhat_ex.service: Found left-over process 1263 (fuse-overlayfs) in control group while starting unit. Ignoring.
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal systemd[658]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal systemd[658]: redhat_ex.service: Found left-over process 1266 (slirp4netns) in control group while starting unit. Ignoring.
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal systemd[658]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal podman[1574]: 2020-01-10 16:46:00.450713326 +0000 UTC m=+0.071489446 container create 31f38e9bac3798edf6f9c6c4500d04940d48ab1736c78606086c4eb49eb0ccc2 (image=docker.io/library/alpine:latest, name=agitated_villani)
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal podman[1574]: 2020-01-10 16:46:00.710006256 +0000 UTC m=+0.330782378 container init 31f38e9bac3798edf6f9c6c4500d04940d48ab1736c78606086c4eb49eb0ccc2 (image=docker.io/library/alpine:latest, name=agitated_villani)
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal podman[1574]: 2020-01-10 16:46:00.715430062 +0000 UTC m=+0.336206192 container start 31f38e9bac3798edf6f9c6c4500d04940d48ab1736c78606086c4eb49eb0ccc2 (image=docker.io/library/alpine:latest, name=agitated_villani)
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal podman[1574]: 31f38e9bac3798edf6f9c6c4500d04940d48ab1736c78606086c4eb49eb0ccc2
Jan 10 16:46:00 fedora-31-cloud-podman.novalocal systemd[658]: Started Podman in Systemd.
@choeffer thanks for confirming! We'll have a look at the issue (I'm currently bisecting).
I opened https://github.com/containers/libpod/pull/4835 to revert the change introducing the regression.
@giuseppe, how shall we tackle this? Move slirp and overlayfs into other cgroups?
> @giuseppe, how shall we tackle this? Move slirp and overlayfs into other cgroups?
I think the correct solution would be to not create a new cgroup at all when running directly from systemd; it is pointless to create a new one and then tell systemd to monitor a process from another cgroup.
The issue is that I don't know of any way to detect whether we are running from a systemd service, so we would have to invent our own way to signal it. It could be a simple environment variable, or, if we want to make it more generic, we could add something like --cgroups=enabled-no-conmon or --cgroups-conmon=disabled.
What do you think?
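To make that concrete, the unit above might then look something like this (purely hypothetical; neither flag exists at this point, this is just the shape of the proposal):

ExecStart=/usr/bin/podman run --cgroups=enabled-no-conmon --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d alpine:latest top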
I consider this a regression, as it works with pre-1.7 versions. Another F31 user reached out on IRC today, and I would not want to introduce this in RHEL. So I'd love to figure out a way to detect whether we're running in a systemd service.
More recent versions of systemd set some environment variables (see https://serverfault.com/a/927481), but I think that might not work in all circumstances. A service might run a script which runs some containers, where Podman is not meant to be the main PID. I think we could look at the parent PID of Podman and check whether that's systemd. Would that work?
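As a rough shell sketch of that idea (not Podman code, just an illustration of checking whether the direct parent process is systemd):

if [ "$(cat /proc/$PPID/comm 2>/dev/null)" = "systemd" ]; then
    echo "parent is systemd"
else
    echo "parent is not systemd"
fi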
What happens if the user does something that requires creating a cgroup, though? Sets resource limits on the Podman command line, uses pid=host, etc?
Can't we look at the current cgroup and figure out if it is a service cgroup?
> Can't we look at the current cgroup and figure out if it is a service cgroup?
In some cases that might be okay, e.g., when Podman/conmon is not the main PID of the service. Checking the parent PID is, afaics, more reliable.
> What happens if the user does something that requires creating a cgroup, though? Sets resource limits on the Podman command line, uses pid=host, etc?
It seems redundant to what systemd provides, but we must make sure not to regress. Tricky!
> What happens if the user does something that requires creating a cgroup, though? Sets resource limits on the Podman command line, uses pid=host, etc?
That should not matter for the conmon cgroup; we don't set any limits there. We will still create a cgroup for the container payload.
That is why I was suggesting another mode that disables cgroups only for conmon, as opposed to what we do now with disabled, where the container itself does not get a new cgroup either.
> In some cases that might be okay, e.g., when Podman/conmon is not the main PID of the service. Checking the parent PID is, afaics, more reliable.
Yes, agreed. For example, on Fedora 31 my console session has this cgroup:
$ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/user@1000.service/dbus\x2d:1.2\x2dcom.gexperts.Tilix.slice/dbus-:1.2-com.gexperts.Tilix@0.service
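For comparison, a sketch of the cgroup-based check (cgroup v2 only; as the Tilix example above shows, an ordinary session cgroup can also end in .service, which is why this is not reliable on its own):

case "$(cut -d: -f3- /proc/self/cgroup)" in
  *.service) echo "current cgroup looks like a service cgroup" ;;
  *) echo "current cgroup does not look like a service cgroup" ;;
esac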
Ahhh.
Could we just disable creating a distinct cgroup for conmon in rootless entirely? I can't think of any advantages to having it in a distinct cgroup.
> Could we just disable creating a distinct cgroup for conmon in rootless entirely? I can't think of any advantages to having it in a distinct cgroup.
Unfortunately we need that, as the user creating the container must own the current cgroup; otherwise it won't be possible to move the container process into the correct cgroup (you must control both the source and destination cgroups). There are cases where that is not true, since the current cgroup is owned by root. We were not creating it on cgroup v1, as we could not use cgroups there anyway; this changed for cgroup v2, where we started to use cgroups for rootless as well.
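To illustrate the ownership point, on cgroup v2 one can check who owns the current cgroup directory, which determines whether a rootless user can create children and move processes between them (a sketch; the path handling is simplified):

ls -ld "/sys/fs/cgroup$(cut -d: -f3- /proc/self/cgroup)"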
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Followed https://www.redhat.com/sysadmin/podman-shareable-systemd-services to create a rootless container systemd user service. On Fedora 31 with podman 1.6.2 it works fine; on Fedora 31 with podman 1.7.0 it fails.
Steps to reproduce the issue:
1. Create a systemd user service, as described in the link above.
2. Start the service.
3. Podman 1.6.2 works, 1.7.0 fails.
Describe the results you received:
on 1.7.0 I get -> failed (Result: timeout)
Describe the results you expected:
on 1.6.2 I get -> active (running)
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:

Output of podman info --debug:

Package info (e.g. output of rpm -q podman or apt list podman):

Additional environment details (AWS, VirtualBox, physical, etc.):

user config: