containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman fails to daemonize postgres on Windows with run -d or -dt; Mac and Linux daemonize fine #13965

Closed AddictArts closed 2 years ago

AddictArts commented 2 years ago

/kind bug

When using podman run -dt --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgres docker.io/library/postgres:14.2 on Windows, the process exits without an error, as though it received a shutdown. The same command daemonizes properly on Linux and Mac OS. Other podman run -d containers do run as daemons on the Windows WSL2 backend.

Steps to reproduce the issue:

  1. On Windows, execute: podman run -dt --name postgres --network apls -p 5432:5432 -e POSTGRES_PASSWORD=postgres -v postgres:/var/lib/postgresql/data docker.io/library/postgres:14.2

Using -d instead of -dt ends with the same result. --network=host does not help either.

Describe the results you received:

It just stops running, as if a shutdown was received, with no errors in podman logs etc.

Describe the results you expected:

Stay as a daemon like Linux and Mac OS.
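
For anyone trying to confirm the symptom from a script, here is a minimal sketch that polls podman ps to see whether a detached container is still listed. It assumes a POSIX-style shell with podman on the PATH; the helper names is_running and watch_container are made up for this illustration and are not part of podman.

```shell
# Sketch only: helper names are hypothetical, not podman commands.

# is_running NAME: succeeds if `podman ps` lists a running container called NAME.
is_running() {
  local name="$1"
  # --filter narrows the listing; grep -xF then requires an exact name match.
  podman ps --filter "name=${name}" --format '{{.Names}}' | grep -qxF -- "$name"
}

# watch_container NAME [CHECKS] [DELAY]: poll up to CHECKS times, DELAY seconds apart.
# Prints "exited" and returns 1 as soon as the container is no longer listed;
# prints "running" and returns 0 if it is listed on every check.
watch_container() {
  local name="$1" checks="${2:-5}" delay="${3:-2}" i=0
  while [ "$i" -lt "$checks" ]; do
    if ! is_running "$name"; then
      echo "exited"
      return 1
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "running"
}
```

After starting the container with the podman run -d command above, watch_container postgres 10 3 would report whether it disappeared from podman ps within roughly 30 seconds.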

Output of podman version:

4.0.3

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.24.3
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.0-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 16
  distribution:
    distribution: fedora
    variant: container
    version: "35"
  eventLogger: file
  hostname: DESKTOP-73DQB37
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.10.16.3-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 50316668928
  memTotal: 53706846208
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 13958643712
  swapTotal: 13958643712
  uptime: 30h 56m 17.1s (Approximately 1.25 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.7.1-2.fc35.x86_64
      Version: |-
        fusermount3 version: 3.10.5
        fuse-overlayfs: version 1.7.1
        FUSE library version 3.10.5
        using FUSE kernel interface version 7.31
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/user/1000/containers
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.0.3
  Built: 1648837274
  BuiltTime: Fri Apr  1 11:21:14 2022
  GitCommit: ""
  GoVersion: go1.16.15
  OsArch: linux/amd64
  Version: 4.0.3

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

vrothberg commented 2 years ago

Thanks for reaching out, @AddictArts, and apologies for the silence.

@containers/podman-maintainers, any ideas?

rhatdan commented 2 years ago

@n1hility PTAL

n1hility commented 2 years ago

(copied from 13966)

Sure can @n1hility

From a Windows C:\ prompt in CMD.exe, run: podman run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgres docker.io/library/postgres:14.2

Postgres will not stay running. This is easier than using the command in this issue, since that one requires a running postgres. Also note that you cannot have any other containers running. If another is running (for example, one started without -d as above) and you then launch another on a different port, that one oddly stays running as a daemon. I know that sounds odd, but be sure podman ps shows nothing.

Note: After the above run terminates shortly after starting, if I run podman start --attach postgres, it continues running, showing the log or stdout in the CMD window. If you press CTRL-C, the process terminates. Also note that -dt does not help for the initial run either.

Sorry for the delay in replying. I am having a really hard time reproducing this.

After it exits immediately, if you do echo %errorlevel%, do you see a non-zero value?

When it fails, if you do a wsl -l -v, do you see wsl running?

If you run it from PowerShell instead of CMD, does it work? (BTW, I highly recommend installing Windows Terminal, super useful: winget install Microsoft.WindowsTerminal)

Just to confirm if you run podman --version on the windows prompt you also see a 4.0.3 version?

As to wsl -d podman-machine-default, it is running and it does have podman. However, a podman ps or a podman container list -a does not show the containers started from the Windows podman.

Sorry, I forgot to mention to do a su user on the wsl prompt first. By default podman is configured to use rootless networking, but when you enter on the wsl prompt you are root, so you need to switch to the user named user to get to the same underlying source.

AddictArts commented 2 years ago

Thanks for the help @n1hility

C:\>echo %errorlevel%
0

There is no error it just exits gracefully.

Yes podman-machine-default remains running.

C:\>wsl -l -v
  NAME                      STATE           VERSION
* Ubuntu-20.04              Running         2
  podman-machine-default    Running         2

If I open wsl -d podman-machine-default, run su user, and keep that session open, the podman postgres container does not exit and continues to run. Also, podman container list -a does show it as expected. It appears that all TTYs for user close and that exits the process, or something like that. Something like screen or tmux would be a workaround, but I know that is not a real solution, and they don't exist in the limited podman-machine-default.

If I close the session, by say exit multiple times, then postgres will exit. Hope this helps.

n1hility commented 2 years ago

@AddictArts glad to help, and thanks for your patience.

Is it only postgres that has the issue? Do you observe the same behavior with other daemons (e.g. httpd, nginx etc)

Can you check the output of dmesg and see if there are any oom_killer events, or something else that might explain a process being terminated?
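
As a rough sketch of that dmesg check, the filter below prints kernel log lines that look like OOM-killer activity. The function name oom_events is hypothetical, and the patterns are an assumption based on common kernel message wording ("Out of memory: Killed process ...", "invoked oom-killer"), so an empty result does not rule out memory pressure.

```shell
# Sketch only: oom_events is a made-up helper name, and the patterns are a best guess.

# oom_events: read kernel log text on stdin and print lines mentioning the OOM killer.
# Returns 0 even when nothing matches, so it is safe in scripts that stop on errors.
oom_events() {
  grep -i -E 'out of memory|oom[-_]kill' || true
}
```

Inside the machine (wsl -d podman-machine-default), it would be used as dmesg | oom_events.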

Do you have any special wslconfig settings (memory constraints etc)?

If you run as rootful, do you observe the same behavior? You can do so without switching the VM by adding a -c to specify the rootful connection, like so:

podman -c podman-machine-default-root run .....

Be sure to keep using it when running ps / logs etc
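
To avoid forgetting the -c flag on follow-up commands, one option is a small wrapper. This is only a sketch: the function name podman_root is made up here, and the connection name is the one from the command above, so adjust it if your machine is named differently.

```shell
# Sketch only: podman_root is a hypothetical convenience wrapper.

# Connection name taken from the command shown above.
ROOT_CONNECTION="podman-machine-default-root"

# podman_root ARGS...: run any podman subcommand against the rootful connection.
podman_root() {
  podman -c "$ROOT_CONNECTION" "$@"
}
```

With this in place, podman_root run -d ..., podman_root ps and podman_root logs postgres all go through the same connection.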

AddictArts commented 2 years ago

@n1hility No, not only postgres. In that other issue you helped with (networking between WSL and Windows), hasura did exactly the same thing. If I started postgres with podman and then ran hasura against that podman instance, postgres would close and hasura would stay running. If I run hasura pointing at a postgres Windows service, it gracefully closes and exits, just like postgres does as I describe.

AddictArts commented 2 years ago

@n1hility I run podman rootless fyi

n1hility commented 2 years ago

@AddictArts thanks for confirming it's multiple types of containers. In addition to my questions above, I forgot to ask if you tried a full system restart. I assume you already did, but I just want to mention that WSL caches the kernel and a Hyper-V instance. You can force a kernel restart with wsl --shutdown, but sometimes Hyper-V can have issues as well.

The reason I asked about trying rootful is that the implementations are subtly different in a few areas. I'm hoping to find another clue as to why this is breaking for you.

AddictArts commented 2 years ago

@n1hility No Hyper-V. WSL no longer needs it, so I did not install it. Yes, a full restart was performed. As mentioned, it appears the TTYs go away and the process exits due to that. Thanks

n1hility commented 2 years ago

@AddictArts right, to be clear, I was just referring to the internal dependency on the core Hyper-V hypervisor layer (not the full Hyper-V feature and tool chain), which, yes, you don't need.

For a -d with no -t there wouldn't be a tty, but the behavior would match session termination. Can you give the following a try:

podman machine stop

wsl --shutdown
wsl -d podman-machine-default 
# touch /var/lib/systemd/linger/user
# chown user:user /var/lib/systemd/linger/user
# chmod 644 /var/lib/systemd/linger/user
# exit

wsl --shutdown
podman machine start

Then retry and see if that solves the issue. I have a feeling it will.

BTW, if it doesn't, could you run the following to confirm the setting:

wsl -d podman-machine-default
# su user
$ loginctl user-status

You should see linger is "yes"
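
The same check can be scripted. The sketch below assumes systemd records lingering as a marker file named after the user under /var/lib/systemd/linger (the path used in the steps above) and that loginctl user-status prints a "Linger:" line; the function names are hypothetical.

```shell
# Sketch only: function names are made up for illustration.

# linger_enabled USER [DIR]: succeed if a linger marker file exists for USER.
# DIR defaults to the systemd linger directory used in the steps above.
linger_enabled() {
  local user="$1" dir="${2:-/var/lib/systemd/linger}"
  [ -e "$dir/$user" ]
}

# linger_from_status: read `loginctl user-status` output on stdin
# and print the value of its "Linger:" field (for example "yes" or "no").
linger_from_status() {
  awk '$1 == "Linger:" { print $2; exit }'
}
```

Inside the machine, linger_enabled user && echo "linger on" checks the marker file, and loginctl user-status | linger_from_status (after su user) prints the value systemd reports.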

n1hility commented 2 years ago

Knowing this was the likely issue I was finally able to reproduce and will fix this in 4.1.1. You can use the above fix until then.

AddictArts commented 2 years ago

Hi, sorry @n1hility, I've been traveling. Looks like the issue is resolved. Thanks

[root /]# su user
[user /]$ loginctl user-status
Failed to execute 'pager', using next fallback pager: Permission denied
Failed to execute 'less', using next fallback pager: Permission denied
user (1000)
           Since: Fri 2022-05-20 15:43:55 PDT; 11min ago
           State: active
        Sessions: *c18
          Linger: yes
            Unit: user-1000.slice
                  ├─session-c18.scope
                  │ ├─602 su user
                  │ ├─603 bash
                  │ ├─622 loginctl user-status
                  │ └─623 more
                  └─user@1000.service
                    ├─app.slice
                    │ ├─dbus-broker.service
                    │ │ ├─122 /usr/bin/dbus-broker-launch --scope user
                    │ │ └─123 dbus-broker --log 4 --controller 9 --machine-id e158b87efc4e4362b6ab68681717dbf5 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
                    │ └─linger-example.service
                    │   └─44 /usr/bin/sleep infinity
                    ├─init.scope
                    │ ├─36 /usr/lib/systemd/systemd --user
                    │ └─37 "(sd-pam)"
                    └─user.slice
                      └─podman-pause-7fd93bb8.scope
                        └─100 catatonit -P

n1hility commented 2 years ago

@AddictArts cool! thanks for your patience in tracking that one down