containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman machine does not stop correctly while running a container #22515

Closed: cbr7 closed this issue 4 months ago

cbr7 commented 6 months ago

Issue Description

On Podman 5.0.2 on macOS, it seems impossible to stop the podman machine cleanly while at least one container is running.

Steps to reproduce the issue

  1. Have Podman 5.0.2 installed.
  2. Create a podman machine.
  3. Pull an image and run it as a container.
  4. After the container has started, try to stop the podman machine.
  5. Notice that the error "Error: failed waiting for vm to stop" is thrown.
  6. At this point, podman machine list still shows the machine as running, but podman images fails with the following error: "Cannot connect to Podman. Please verify your connection to the Linux system using podman system connection list, or try podman machine init and podman machine start to manage a new Linux VM Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: read tcp 127.0.0.1:58659->127.0.0.1:53782: read: connection reset by peer"

Describe the results you received

An error is thrown when stopping the podman machine.

Describe the results you expected

The podman machine stops successfully.

podman info output

Error: failed waiting for vm to stop
(exit code 125)

============================================

vladimirlazar@Vladimirs-MacBook-Pro-2 ~ % podman images
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: read tcp 127.0.0.1:56370->127.0.0.1:53782: read: connection reset by peer

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

vladimirlazar@Vladimirs-MacBook-Pro-2 ~ % podman version
Client:       Podman Engine
Version:      5.0.2
API Version:  5.0.2
Go Version:   go1.22.2
Git Commit:   3304dd95b8978a8346b96b7d43134990609b3b29
Built:        Wed Apr 17 21:13:18 2024
OS/Arch:      darwin/arm64

Server:       Podman Engine
Version:      5.0.2
API Version:  5.0.2
Go Version:   go1.21.9
Built:        Wed Apr 17 02:00:00 2024
OS/Arch:      linux/arm64

vladimirlazar@Vladimirs-MacBook-Pro-2 ~ % podman info
host:
  arch: arm64
  buildahVersion: 1.35.3
  cgroupControllers:

Additional information

This seems to happen consistently on macOS, but I was not able to reproduce it on Windows 11.

benoitf commented 6 months ago

@cbr7 could you share the image you're pulling and running?

cbr7 commented 6 months ago

@benoitf I was able to reproduce the issue with the image ghcr.io/linuxcontainers/alpine:latest.

benoitf commented 6 months ago
$ podman machine start
$ podman run --rm -it fedora

In another terminal:

$ podman machine stop

The stop is then delayed by 1 min 30 s.

Luap99 commented 6 months ago

From some internal discussion:

  1. podman machine stop should wait longer (at least 90 seconds), as shutdown can be delayed for many reasons.
  2. Investigate a better way to stop containers that don't react to SIGTERM (the default podman stop timeout is 10s), so that we don't rely on systemd to stop them only after waiting 90s.
mheon commented 6 months ago

For podman machine, we could also investigate reducing the 90s systemd timeout. When I want the VM down, I want it down quickly, and it's unlikely that containers in a machine VM are production-critical, so an early SIGKILL shouldn't hurt much.

github-actions[bot] commented 5 months ago

A friendly reminder that this issue had no activity for 30 days.

odockal commented 4 months ago

Any update on the issue?

Luap99 commented 4 months ago

Yes, for 2: https://github.com/containers/podman/pull/23064 fixes the long systemd stop timeout when the container does not exit on SIGTERM.

For 1, I can open a PR to increase the timeout. At some point (maybe after 90s) we should probably terminate the VM forcefully and print a warning. I don't think machine stop should ever return an error just because the shutdown takes too long.

Luap99 commented 4 months ago

Feel free to test whether https://github.com/containers/podman/pull/23097 works for you.

odockal commented 4 months ago

@Luap99 Thanks! @cbr7 Can you take a look, please?

cbr7 commented 4 months ago

@odockal sure