containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

podman-machine uncategorized flakes #22551

Open edsantiago opened 3 months ago

edsantiago commented 3 months ago

Seeing this very often - much more than the table below shows, because my flake logger only logs once a PR is merged.

           Starting machine "blah blah"

[+1207s]   [FAILED] Timed out after 600.001s.
           Expected process to exit.  It did not.
x                 x          x                     x            x        x
machine-linux(4)  podman(5)  fedora-39-aarch64(4)  rootless(5)  host(5)  sqlite(5)
machine-mac(1)               darwin(1)

edsantiago commented 3 months ago

x                  x           x                      x             x         x
machine-linux(20)  podman(22)  fedora-39-aarch64(20)  rootless(22)  host(22)  sqlite(22)
machine-hyperv(1)              darwin(1)
machine-mac(1)                 windows(1)

cevich commented 3 months ago

Another Windows one (on main): https://api.cirrus-ci.com/v1/artifact/task/6723119224717312/html/machine-linux-podman-fedora-39-aarch64-rootless-host-sqlite.log.html

edsantiago commented 3 months ago

Different one: here the test fails, but not via timeout. machine wsl:

  C> podman.exe machine start 492b48671720
  Starting machine "492b48671720"
  your 131072x1 screen size is bogus. expect trouble

  This machine is currently configured in rootless mode. If your containers
  require root permissions (e.g. ports < 1024), or if you run into compatibility
  issues with non-podman clients, you can switch using the following command:

    podman machine set --rootful 492b48671720

  API forwarding listening on: npipe:////./pipe/docker_engine

  Docker API clients default to this address. You do not need to set DOCKER_HOST.
  Error: machine did not transition into running state: ssh error: machine is not listening on ssh port

Is this the same bug? Shall I assign it to this issue?
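
For anyone trying to reproduce the "machine is not listening on ssh port" error by hand, here is a minimal sketch of polling a TCP port until it accepts connections or a deadline passes. It is not how podman itself performs the check. The function name `wait_for_port`, the host, the port number, and the timeouts are all placeholders chosen for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for manual debugging; it only tests that something
    is listening, not that an SSH handshake would succeed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is present on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Placeholder address; substitute the forwarded SSH port of your machine.
    print(wait_for_port("127.0.0.1", 2222, timeout=5.0))
```

Running this right after `podman machine start` returns would at least distinguish "port never opened" from "port opened too late for the check".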

cevich commented 3 months ago

I seem to remember @baude was working on or debugging this, or something related to "machine fails to start", a month or so ago. Dunno if it's the same thing or different, but in terms of getting it fixed, that's who I'd start with.

github-actions[bot] commented 2 months ago

A friendly reminder that this issue had no activity for 30 days.

edsantiago commented 1 month ago

Here are the past 30 days. It's tempting to focus on Mac because that's what's hitting us so hard right now in #23154 and #23157, but this is happening on Windows and Linux too.

x                  x           x                     x             x         x
machine-mac(9)     podman(17)  darwin(9)             rootless(17)  host(17)  sqlite(17)
machine-hyperv(4)              fedora-40-aarch64(4)
machine-linux(4)               windows(4)

edsantiago commented 1 month ago

I've just spent ten minutes blindly assigning all podman machine flakes to this issue. I looked at logs for some of them: some include the timeout, some are different errors. I have neither the time nor the interest to open an issue for every podman-machine failure, so the list below is not entirely accurate. Still, I hope it helps in some way.

Podman machine CI is super broken right now. I hope this helps someone diagnose and fix it.

x                   x            x                      x              x          x
machine-mac(82)     podman(205)  darwin(82)             rootless(205)  host(205)  sqlite(205)
machine-linux(69)                windows(54)
machine-hyperv(44)               fedora-39-aarch64(37)
machine-wsl(10)                  fedora-40-aarch64(32)

edsantiago commented 3 weeks ago

Executive decision: this issue is now a one-stop catchall for all podman-machine flakes. There are too many flakes and I don't have the time to look at each one, so I'm automatically lumping every flake with "machine" in the test name into this issue. If anyone cares about podman-machine, please feel free to start tackling these.

Here are the last two weeks. Have fun.

x                   x            x                      x              x          x
machine-hyperv(84)  podman(164)  windows(112)           rootless(164)  host(164)  sqlite(164)
machine-mac(42)                  darwin(42)
machine-wsl(28)                  fedora-40-aarch64(10)
machine-linux(10)
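
The catchall rule above (lump every flake whose test name contains "machine") and the per-column tallies in these tables could be sketched roughly as follows. This is an illustration only, not the actual flake logger: the six-field record layout, the function names, and the sample records are all assumptions.

```python
from collections import Counter

# Assumed record layout (hypothetical): one 6-tuple per flake, in the same
# column order as the tables above, e.g.
# ("machine-mac", "podman", "darwin", "rootless", "host", "sqlite").
NUM_COLUMNS = 6


def is_machine_flake(test_name):
    """Catchall rule: any test with 'machine' in its name goes to this issue."""
    return "machine" in test_name


def tally(records):
    """Count values independently per column, for machine flakes only."""
    columns = [Counter() for _ in range(NUM_COLUMNS)]
    for record in records:
        if not is_machine_flake(record[0]):
            continue
        for counter, value in zip(columns, record):
            counter[value] += 1
    return columns


def render(columns):
    """Return table rows; each column is sorted by count, highest first."""
    ranked = [counter.most_common() for counter in columns]
    depth = max((len(r) for r in ranked), default=0)
    rows = []
    for i in range(depth):
        rows.append([
            f"{r[i][0]}({r[i][1]})" if i < len(r) else ""
            for r in ranked
        ])
    return rows


if __name__ == "__main__":
    # Made-up sample data, not taken from CI.
    sample = [
        ("machine-mac", "podman", "darwin", "rootless", "host", "sqlite"),
        ("machine-mac", "podman", "darwin", "rootless", "host", "sqlite"),
        ("machine-wsl", "podman", "windows", "rootless", "host", "sqlite"),
        ("int", "podman", "fedora-40", "root", "host", "sqlite"),
    ]
    for row in render(tally(sample)):
        print("  ".join(cell.ljust(16) for cell in row).rstrip())
```

Because each column is counted and sorted on its own, cells in the same row below the first are not necessarily related to each other, which is also how the tables in this thread read.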