containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Error: machine did not transition into running state: ssh error: exit status 255 #19611

Open chevdor opened 1 year ago

chevdor commented 1 year ago

Issue Description

After each reboot, the podman machine fails to start unless qemu is killed.

Steps to reproduce the issue

  1. podman machine start

Describe the results you received

Starting machine "podman-machine-default"
Waiting for VM ...
Error: machine did not transition into running state: ssh error: exit status 255

Describe the results you expected

The podman machine starts

podman info output

host:
  arch: amd64
  buildahVersion: 1.31.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 98.55
    systemPercent: 1.02
    userPercent: 0.43
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "38"
  eventLogger: journald
  freeLocks: 2038
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 501
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.4.7-200.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 32334708736
  memTotal: 32849022976
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.fc38.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-1.fc38.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.6-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.6
      commit: 73f759f4a39769f60990e7d225f561b4f4f06bcf
      rundir: /run/user/501/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20230625.g32660ce-1.fc38.x86_64
    version: |
      pasta 0^20230625.g32660ce-1.fc38.x86_64
      Copyright Red Hat
      GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/501/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-12.fc38.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 3m 22.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 10838384640
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 72
  runRoot: /run/user/501/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.6.0
  Built: 1689942206
  BuiltTime: Fri Jul 21 14:23:26 2023
  GitCommit: ""
  GoVersion: go1.20.6
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

The workaround is:

killall qemu-system-x86_64
podman machine start



and then it starts fine.
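
A minimal sketch of the workaround above as a shell function, assuming killall and podman are on the PATH. The function name restart_podman_machine and the QEMU_PROC variable are made up for illustration; they are not part of podman.

```shell
# Sketch only: kill a leftover qemu process, then start the machine.
# QEMU_PROC is a hypothetical override for the qemu binary name.
restart_podman_machine() {
  qemu_proc="${QEMU_PROC:-qemu-system-x86_64}"

  # killall exits non-zero when no process matches; that is fine here,
  # so the failure is ignored and the start is attempted anyway.
  killall "$qemu_proc" 2>/dev/null || true

  podman machine start
}
```

On Apple silicon the qemu binary is qemu-system-aarch64, so the call would look like `QEMU_PROC=qemu-system-aarch64 restart_podman_machine`.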

baude commented 1 year ago

Does podman --log-level=debug ... reveal anything?
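
One way to capture that debug output for attaching to the issue is sketched below. It assumes podman is on the PATH; the function name capture_machine_start_log and the default log file name are hypothetical.

```shell
# Sketch only: run the start with debug logging and keep a copy of the output.
capture_machine_start_log() {
  log_file="${1:-podman-machine-start.log}"

  # Send both stdout and stderr to the file so the log lines and the
  # final error message end up in the same place.
  podman --log-level=debug machine start >"$log_file" 2>&1
  status=$?

  # Show what was captured, then report podman's own exit status.
  cat "$log_file"
  return "$status"
}
```

Calling `capture_machine_start_log start.log` leaves the full output in start.log even when the start fails.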

chevdor commented 1 year ago

I ran into the issue again over the last few days. It worked today, so I cannot tell anymore.

xordspar0 commented 1 year ago

I have this issue every day. I used to experience https://github.com/containers/podman/issues/17403 but since upgrading and recreating my VM it seems I have a different issue. Whereas #17403 would present as machine start not working at all or only once in a long while, this issue seems fixable by just killing the VM and trying again.

The debug logs don't reveal a lot:

INFO[0000] podman filtering at log level debug
DEBU[0000] Using Podman machine with `qemu` virtualization provider
Starting machine "podman-machine-default"
[/usr/local/opt/podman/libexec/podman/gvproxy -listen-qemu unix:///var/folders/_j/ltwgd27d71g5n15hg52z8ny80000gp/T/podman/qmp_podman-machine-default.sock -pid-file /var/folders/_j/ltwgd27d71g5n15hg52z8ny80000gp/T/podman/podman-machine-default_proxy.pid -ssh-port 62983 -forward-sock /Users/Z002KR6/.local/share/containers/podman/machine/qemu/podman.sock -forward-dest /run/user/502/podman/podman.sock -forward-user core -forward-identity /Users/Z002KR6/.ssh/podman-machine-default --debug]
DEBU[0000] qemu cmd: [/usr/local/bin/qemu-system-x86_64 -m 2048 -smp 1 -fw_cfg name=opt/com.coreos/config,file=/Users/Z002KR6/.config/containers/podman/machine/qemu/podman-machine-default.ign -qmp unix:/var/folders/_j/ltwgd27d71g5n15hg52z8ny80000gp/T/podman/qmp_podman-machine-default.sock,server=on,wait=off -netdev socket,id=vlan,fd=3 -device virtio-net-pci,netdev=vlan,mac=5a:94:ef:e4:0c:ee -device virtio-serial -chardev socket,path=/var/folders/_j/ltwgd27d71g5n15hg52z8ny80000gp/T/podman/podman-machine-default_ready.sock,server=on,wait=off,id=apodman-machine-default_ready -device virtserialport,chardev=apodman-machine-default_ready,name=org.fedoraproject.port.0 -pidfile /var/folders/_j/ltwgd27d71g5n15hg52z8ny80000gp/T/podman/podman-machine-default_vm.pid -machine q35,accel=hvf:tcg -cpu host -virtfs local,path=/Users,mount_tag=vol0,security_model=none -virtfs local,path=/private,mount_tag=vol1,security_model=none -virtfs local,path=/var/folders,mount_tag=vol2,security_model=none -drive if=virtio,file=/Users/Z002KR6/.local/share/containers/podman/machine/qemu/podman-machine-default_fedora-coreos-38.20230902.2.0-qemu.x86_64.qcow2]
Waiting for VM ...
Error: machine did not transition into running state
DEBU[0027] Shutting down engines

The qemu console shows the VM booting up normally. It finishes pretty quickly. Some time after the VM is up and healthy, podman gives up and shows the above message.

shankarpnsn commented 1 year ago

I'm having the same issue as above. Recreating the VM doesn't help, and the delay workaround in https://github.com/containers/podman/issues/17403#issuecomment-1536636874 doesn't work anymore.

violin0622 commented 1 year ago

I ran into the same issue recently. The podman debug log prints:

INFO[0000] podman filtering at log level debug          
DEBU[0000] Using Podman machine with `qemu` virtualization provider 
Starting machine "podman-machine-default"
[/opt/homebrew/opt/podman/libexec/podman/gvproxy -listen-qemu unix:///var/folders/52/32qxwg251tvcdtqbk8c6ll0w0000gp/T/podman/qmp_podman-machine-default.sock -pid-file /var/folders/52/32qxwg251tvcdtqbk8c6ll0w0000gp/T/podman/podman-machine-default_proxy.pid -ssh-port 49852 -forward-sock /Users/violin/.local/share/containers/podman/machine/qemu/podman.sock -forward-dest /run/user/502/podman/podman.sock -forward-user core -forward-identity /Users/violin/.ssh/podman-machine-default --debug]
DEBU[0000] qemu cmd: [/opt/homebrew/bin/qemu-system-aarch64 -m 2048 -smp 1 -fw_cfg name=opt/com.coreos/config,file=/Users/violin/.config/containers/podman/machine/qemu/podman-machine-default.ign -qmp unix:/var/folders/52/32qxwg251tvcdtqbk8c6ll0w0000gp/T/podman/qmp_podman-machine-default.sock,server=on,wait=off -netdev socket,id=vlan,fd=3 -device virtio-net-pci,netdev=vlan,mac=5a:94:ef:e4:0c:ee -device virtio-serial -chardev socket,path=/var/folders/52/32qxwg251tvcdtqbk8c6ll0w0000gp/T/podman/podman-machine-default_ready.sock,server=on,wait=off,id=apodman-machine-default_ready -device virtserialport,chardev=apodman-machine-default_ready,name=org.fedoraproject.port.0 -pidfile /var/folders/52/32qxwg251tvcdtqbk8c6ll0w0000gp/T/podman/podman-machine-default_vm.pid -accel hvf -accel tcg -cpu host -M virt,highmem=on -drive file=/opt/homebrew/share/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -drive file=/Users/violin/.local/share/containers/podman/machine/qemu/podman-machine-default_ovmf_vars.fd,if=pflash,format=raw -virtfs local,path=/Users,mount_tag=vol0,security_model=none -virtfs local,path=/private,mount_tag=vol1,security_model=none -virtfs local,path=/var/folders,mount_tag=vol2,security_model=none -drive if=virtio,file=/Users/violin/.local/share/containers/podman/machine/qemu/podman-machine-default_fedora-coreos-38.20230722.2.1-qemu.aarch64.qcow2] 
Waiting for VM ...
Error: machine did not transition into running state
DEBU[0029] Shutting down engines  

And qemu terminal prints:

Screenshot 2023-10-07 16 04 43

It seems that podman didn't log in to the Linux VM running in qemu.

My podman version is the latest, 4.7.0, and my qemu version is 8.1.1.

jonnyz32 commented 1 year ago

I'm also hitting this issue

blva commented 1 year ago

Also experiencing this on macOS x86.

jfayot commented 1 year ago

Same issue here on MacOS M2.

However, the script provided by @jamesmikesell here works fine.

jeankhawand commented 1 year ago

Same on M1.

baude commented 1 year ago

A couple of questions: any chance you all have the extra-fast M2s? And does it reproduce every time for folks?

rishabhj1717 commented 1 year ago

Facing a similar issue on a MacBook Pro with an Intel chip.

podman version 4.7.1
qemu-img version 8.1.1

jamars commented 1 year ago

I got it working by upgrading podman with brew install podman. After upgrading, I have:

Prior to the upgrade I tried resetting podman's (default) machine using podman machine init; don't know if it helped with making the upgrade work, but when I tried to run the new machine, before the upgrade, it kept failing.

utsw-nicholaideguzman commented 1 year ago

Similar workaround as @chevdor on my M2 when using Podman Desktop

  1. Close Podman Desktop, if running
  2. Using Activity Monitor, end qemu-system-aarch64
  3. Relaunch Podman Desktop

baude commented 1 year ago

I think Podman 4.8 will solve this problem ... RC1 was cut on Monday and we expect a release cut next week.

chrishoina commented 1 year ago

Similar workaround as @chevdor on my M2 when using Podman Desktop

  1. Close Podman Desktop, if running
  2. Using Activity Monitor, end qemu-system-aarch64
  3. Relaunch Podman Desktop

The second step also resolved the issue when starting the machine from a terminal session. I was getting the same exact error. Thanks!

chevdor commented 12 months ago

@chrishoina could you confirm the version you used? It would be interesting to know whether you hit the issue with the new version or were still on an older one.

chrishoina commented 12 months ago

@chevdor Sorry for the delay. I'm on v4.8.0, and I'm still experiencing the same issue. The above steps help, though.

Screenshot 2023-12-06 at 10 52 35 AM

Although, when Podman Desktop fails, I do see this (that is the Run Podman button).

rshillington commented 11 months ago

For what it's worth

podman machine rm
podman machine init
podman machine start

worked for me. I only started to experience this issue after a recent OS update.

ammarik commented 7 months ago

I've had the same problem on macOS 13.6.6 with Podman 5.0.1. None of the suggestions above helped me. Only the following worked for me:

podman machine stop
podman machine rm
rm -rf ~/.config/containers/
rm -rf ~/.local/share/containers
podman machine init
podman machine start

Alternatively, if that doesn't help, I would also try reinstalling Podman:

podman machine stop
podman machine rm
brew uninstall podman 
rm -rf ~/.config/containers/
rm -rf ~/.local/share/containers
brew install podman
podman machine init
podman machine start
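
A more cautious variant of the reset above is sketched here: it moves the two containers directories aside instead of deleting them, so they can be restored if needed. This is an illustration, not an official procedure. The function name reset_podman_machine and the backup directory layout are hypothetical, and --force is used on podman machine rm so the script does not stop at the confirmation prompt.

```shell
# Sketch only: reset the podman machine but keep a backup of the old state.
reset_podman_machine() {
  stamp="$(date +%Y%m%d%H%M%S)"
  backup="$HOME/podman-backup-$stamp"   # hypothetical backup location

  # The machine may already be stopped or removed, so failures are ignored.
  podman machine stop || true
  podman machine rm --force || true

  mkdir -p "$backup"

  # Both directories are named "containers", so each gets its own target name.
  if [ -d "$HOME/.config/containers" ]; then
    mv "$HOME/.config/containers" "$backup/config-containers"
  fi
  if [ -d "$HOME/.local/share/containers" ]; then
    mv "$HOME/.local/share/containers" "$backup/share-containers"
  fi

  # Only start the machine if init succeeded.
  podman machine init && podman machine start
}
```

Once the new machine works, the backup directory under $HOME can be deleted by hand.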

nguyentthai96 commented 6 months ago

Same on M1 as violin0622: it gets stuck at login, as I can see when debugging.

podman machine stop
podman machine rm

Update the podman version:

brew install podman
podman machine init
podman machine start

After re-initializing a new qemu machine, it runs normally.