containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

After starting a podman machine on Windows WSL with podman 5.2.5 - Error: unable to connect to Podman socket: failed to connect #24570

Open odockal opened 1 week ago

odockal commented 1 week ago

Issue Description

Using Podman 5.2.5 on Windows with the WSL provider, I can initialize and start the machine (at least it appears to be running), but podman info then fails with an error: unable to connect to Podman socket

Full Error:

Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: dial tcp 127.0.0.1:54985: connectex: No connection could be made because the target machine actively refused it.

Steps to reproduce the issue

  1. Install Podman 5.2.5 on Windows 10 (might be occurring elsewhere)
  2. podman machine init --now
  3. podman info

Describe the results you received

 podman machine init --now
Downloading VM image: v20241112031159-5.2-rootfs-amd64.tar.zst: done
Extracting compressed file: podman-machine-default-amd64: done
Importing operating system into WSL (this may take a few minutes on a new WSL install)...
Import in progress, this may take a few minutes.
The operation completed successfully.
Configuring system...
Machine init complete
Starting machine "podman-machine-default"

This machine is currently configured in rootless mode. If your containers
require root permissions (e.g. ports < 1024), or if you run into compatibility
issues with non-podman clients, you can switch using the following command:

        podman machine set --rootful

API forwarding listening on: npipe:////./pipe/docker_engine

Docker API clients default to this address. You do not need to set DOCKER_HOST.
Machine "podman-machine-default" started successfully
PS C:\Users\podmanqe\pde2e> podman info
OS: windows/amd64
provider: wsl
version: 5.2.5

Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: dial tcp 127.0.0.1:54985: connectex: No connection could be made because the target machine actively refused it.
PS C:\Users\podmanqe\pde2e> podman machine ls
NAME                    VM TYPE     CREATED        LAST UP            CPUS        MEMORY      DISK SIZE
podman-machine-default  wsl         2 minutes ago  Currently running  2           2GiB        100GiB

Describe the results you expected

podman info prints the host information instead of a connection error.

podman info output

podman info
OS: windows/amd64
provider: wsl
version: 5.2.5

Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: dial tcp 127.0.0.1:54985: connectex: No connection could be made because the target machine actively refused it.

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

Windows 10, x64, Podman 5.2.5.

Base podman image: v20241112031159-5.2-rootfs-amd64.tar.zst

Additional information

We tested podman 5.2.5 on Windows with Podman Desktop 1.14.1 last week, and I can confirm that everything worked then. I suspect the base image, which has changed in the meantime (timestamp 20241112).

Luap99 commented 1 week ago

Does 5.3.0 work? 5.2.5 is EOL with the release of 5.3

odockal commented 1 week ago

I can ssh into the machine and pull an image from there, so the problem is likely between the client and the socket.
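
One way to check the client side independently of podman is to probe the local TCP port named in the error message. The sketch below is only an illustration: the function name is made up for this example, and it reports whether anything accepts connections on the port, not whether the Podman API behind it is healthy.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (nothing is listening) as well as timeouts.
        return False
```

For the report above, the call would be port_accepts_connections("127.0.0.1", 54985). The port number comes from the error message and changes from machine to machine. A False result matches the "actively refused" error: nothing is listening on that port.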

I also tried manually removing all possible leftovers:

# I always did podman machine reset -f
PS C:\Users\podmanqe> podman machine reset -f
# this time I also added
PS C:\Users\podmanqe> rm -r .\.config\containers\
PS C:\Users\podmanqe> rm -r .\.local\share\containers\podman\
PS C:\Users\podmanqe> rm -r .\AppData\Roaming\containers\

and it now works correctly. Is it possible that some leftover, e.g. a connection in one of the config folders, caused the problem?

odockal commented 1 week ago

I have a reproducer, but I am not sure whether it is a bug or a feature. I am testing podman-remote via Podman Desktop, and this is part of the test environment preparation. I use podman machine to spin up a VM with podman, remove the podman-machine connections, and then reuse the existing ssh key to create a new podman remote connection to get in via podman-remote.

Snippet:

$podmanMachine = "podman-machine"
$remoteMachine = "remote-machine"
# Create and start the machine; this also registers its system connections
podman machine init --now $podmanMachine
# Remember the URI and identity (ssh key) of the default connection
$json = podman system connection ls --format json | ConvertFrom-Json
foreach ($item in $json) {
    if ($item.Default -match "True") {
        $name = $item.Name
        $uri = $item.URI
        $identity = $item.Identity
    }
}
# Remove the machine's connections and the client-side config
podman system connection rm $podmanMachine
podman system connection rm "${podmanMachine}-root"
rm -r $env:APPDATA\containers\*
rm -r $env:USERPROFILE\.config\containers
# Re-add the same endpoint as a plain remote connection
podman system connection add $remoteMachine --identity $identity $uri

At this point, I can test podman-remote via Podman Desktop and it works.
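
For readers who do not use PowerShell, the selection of the default connection in the snippet can be sketched in Python. This is only an illustration: it assumes the JSON is a list of objects with the Name, URI, Identity and Default fields that the snippet reads, and the sample data is hypothetical, modelled on the connection list shown later in this thread.

```python
import json
from typing import Optional


def default_connection(connections_json: str) -> Optional[dict]:
    """Return the connection entry marked as default, or None if there is none."""
    for item in json.loads(connections_json) or []:
        # The PowerShell snippet matches the text "True"; accept a JSON
        # boolean or a string here so both shapes are handled.
        if str(item.get("Default", "")).lower() == "true":
            return item
    return None


# Hypothetical sample in the assumed shape (not captured from a real run).
sample = """[
  {"Name": "podman-machine-default",
   "URI": "ssh://core@127.0.0.1:54700/run/user/501/podman/podman.sock",
   "Identity": "/Users/username/.local/share/containers/podman/machine/machine",
   "Default": true},
  {"Name": "podman-machine-default-root",
   "URI": "ssh://root@127.0.0.1:54700/run/podman/podman.sock",
   "Identity": "/Users/username/.local/share/containers/podman/machine/machine",
   "Default": false}
]"""

if __name__ == "__main__":
    conn = default_connection(sample)
    print(conn["Name"], conn["URI"], conn["Identity"])
```

In a real script the JSON text would come from podman system connection ls --format json, as in the snippet.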

To reproduce:

podman machine reset -f # at this point this is not sufficient
podman machine init --now
podman info # Error as mentioned above

Podman 5.3.0 shows the same result. I think the ssh key used is the same for both machines, and that might be the problem (~\.local\share\containers\podman\machine\machine). But any new machine uses the same path to the key...

Luap99 commented 1 week ago

and it now works correctly. Is it possible that some leftover, e.g. a connection in one of the config folders, caused the problem?

That is certainly possible, but only if you had an old 4.x podman installed before.

I think it is expected that we share the ssh key. It seems odd that removing and recreating the exact same connection makes it work.

$env:APPDATA\containers\*

This is used for containers.conf and the connections file where we store the remote connections.

$env:USERPROFILE\.config\containers

This is used for podman machine config files. Removing that means podman machine will no longer see the machine I think.

~\.local\share\containers

That is used for the actual VM image.

Luap99 commented 1 week ago

I don't have a windows install to test but maybe @l0rd can have a look at this next week.

mil1i commented 1 week ago

I'm also experiencing this on a MacBook Pro (macOS), but with v5.3.0.

It appears to work if I run podman with sudo, but not as my own user. I can pull and run containers via sudo, and I can connect via podman machine ssh. Anything without sudo fails.

I've tried uninstalling and reinstalling, and wiping out ~/.config/containers and ~/.local/share/containers. I haven't spent much more time digging into it than that, though. I've been playing with OrbStack lately.

I also noticed that Podman Desktop doesn't appear to recognize that the machine is running, and it asks me to install podman v5.2.5.

❯ podman info
OS: darwin/arm64
provider: applehv
version: 5.3.0

❯ podman machine init --cpus 6 --memory=8096 --now

❯ podman system connection list
Name                         URI                                                         Identity                                                     Default     ReadWrite
podman-machine-default       ssh://core@127.0.0.1:54700/run/user/501/podman/podman.sock  /Users/username/.local/share/containers/podman/machine/machine  true        true
podman-machine-default-root  ssh://root@127.0.0.1:54700/run/podman/podman.sock           /Users/username/.local/share/containers/podman/machine/machine  false       true

❯ podman machine info
host:
    arch: arm64
    currentmachine: podman-machine-default
    defaultmachine: podman-machine-default
    eventsdir: /var/folders/19/wfx8wt2x17g52mbbf_zkt5c00000gn/T/storage-run-501/podman
    machineconfigdir: /Users/username/.config/containers/podman/machine/applehv
    machineimagedir: /Users/username/.local/share/containers/podman/machine/applehv
    machinestate: Running
    numberofmachines: 1
    os: darwin
    vmtype: applehv
version:
    apiversion: 5.3.0
    version: 5.3.0
    goversion: go1.23.3
    gitcommit: 874bf2c301ecf0ba645f1bb45f81966cc755b7da
    builttime: Tue Nov 12 09:10:17 2024
    built: 1731427817
    osarch: darwin/arm64
    os: darwin

❯ podman machine ls
NAME                     VM TYPE     CREATED       LAST UP            CPUS        MEMORY      DISK SIZE
podman-machine-default*  applehv     16 hours ago  Currently running  6           7.906GiB    100GiB
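
The connection URIs in the list above pack the ssh user, the forwarded local port and the socket path inside the VM into one string. A minimal sketch of pulling those parts out with Python's standard library follows. The function name and the dictionary keys are chosen for this example and are not part of podman.

```python
from urllib.parse import urlsplit


def parse_connection_uri(uri: str) -> dict:
    """Split an ssh:// connection URI into its components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "socket_path": parts.path,
    }


if __name__ == "__main__":
    # URI copied from the rootless connection shown in the list above.
    print(parse_connection_uri(
        "ssh://core@127.0.0.1:54700/run/user/501/podman/podman.sock"
    ))
```

The host and port are what the client dials locally. The path is the Podman socket inside the VM, which is why the rootless and root connections differ only in user and path.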

Cannot communicate with podman host in rootless mode

❯ podman ps
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

Can communicate with podman host with root

❯ sudo podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES

Luap99 commented 1 week ago

@mil1i This issue seems specific to windows, do you have a local ssh config? I guess it is more likely you run into https://github.com/containers/podman/issues/24567

cabbirolo commented 1 week ago

I'm seeing something similar on Mac, and it will possibly be the same on Linux. The new version of Podman doesn't appear to expand shortcuts like "~", which stands for the user's home directory. It only works when the full path is given. For example, if we set the path to the ssh key relative to the user's home directory, using something like "~/.local/share/containers/podman/machine/machine", we get an "unable to connect to Podman socket" error. If the path is set to the full path, like "/Users/user1/.local/share/containers/podman/machine/machine", Podman works fine.

See below (note that the key exists in that path, but Podman doesn't find it)

user1@C02F18-983AA374 config.d % podman info
OS: darwin/amd64
provider: applehv
version: 5.3.0
Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`,
or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to read identity "~/.local/share/containers/podman/machine/machine": open ~/.local/share/containers/podman/machine/machine: no such file or directory
carlos@C02F18-983AA374 config.d %
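
The error above shows the literal string "~/.local/..." being passed to open, which fails because "~" is normally expanded by a shell and not by the operating system. A minimal sketch of expanding an identity path before writing it into a configuration is shown below. It illustrates the general technique only and is not how Podman handles the path, and the function names are made up for this example.

```python
import os


def resolve_identity_path(path: str) -> str:
    """Expand a leading "~" and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


def identity_is_readable(path: str) -> bool:
    """Return True if the expanded path is an existing, readable file."""
    resolved = resolve_identity_path(path)
    return os.path.isfile(resolved) and os.access(resolved, os.R_OK)


if __name__ == "__main__":
    key = "~/.local/share/containers/podman/machine/machine"
    print(resolve_identity_path(key), identity_is_readable(key))
```

Storing the resolved path corresponds to the full-path form that is reported to work in this comment.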

Luap99 commented 1 week ago

@cabbirolo That is also https://github.com/containers/podman/issues/24567

l0rd commented 1 week ago

@odockal I think you should create a VM using something other than podman machine init (vagrant, maybe?). Or find a way to test a remote connection without manually deleting the machine configuration (more on that below). The problem is that you are testing a scenario that we don't support (manually deleting the config directories) and, more importantly, that is artificial (i.e. not something real users do).

Anyway, I tried following the instructions to use a machine to set up a remote connection but, after that, every podman command fails with the error "No connection could be made because the target machine actively refused it.". I wouldn't spend time on this, though; I would prefer to help you find a more robust way to test the remote connection. For example, you could look at how we run podman in "isolated" mode in our system tests. This would still be an "artificial" test, but at least these are supported commands/flags. I hope this helps.

mil1i commented 1 week ago

@mil1i This issue seems specific to windows, do you have a local ssh config? I guess it is more likely you run into https://github.com/containers/podman/issues/24567

@Luap99 That was indeed my issue. I am using 1Password's ssh agent and had Host * configured.

I was able to resolve my issue by changing it to Host * !127.0.0.1 !localhost
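
In ssh_config, a Host line applies when the target matches at least one of its patterns and none of its negated ("!") patterns. That is why the changed line stops applying to connections to 127.0.0.1 and localhost, which is where the podman machine connection points. The sketch below is a simplified model of that rule, not OpenSSH's implementation: it uses Python's fnmatchcase, which also accepts bracket expressions that ssh_config patterns do not have, and the function name is made up for this example.

```python
from fnmatch import fnmatchcase


def host_line_applies(patterns: str, hostname: str) -> bool:
    """Simplified model of matching an ssh_config Host line against a hostname."""
    matched = False
    for pattern in patterns.split():
        if pattern.startswith("!"):
            if fnmatchcase(hostname, pattern[1:]):
                # A matching negated pattern excludes the host outright.
                return False
        elif fnmatchcase(hostname, pattern):
            matched = True
    # At least one positive pattern must have matched.
    return matched


if __name__ == "__main__":
    for host in ("127.0.0.1", "localhost", "example.com"):
        print(host, host_line_applies("* !127.0.0.1 !localhost", host))
```

Under this model, the original Host * applies to every target, including 127.0.0.1.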