crc-org / crc

CRC is a tool to help you run containers. It manages a local OpenShift 4.x cluster, MicroShift or a Podman VM optimized for testing and development purposes.
https://crc.dev
Apache License 2.0

[BUG] crc seems to conflict with Podman on Windows #3855

Open jeffmaury opened 10 months ago

jeffmaury commented 10 months ago

General information

CRC version

CRC version: 2.26.0+233df0
OpenShift version: 4.13.9
Podman version: 4.4.4

CRC status

# Put `crc status --log-level debug` output here

CRC config

DEBU CRC version: 2.26.0+233df0
DEBU OpenShift version: 4.13.9
DEBU Podman version: 4.4.4
DEBU Running 'crc status'
CRC VM:          Stopped
MicroShift:      Stopped (v4.13.9)
RAM Usage:       0B of 0B
Disk Usage:      0B of 0B (Inside the CRC VM)
Cache Usage:     34.7GB
Cache Directory: C:\Users\Jeff\.crc\cache

Host Operating System


Host Name:                 DESKTOP-JEFF
OS Name:                   Microsoft Windows 11 Pro
OS Version:                10.0.22621 N/A Build 22621
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          Jeff MAURY
Registered Organization:
Product ID:                00330-80819-37372-AA576
Original Install Date:     09/11/2022, 16:39:06
System Boot Time:          02/10/2023, 10:47:18
System Manufacturer:       LENOVO
System Model:              20TJS2F43N
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 165 Stepping 2 GenuineIntel ~2310 MHz
BIOS Version:              LENOVO N2VET29W (1.14 ), 15/03/2021
Windows Directory:         C:\Windows
System Directory:          C:\Windows\system32
Boot Device:               \Device\HarddiskVolume1
System Locale:             fr;French (France)
Input Locale:              fr;French (France)
Time Zone:                 (UTC+01:00) Brussels, Copenhagen, Madrid, Paris
Total Physical Memory:     32525 MB
Available Physical Memory: 11039 MB
Virtual Memory: Max Size:  61197 MB
Virtual Memory: Available: 28953 MB
Virtual Memory: In Use:    32244 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\DESKTOP-JEFF
Hotfix(s):                 4 Hotfix(s) Installed.
                           [01]: KB5029921
                           [02]: KB5012170
                           [03]: KB5030219
                           [04]: KB5028756
Network Card(s):           5 NIC(s) Installed.
                           [01]: Intel(R) Wi-Fi 6 AX201 160MHz
                                 Connection Name: Wi-Fi
                                 DHCP Enabled:    Yes
                                 DHCP Server:     192.168.1.254
                                 IP address(es)
                                 [01]: 192.168.1.187
                                 [02]: fe80::5102:6cd:7e09:85b2
                                 [03]: 2a01:e34:ec7e:3020:8484:9c06:53b9:4b53
                                 [04]: 2a01:e34:ec7e:3020:5116:66ca:8d7a:292a
                           [02]: Bluetooth Device (Personal Area Network)
                                 Connection Name: Connexion réseau Bluetooth
                                 Status:          Media disconnected
                           [03]: Wintun Userspace Tunnel
                                 Connection Name: OpenVPN Wintun
                                 Status:          Media disconnected
                           [04]: TAP-Windows Adapter V9
                                 Connection Name: OpenVPN TAP-Windows6
                                 Status:          Media disconnected
                           [05]: OpenVPN Data Channel Offload
                                 Connection Name: OpenVPN Data Channel Offload
                                 DHCP Enabled:    Yes
                                 DHCP Server:     255.255.255.255
                                 IP address(es)
                                 [01]: 10.39.193.52
                                 [02]: fe80::7c66:3db3:58ac:e5cd
                                 [03]: fd10:39:192:1::1133
Hyper-V Requirements:      A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Steps to reproduce

  1. crc start

Expected

crc should start

Actual

INFO Checking minimum RAM requirements
INFO Checking if running in a shell with administrator rights
INFO Checking Windows release
INFO Checking Windows edition
INFO Checking if Hyper-V is installed and operational
INFO Checking if Hyper-V service is enabled
INFO Checking if crc-users group exists
INFO Checking if current user is in crc-users and Hyper-V admins group
INFO Checking if vsock is correctly configured
INFO Checking if the daemon task powershell script is present
INFO Checking if the daemon task is installed
INFO Checking if the daemon task is running
INFO Checking admin helper service is running
INFO Downloading bundle: crc_microshift_hyperv_4.13.9_amd64...
1.45 GiB / 1.45 GiB [------------------------------------------------------------------------------] 100.00% 7.36 MiB/s
INFO Extracting bundle: crc_microshift_hyperv_4.13.9_amd64...
crc.vhdx:  4.48 GiB / 4.48 GiB [------------------------------------------------------------------------------] 100.00%
oc.exe:  114.17 MiB / 114.17 MiB [----------------------------------------------------------------------------] 100.00%
CRC requires a pull secret to download content from Red Hat.
You can copy it from the Pull Secret section of https://console.redhat.com/openshift/create/local.
? Please enter the pull secret *****************************************************************************************
INFO Creating CRC VM for MicroShift 4.13.9...
INFO Generating new SSH key pair...
INFO Starting CRC VM for microshift 4.13.9...
failed to expose port 127.0.0.1:2222 -> 192.168.127.2:22: listen tcp 127.0.0.1:2222: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.

Using TCPView, I noticed that gvproxy.exe (from the Podman distribution) was listening on port 2222.

Logs

Before gathering the logs, try the following to see if it fixes your issue:

$ crc delete -f
$ crc cleanup
$ crc setup
$ crc start --log-level debug

Please consider posting the output of crc start --log-level debug on http://gist.github.com/ and post the link in the issue.

cfergeau commented 10 months ago

Both will indeed try to use port 2222 for SSH; I'm not sure why this never came up before. macOS most likely has the same issue.

cfergeau commented 10 months ago

Looking more closely at the podman machine code, it seems to use a random port for SSH. Only with WSL is it not 100% clear what it does: there is some code to switch to a different SSH port when the requested one is already in use. Maybe it defaults to 2222, though. crc, on the other hand, expects to be able to use 2222 and has no fallback if it's not available.

jeffmaury commented 10 months ago

It seems to be related to user-mode networking, as gvproxy.exe is not launched if you don't use this feature.

gbraad commented 10 months ago

"not sure why this never came up"

"it looks like it's using a random port for SSH."

Probably this, as it never conflicted for me. I wonder what causes the conflict to occur.

cfergeau commented 10 months ago

Could it be the order in which they are started? crc daemon first, followed by podman machine/wsl -> OK, podman machine/wsl followed by crc daemon -> failure?

jeffmaury commented 10 months ago

No, it does not matter, as both are using port 2222.

cfergeau commented 10 months ago

"No, it does not matter, as both are using port 2222."

The WSL code in podman seemed to have some fallback in case port 2222 is already in use, hence the question.

jeffmaury commented 10 months ago

I can't see the fallback: if I start CRC first and then podman machine, gvproxy.exe is not running at all.

gbraad commented 10 months ago

Perhaps this is because Podman machine recently introduced user-mode networking, and they might not have considered that CRC already uses these ports. /CC: @n1hility => https://github.com/containers/podman/issues/20327

https://github.com/containers/gvisor-tap-vsock/blob/2a3419da952638147a42db9d49bd74bd10d2340c/pkg/types/gvproxy_command.go#L35

gbraad commented 8 months ago

@vyasgun You were able to reproduce this on Windows and macOS. Can you describe the setup you used and the output? On macOS it seems no error occurred, but the SSH connection to the VM was denied.

gbraad commented 7 months ago

Unfortunately she did not add any further comments, so off the top of my head I will explain what we did:

  1. We have something listening on port 2222, e.g. nc -L.
  2. We start crc, and it goes through a series of SSH retries.
  3. It times out.

jeffmaury commented 7 months ago

  1. Start podman with user-mode networking enabled -> gvproxy.exe listens on 2222
  2. crc start ->
$ crc start
INFO Using bundle path C:\Users\Jeff\.crc\cache\crc_hyperv_4.14.3_amd64.crcbundle
INFO Checking minimum RAM requirements
INFO Checking if running in a shell with administrator rights
INFO Checking Windows release
INFO Checking Windows edition
INFO Checking if Hyper-V is installed and operational
INFO Checking if Hyper-V service is enabled
INFO Checking if crc-users group exists
INFO Checking if current user is in crc-users and Hyper-V admins group
INFO Checking if vsock is correctly configured
INFO Checking if the daemon task powershell script is present
INFO Checking if the daemon task is installed
INFO Checking if the daemon task is running
INFO Checking admin helper service is running
INFO Loading bundle: crc_hyperv_4.14.3_amd64...
INFO Creating CRC VM for OpenShift 4.14.3...
INFO Generating new SSH key pair...
INFO Generating new password for the kubeadmin user
INFO Starting CRC VM for openshift 4.14.3...
failed to expose port 127.0.0.1:2222 -> 192.168.127.2:22: listen tcp 127.0.0.1:2222: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.

gbraad commented 7 months ago

Thanks @jeffmaury, but the instructions I meant were more low-level: actually blocking the port with something like nc -l or another application, as that makes it easier to write tests that prevent regressions.
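For a regression test, a small program can take the place of nc -l. The Go sketch below is an assumption, not code from crc's test suite: blockPort and canBind are hypothetical helper names. blockPort keeps a TCP listener open on an address and closes every incoming connection, so the port stays occupied; canBind reports whether a fresh listener could be opened there right now.

```go
package main

import (
	"net"
)

// blockPort occupies addr the way `nc -l` does in the manual reproduction.
// It accepts connections in the background and closes them immediately;
// the only goal is to keep the port in use until the listener is closed.
// Hypothetical helper, for illustration only.
func blockPort(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // the listener was closed: stop accepting
			}
			conn.Close() // drop the client right away
		}
	}()
	return ln, nil
}

// canBind reports whether a new TCP listener can be opened on addr at this
// moment. The listener is released again before returning.
func canBind(addr string) bool {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return false
	}
	ln.Close()
	return true
}
```

A test would call blockPort("127.0.0.1:2222") before crc start and assert that start reports the conflict; using port 0 instead lets the test run without depending on 2222 being free on the machine.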

It seems the same issue occurs on macOS, but it does NOT fail with the error you show. (It actually continues, but SSH connectivity times out after 60 tries.) This is what @vyasgun investigated... @evidolob will continue working on this to provide the port check.

vyasgun commented 7 months ago

I started a server on port 2222 on a Mac and got the following result, indicating a port conflict:

DEBU retry loop: attempt 74
DEBU Running SSH command: exit 0
DEBU Using ssh private keys: [/Users/gvyas/.crc/machines/crc/id_ecdsa /Users/gvyas/.crc/cache/crc_microshift_vfkit_4.13.14_amd64/id_ecdsa_crc]
DEBU SSH command results: err: ssh: handshake failed: read tcp 127.0.0.1:62268->127.0.0.1:2222: read: connection reset by peer, output:
DEBU error: Temporary error: ssh command error:
command : exit 0
err     : ssh: handshake failed: read tcp 127.0.0.1:62268->127.0.0.1:2222: read: connection reset by peer
 - sleeping 1s
DEBU retry loop: attempt 75
DEBU Running SSH command: exit 0
DEBU Using ssh private keys: [/Users/gvyas/.crc/machines/crc/id_ecdsa /Users/gvyas/.crc/cache/crc_microshift_vfkit_4.13.14_amd64/id_ecdsa_crc]
DEBU SSH command results: err: ssh: handshake failed: read tcp 127.0.0.1:62270->127.0.0.1:2222: read: connection reset by peer, output:
DEBU error: Temporary error: ssh command error:
command : exit 0
err     : ssh: handshake failed: read tcp 127.0.0.1:62270->127.0.0.1:2222: read: connection reset by peer
 - sleeping 1s
DEBU RetryAfter timeout after 76 tries
DEBU Running 'sw_vers -productVersion'
DEBU Sending 'identify' to segment
Failed to connect to the CRC VM with SSH -- virtual machine might be unreachable: Temporary error: ssh command error:
command : exit 0
err     : ssh: handshake failed: read tcp 127.0.0.1:62141->127.0.0.1:2222: read: connection reset by peer

Temporary error: ssh command error:
command : exit 0
err     : ssh: handshake failed: read tcp 127.0.0.1:62143->127.0.0.1:2222: read: connection reset by peer

On Windows, I did the same thing. crc start failed with a different error, but only while the server was running.

DEBU retry loop: attempt 3
DEBU Running SSH command: exit 0
DEBU Using ssh private keys: [C:\Users\gvyas\.crc\machines\crc\id_ecdsa C:\Users\gvyas\.crc\cache\crc_hyperv_4.14.3_amd64\id_ecdsa_crc]
DEBU SSH command results: err: <nil>, output:
INFO CRC VM is running
DEBU Using root access: disable core user password
DEBU Running SSH command: sudo passwd --lock core
DEBU SSH command results: err: <nil>, output: Locking password for user core.
passwd: Success
DEBU Running SSH command: cat /home/core/.ssh/authorized_keys
DEBU SSH command results: err: Process exited with status 1, output:
INFO Updating authorized keys...
DEBU Creating /home/core/.ssh/authorized_keys with permissions 0644 in the CRC VM
DEBU Running SSH command: <hidden>
DEBU SSH command succeeded
DEBU Running SSH command: rm /home/core/.ssh/authorized_keys.d/ignition
DEBU SSH command results: err: <nil>, output:
DEBU Using root access: Get device id
DEBU Running SSH command: sudo /usr/sbin/blkid -t TYPE=xfs -o device
DEBU SSH command results: err: <nil>, output: /dev/sda4
DEBU Using root access: Growing /dev/sda4 partition
DEBU Running SSH command: sudo /usr/bin/growpart /dev/sda 4
DEBU SSH command results: err: Process exited with status 1, output: NOCHANGE: partition 4 is size 63961055. it cannot be grown
DEBU No free space after /dev/sda4, nothing to do
DEBU Using root access: make root Podman socket accessible
DEBU Running SSH command: sudo chmod 777 /run/podman/ /run/podman/podman.sock
DEBU SSH command results: err: <nil>, output:
Error running post start: host file not writable, try running with elevated privileges

evidolob commented 6 months ago

I was looking into this, and I have only a few ideas on how to fix it.

I can add a check for port 2222 on daemon start, but it would not be enough to solve the issue: some other program could take that port between the daemon start and the actual CRC VM start. We only start using that port while the VM is starting, so we cannot reserve it in advance.

So, a proper solution could be to use a random free port if 2222 is occupied. But in that case we need to share the port number between the daemon and the CLI, as the port number may be used on both sides.

This leads me to think about moving the start command execution to the daemon, just to avoid the complication of sharing the port number.

Also, if we choose to use a random port, we need to inform the user about it somehow, as that port is used when debugging, for example when SSH'ing into the CRC VM.

@gbraad @praveenkumar @cfergeau WDYT? Or is there a better solution?

cfergeau commented 6 months ago

"I can add a check for port 2222 on daemon start, but it would not be enough to solve the issue: some other program could take that port between the daemon start and the actual CRC VM start. We only start using that port while the VM is starting, so we cannot reserve it in advance."

IMO, the main thing to solve is any port conflict between podman-machine and crc that prevents them from running at the same time. I'm not sure this specific scenario is a problem at the moment: podman-machine seems to always pick a random port, and podman has podman machine ssh, which helps with that.

We've occasionally had reports of people hitting port conflicts with other tools, but they've been rare. For a similar situation, we introduced ingress-http-port and ingress-https-port to let the user specify which alternate port they want; could this be an option here?

If you prefer a random port, one alternative to pushing start to the daemon could be to have some "create ssh connection" functionality in the daemon and use it from the client, but that is likely to be more complicated/messy than it sounds ;)

I agree that long term it will be nice to do everything from the daemon, but we are not there yet. And as long as podman machine and crc can run at the same time, I don't think this is a huge issue. If podman-machine + crc already works, then an ssh-port config option is, IMO, enough if we want a short-term fix.
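To illustrate the config-option route, here is a sketch of how a hypothetical ssh-port setting could be resolved, modelled on the existing ingress-http-port/ingress-https-port options. The key name "ssh-port", the plain map standing in for crc's configuration, and sshPortFromConfig are all assumptions for illustration; this is not crc's configuration API.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultSSHPort is the fixed host port crc forwards to the VM's SSH port
// today (127.0.0.1:2222 -> 192.168.127.2:22 in the logs in this issue).
const defaultSSHPort = 2222

// sshPortFromConfig resolves the SSH port from key/value settings, falling
// back to the default. "ssh-port" is a HYPOTHETICAL key; both the CLI and
// the daemon would need to read it the same way.
func sshPortFromConfig(cfg map[string]string) (int, error) {
	raw, ok := cfg["ssh-port"]
	if !ok || raw == "" {
		return defaultSSHPort, nil
	}
	port, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid ssh-port %q: %w", raw, err)
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("ssh-port %d is outside the range 1-65535", port)
	}
	return port, nil
}
```

Keeping 2222 as the default means existing setups behave exactly as before; only users who hit a conflict would need to set the option.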

evidolob commented 6 months ago

I just checked: on Windows and macOS, podman (4.8.3) and CRC (2.31) run at the same time without any issues.

Should we just close this?

gbraad commented 6 months ago

It is the order in which this happens... Podman Machine detects the conflict, but we don't, so whichever starts first matters.

praveenkumar commented 6 months ago

Does podman machine assign 2222 as part of its random port mapping? I highly doubt it, so it is most likely some other application consuming this port and thereby blocking it for the crc use case. Maybe detection on the daemon side (even if it is not foolproof) would give the user a sense that this port is consumed by some other application?

Also, adding an ssh-port option on the crc side will not work unless the daemon also reads that configuration and uses it?

evidolob commented 6 months ago

"Maybe detection on the daemon side (even if it is not foolproof) would give the user a sense that this port is consumed by some other application?"

If we add port usage detection only on the daemon side, not all users will know about it, as not all of them constantly read the daemon logs. IMHO it will be a better UX if we show in the CLI, during the start checks, that the port is in use.

As for the ssh-port config, we already have the ingress-http-port and ingress-https-port config options, and both of them are used by start on both sides, CLI (https://github.com/crc-org/crc/blob/e6f13c391915ec48b01db3ae08926202069096c2/cmd/crc/cmd/start.go#L70-L88) and daemon (https://github.com/crc-org/crc/blob/2875a7441f467e630c62760b49a800f60857afbf/pkg/crc/api/handlers.go#L120), so it would be a matter of adding another one.

gbraad commented 6 months ago

If the port is in use, we cannot continue... so that is not just a log entry, but a clear failure.

evidolob commented 6 months ago

"If the port is in use, we cannot continue... so that is not just a log entry, but a clear failure."

So, when should that check be performed? During crc start? Or during daemon start?

gbraad commented 6 months ago

As part of a preflight check?

evidolob commented 6 months ago

OK, that should work. Should I add only the check, or the check and a configuration option?

gbraad commented 6 months ago

incremental...

But I see a situation: "it fails. Now what?", so eventually you need an alternative strategy: assign a random port, a config option, etc.

praveenkumar commented 5 months ago

@jeffmaury @slemeur With the latest release of CRC, we added a check that detects whether port 2222 is used by any process and lets the user know about it. Can you please test the latest CRC with the latest version of podman and let us know if you are still hitting this issue?

jeffmaury commented 5 months ago

The error is reported, but the specific message is not displayed:

[screenshot]

From the CLI:

[screenshot]

I will check with the latest version of the extension.

kid1412621 commented 2 months ago

"Can you please test the latest CRC with the latest version of podman and let us know if you are still hitting this issue?"

What is the latest version? Using these versions, I hit the same issue:

CRC version: 2.35.0+3956e8
OpenShift version: 4.15.10
Podman version: 4.4.4

gbraad commented 2 months ago

@kid1412621 you mean you have a conflicting port? Can you check which other process uses this port?

We initially decided not to add a method/config option to change this, as we want to see how often this happens; we had earlier issues that might have been caused by this but were never reported as such, as they mostly showed up as 'ssh connection failures'.

gbraad commented 2 months ago

"I will check with the latest version of the extension."

@jeffmaury we might have to create an issue for a proper error message on the extension's end, so that it reports this correctly. A follow-up might be to make this configurable. WDYT?

kid1412621 commented 2 months ago

"@kid1412621 you mean you have a conflicting port? Can you check which other process uses this port?"

gvproxy; I guess it has to do with podman user-mode networking.

gim- commented 2 months ago

I stumbled upon this thread while troubleshooting. Recreating the Podman machine with user-mode networking disabled solved the issue. Thank you for the hint, @kid1412621!