Open jeffmaury opened 10 months ago
Both will indeed try to use port 2222 for SSH; not sure why this never came up before. macOS most likely has the same issue.
Looking closer at podman machine code, it looks like it's using a random port for SSH. It's only with wsl that it's not 100% clear what it's doing, there is some code to use a different ssh port when it's trying to use an already used port. Maybe it defaults to 2222 though. crc on the other hand expects to be able to use 2222 and has no fallback if it's not available.
Seems to be related to user mode networking as gvproxy.exe is not launched if you don't use this feature
> not sure why this never came up
> it looks like it's using a random port for SSH.
this, as it never conflicted for me. Wonder what causes the conflict to occur.
Could it be the order in which they are started? crc daemon first, followed by podman machine/wsl -> OK, podman machine/wsl followed by crc daemon -> failure?
No it does not matter as both are using port 2222
> No it does not matter as both are using port 2222
The WSL code in podman seemed to have some fallback in case port 2222 is already in use, hence the question.
I can't see the fallback: If I start CRC first then podman machine then gvproxy.exe is not running at all.
Perhaps this is because Podman machine recently introduced usermode networking, and they might not have considered the situation that CRC already uses these ports. /CC: @n1hility => https://github.com/containers/podman/issues/20327
@vyasgun You were able to reproduce this on Windows and macOS. Can you describe the setup you used and the output? On macOS it seems no error occurred, but the SSH connection to the VM was denied.
Unfortunately she did not add any further comments, so off the top of my head I will explain what we did:
nc -L, then crc, and this will fail with a bunch of SSH retries
$ crc start
INFO Using bundle path C:\Users\Jeff\.crc\cache\crc_hyperv_4.14.3_amd64.crcbundle
INFO Checking minimum RAM requirements
INFO Checking if running in a shell with administrator rights
INFO Checking Windows release
INFO Checking Windows edition
INFO Checking if Hyper-V is installed and operational
INFO Checking if Hyper-V service is enabled
INFO Checking if crc-users group exists
INFO Checking if current user is in crc-users and Hyper-V admins group
INFO Checking if vsock is correctly configured
INFO Checking if the daemon task powershell script is present
INFO Checking if the daemon task is installed
INFO Checking if the daemon task is running
INFO Checking admin helper service is running
INFO Loading bundle: crc_hyperv_4.14.3_amd64...
INFO Creating CRC VM for OpenShift 4.14.3...
INFO Generating new SSH key pair...
INFO Generating new password for the kubeadmin user
INFO Starting CRC VM for openshift 4.14.3...
failed to expose port 127.0.0.1:2222 -> 192.168.127.2:22: listen tcp 127.0.0.1:2222: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
Thanks @jeffmaury, but the instructions I meant were more low-level and actually blocking the port with something like nc -l or another application, as that allows for easier tests to prevent regression.
It seems the same issue occurs on macOS, but it does NOT fail with the error you show. (It will actually continue, but the SSH connectivity will time out after 60 attempts.) This is what @vyasgun investigated... @evidolob will continue working on this to provide the port-check.
I started a server on 2222 port on Mac and got the following result indicating a port conflict:
DEBU retry loop: attempt 74
DEBU Running SSH command: exit 0
DEBU Using ssh private keys: [/Users/gvyas/.crc/machines/crc/id_ecdsa /Users/gvyas/.crc/cache/crc_microshift_vfkit_4.13.14_amd64/id_ecdsa_crc]
DEBU SSH command results: err: ssh: handshake failed: read tcp 127.0.0.1:62268->127.0.0.1:2222: read: connection reset by peer, output:
DEBU error: Temporary error: ssh command error:
command : exit 0
err : ssh: handshake failed: read tcp 127.0.0.1:62268->127.0.0.1:2222: read: connection reset by peer
- sleeping 1s
DEBU retry loop: attempt 75
DEBU Running SSH command: exit 0
DEBU Using ssh private keys: [/Users/gvyas/.crc/machines/crc/id_ecdsa /Users/gvyas/.crc/cache/crc_microshift_vfkit_4.13.14_amd64/id_ecdsa_crc]
DEBU SSH command results: err: ssh: handshake failed: read tcp 127.0.0.1:62270->127.0.0.1:2222: read: connection reset by peer, output:
DEBU error: Temporary error: ssh command error:
command : exit 0
err : ssh: handshake failed: read tcp 127.0.0.1:62270->127.0.0.1:2222: read: connection reset by peer
- sleeping 1s
DEBU RetryAfter timeout after 76 tries
DEBU Running 'sw_vers -productVersion'
DEBU Sending 'identify' to segment
Failed to connect to the CRC VM with SSH -- virtual machine might be unreachable: Temporary error: ssh command error:
command : exit 0
err : ssh: handshake failed: read tcp 127.0.0.1:62141->127.0.0.1:2222: read: connection reset by peer
Temporary error: ssh command error:
command : exit 0
err : ssh: handshake failed: read tcp 127.0.0.1:62143->127.0.0.1:2222: read: connection reset by peer
On Windows, I did the same thing. crc start failed with a different error, but it only happened with the server running.
DEBU retry loop: attempt 3
DEBU Running SSH command: exit 0
DEBU Using ssh private keys: [C:\Users\gvyas\.crc\machines\crc\id_ecdsa C:\Users\gvyas\.crc\cache\crc_hyperv_4.14.3_amd64\id_ecdsa_crc]
DEBU SSH command results: err: <nil>, output:
INFO CRC VM is running
DEBU Using root access: disable core user password
DEBU Running SSH command: sudo passwd --lock core
DEBU SSH command results: err: <nil>, output: Locking password for user core.
passwd: Success
DEBU Running SSH command: cat /home/core/.ssh/authorized_keys
DEBU SSH command results: err: Process exited with status 1, output:
INFO Updating authorized keys...
DEBU Creating /home/core/.ssh/authorized_keys with permissions 0644 in the CRC VM
DEBU Running SSH command: <hidden>
DEBU SSH command succeeded
DEBU Running SSH command: rm /home/core/.ssh/authorized_keys.d/ignition
DEBU SSH command results: err: <nil>, output:
DEBU Using root access: Get device id
DEBU Running SSH command: sudo /usr/sbin/blkid -t TYPE=xfs -o device
DEBU SSH command results: err: <nil>, output: /dev/sda4
DEBU Using root access: Growing /dev/sda4 partition
DEBU Running SSH command: sudo /usr/bin/growpart /dev/sda 4
DEBU SSH command results: err: Process exited with status 1, output: NOCHANGE: partition 4 is size 63961055. it cannot be grown
DEBU No free space after /dev/sda4, nothing to do
DEBU Using root access: make root Podman socket accessible
DEBU Running SSH command: sudo chmod 777 /run/podman/ /run/podman/podman.sock
DEBU SSH command results: err: <nil>, output:
Error running post start: host file not writable, try running with elevated privileges
I was looking into this, and I have only a few ideas on how to fix it.
I can add a check for port 2222 on daemon start, but it would not be enough to solve the issue, as some other program could take that port between daemon start and the actual CRC VM start. We only start using that port while the VM is starting, so we cannot occupy it in advance.
So a proper solution could be to use a random free port if 2222 is occupied. But in this case we need to share the port number between the daemon and the CLI, as it (the port number) could be used on both sides.
This leads me to think about moving the start command execution to the daemon, just to avoid the port number sharing complication.
Also, if we choose to use a random port, we need to inform the user about it somehow, as that port is used when debugging, for example when SSH'ing into the CRC VM.
@gbraad @praveenkumar @cfergeau WDYT? Or maybe there is a better solution?
> I can add a check for port 2222 on daemon start, but it would not be enough to solve the issue, as some other program could take that port between daemon start and the actual CRC VM start. We only start using that port while the VM is starting, so we cannot occupy it in advance.
Imo, the main thing to solve is if there are port conflicts between podman-machine and crc, preventing them from running at the same time. I'm not sure this specific scenario is a problem at the moment? podman-machine seems to always pick a random port. podman has podman machine ssh which helps with that.
We've occasionally had reports of people hitting port conflicts with other tools, but they've been rare.
For a similar situation, we've introduced ingress-http-port and ingress-https-port to let the user specify which alternate port they want; this could be an option here?
If you prefer a random port, one alternative to pushing start to the daemon could be to have some "create ssh connection" functionality in the daemon, and use it from the client, but likely to be more complicated/messy than it sounds ;)
I agree that long term it will be nice to do everything from the daemon, but we are not there yet. But as long as podman machine and crc can run at the same time, I don't think this is a huge issue. If podman-machine + crc already works, then an ssh-port config option is imo enough if we want a short term fix for it.
I just checked: on Windows and macOS, podman (4.8.3) and CRC (2.31) run at the same time without any issues.
Should we just close this?
It is the order in which this happens... Podman Machine detects, but we don't, so who comes first matters.
Does podman machine assign 2222 as part of its random port mapping? I highly doubt it, so it is most likely some other application that is consuming this port and blocking it for the crc use case. Maybe detection on the daemon side (even if it is not foolproof) would give the user a sense that this port is consumed by some other application?
Also, adding an ssh-port option on the crc side will not work until the daemon also reads that configuration and uses it?
> Maybe detection on the daemon side (even if it is not foolproof) would give the user a sense that this port is consumed by some other application?
If we add port usage detection only on the daemon side, not all users will know about it, as not all of them constantly read the daemon logs. IMHO it will be better UX if we show that the port is in use in the CLI, during the start checks.
As for the ssh-port config, we already have the ingress-http-port and ingress-https-port config options, and both of them are used in start on both sides, CLI (https://github.com/crc-org/crc/blob/e6f13c391915ec48b01db3ae08926202069096c2/cmd/crc/cmd/start.go#L70-L88) and daemon (https://github.com/crc-org/crc/blob/2875a7441f467e630c62760b49a800f60857afbf/pkg/crc/api/handlers.go#L120), so it would be a matter of adding another one.
if the port is in use we can not continue... so that is not just a log entry; but a clear failure.
> if the port is in use we can not continue... so that is not just a log entry; but a clear failure.
So, when should that check be performed? During crc start? Or during daemon start?
as part of a preflight?
OK, that should work. Should I add only the check, or the check and a configuration option?
incremental...
but I see a situation: "it fails. now what?", so eventually you need an alternative strategy; assign random, config option, etc.
@jeffmaury @slemeur With the latest release of CRC we added a check to figure out if port 2222 is consumed by any process and let the user know about it. Can you please test the latest CRC with the latest version of podman and let us know if you are still hitting this issue?
Error is reported but specific message is not displayed:
From CLI:
Will check with latest version of the extension
> Can you please test the latest CRC with the latest version of podman and let us know if you are still hitting this issue?
What is the latest version? Using these versions, I met the same issue:
CRC version: 2.35.0+3956e8
OpenShift version: 4.15.10
Podman version: 4.4.4
@kid1412621 you mean you have a conflicting port? Can you check which other process uses this port?
We decided at first to not add a method/config option to change this, as we want to see how often this happens; we had earlier issues that might have been caused by this but were never reported as such, as those were mostly 'ssh connection failures'.
> Will check with latest version of the extension
@jeffmaury we might have to create an issue for a proper error message on the end of the extension to report this correctly. Follow-up might be to have this configurable. WDYT?
> @kid1412621 you mean you have a conflicting port? Can you check which other process uses this port?
> We decided at first to not add a method/config option to change this, as we want to see how often this happens; we had earlier issues that might have been caused by this but were never reported as such, as those were mostly 'ssh connection failures'.
gvproxy, I guess it's to do with podman user net.
Stumbled upon this thread while troubleshooting. Recreating Podman machine with user network disabled solved the issue. Thank you for the hint @kid1412621 !
General information
Did you run crc setup before starting it (Yes/No)? Yes
CRC version
CRC status
CRC config
Host Operating System
Steps to reproduce
Expected
crc should start
Actual
Using tcpview, I noticed that gvproxy.exe (from the Podman distribution) was listening on port 2222