crc-org / crc

CRC is a tool to help you run containers. It manages a local OpenShift 4.x cluster, MicroShift, or a Podman VM optimized for testing and development purposes.
https://crc.dev
Apache License 2.0

[BUG] [Mac] Connection reset by peer; Connection reset by 127.0.0.1 port 2222 #4459

Open shuawest opened 1 week ago

shuawest commented 1 week ago

General information

CRC version

# Put `crc version` output here
CRC version: 2.43.0+268795
OpenShift version: 4.17.1
MicroShift version: 4.17.1

CRC status

# Put `crc status --log-level debug` output here
DEBU CRC version: 2.43.0+268795
DEBU OpenShift version: 4.17.1
DEBU MicroShift version: 4.17.1
DEBU Running 'crc status'
CRC VM:                  Running
MicroShift:              Unreachable (v4.17.1)
Disk Usage:              0B of 0B (Inside the CRC VM)
Persistent Volume Usage: 0B of 0B (Allocated)
Cache Usage:             35.09GB
Cache Directory:         /Users/jowest/.crc/cache

CRC config

# Put `crc config view` output here
- consent-telemetry                     : no
- cpus                                  : 8
- disk-size                             : 175
- enable-cluster-monitoring             : true
- kubeadmin-password                    : ********
- memory                                : 20480
- no-proxy                              : *.local;169.254/16
- persistent-volume-size                : 300
- preset                                : microshift (tried both microshift and openshift)
- pull-secret-file                      : /Users/jowest/dev/apps/ocp/pull-secret.txt

Host Operating System

# put the output of `sw_vers` in case of Mac
ProductName:        macOS
ProductVersion:     15.1
BuildVersion:       24B83

Steps to reproduce

  1. crc stop
  2. crc delete
  3. crc setup
  4. crc start
  5. failure:
    > oc get nodes
    E1114 11:58:42.029032   67782 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": net/http: TLS handshake timeout
    E1114 11:58:52.031313   67782 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": net/http: TLS handshake timeout

Expected

crc, with either the microshift or the openshift preset, should keep running until it is shut down.

Actual

For the last month or two, across crc updates and with both the OpenShift and MicroShift presets, I have not seen crc run stably on two different macOS laptops with significant memory/CPU capacity. After the machine starts, it works for a few minutes or hours, then becomes unresponsive. Sometimes it fails right from the start. Once it starts to fail, I am unable to connect to the API server and unable to ssh into the machine:

➜  oc get nodes
E1114 11:58:42.029032   67782 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": net/http: TLS handshake timeout
E1114 11:58:52.031313   67782 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": net/http: TLS handshake timeout
...
Unable to connect to the server: net/http: TLS handshake timeout

➜  ssh -i ~/.crc/machines/crc/id_ed25519 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p 2222 core@127.0.0.1
kex_exchange_identification: read: Connection reset by peer
Connection reset by 127.0.0.1 port 2222

➜  ssh -i ~/.crc/machines/crc/id_ed25519.pub -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -p 2222 core@127.0.0.1
kex_exchange_identification: read: Connection reset by peer
Connection reset by 127.0.0.1 port 2222
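To pin down when the cluster goes from working to unresponsive, a small probe loop can log timestamped results for both checks above. This is only a sketch: `probe_once` is a hypothetical helper name, the 30-second interval and the timeouts are arbitrary choices, and the loop is left commented out so the snippet has no side effects when sourced.

```shell
#!/usr/bin/env bash
# probe_once LABEL CMD [ARGS...]
# Hypothetical helper: runs CMD once, discards its output, and prints a
# UTC-timestamped "LABEL OK" or "LABEL FAIL" line. Returns 1 on failure.
probe_once() {
  local label=$1
  shift
  local ts
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  if "$@" >/dev/null 2>&1; then
    printf '%s %s OK\n' "$ts" "$label"
  else
    printf '%s %s FAIL\n' "$ts" "$label"
    return 1
  fi
}

# Example loop (uncomment to use). The interval and timeouts are assumptions.
# while true; do
#   probe_once api oc get nodes --request-timeout=10s
#   probe_once ssh ssh -i ~/.crc/machines/crc/id_ed25519 \
#     -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
#     -o ConnectTimeout=5 -o BatchMode=yes -p 2222 core@127.0.0.1 true
#   sleep 30
# done
```

The first FAIL line in the output gives an approximate time at which the API server or ssh stopped responding, which can then be lined up against the debug log.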

Logs

Before gathering the logs, try the following to see if it fixes your issue:

$ crc delete -f
$ crc cleanup
$ crc setup
$ crc start --log-level debug

Please consider posting the output of `crc start --log-level debug` on http://gist.github.com/ and posting the link in the issue.

https://gist.github.com/shuawest/b14b587fe6b5b47ba461aa91c0a30b3e

shuawest commented 1 week ago

This behavior happens on two different macOS laptops with significant memory and CPU capacity. I have been able to deploy some basic containers and install operators, and it works for short amounts of time.

The failure state and behavior are slightly different at times. For example, sometimes trying to ssh into the CRC machine never connects, and other times it immediately reports "read: Connection reset by peer; Connection reset by 127.0.0.1 port 2222".
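When collecting repeated attempts, the two failure modes described here (a hang versus an immediate reset) can be labeled from the ssh error text. The following is a rough sketch, not anything crc provides: `classify_ssh_failure` is a hypothetical helper, and the patterns are assumptions based on common OpenSSH client messages.

```shell
#!/usr/bin/env bash
# classify_ssh_failure STDERR_TEXT
# Hypothetical helper: maps ssh error text to a coarse label.
# The patterns are assumptions based on common OpenSSH client messages.
classify_ssh_failure() {
  case "$1" in
    "")                     echo none ;;     # no error text captured
    *"Connection reset"*)   echo reset ;;    # peer closed during key exchange
    *"timed out"*)          echo timeout ;;  # connect or banner exchange hung
    *"Connection refused"*) echo refused ;;  # nothing listening on the port
    *)                      echo other ;;
  esac
}

# Example use (commented out): capture only stderr from one attempt.
# With ConnectTimeout set, a hang should surface as a timeout message.
# err=$(ssh -i ~/.crc/machines/crc/id_ed25519 -o StrictHostKeyChecking=no \
#   -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10 \
#   -p 2222 core@127.0.0.1 true 2>&1 >/dev/null)
# classify_ssh_failure "$err"
```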

praveenkumar commented 1 week ago

@shuawest Do you have Podman Desktop or Docker running on the same laptop?
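One way to check this from a terminal, as a rough sketch: the function names below are hypothetical, and matching on "docker" or "podman" in the process list is only a heuristic, so it may catch unrelated processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list running processes whose command line mentions
# docker or podman (case-insensitive), or print a note if none match.
list_container_tools() {
  pgrep -ifl 'docker|podman' || echo "no docker/podman processes found"
}

# Hypothetical helper: show which process, if any, is listening on a TCP
# port (default 2222, the port used by the ssh attempts in this issue).
who_listens_on() {
  local port=${1:-2222}
  lsof -nP -iTCP:"$port" -sTCP:LISTEN 2>/dev/null \
    || echo "no listener found on TCP port $port"
}

# Example use (commented out):
# list_container_tools
# who_listens_on 2222
```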