deboer-tim opened this issue 4 years ago
When stopping CRC please remove the kube context, remove the bridge network, remove the host resolution, or do something similar so that clients can tell it doesn't exist or will fail immediately trying to connect.
@praveenkumar any idea why the server doesn't reply with 'Host unreachable' or 'Connection refused'? Also, would removing the context be possible?
I tested this on Linux and will check on macOS as well, but I didn't see the long wait described in the issue.
```
$ oc whoami
kube:admin
$ crc stop
INFO Stopping the OpenShift cluster, this may take a few minutes...
Stopped the OpenShift cluster
$ time oc whoami -v=10
I1007 14:03:42.797261 693344 loader.go:375] Config loaded from file: /home/prkumar/.kube/config
I1007 14:03:42.798023 693344 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: oc/openshift (linux/amd64) kubernetes/d7f3ccf" -H "Authorization: Bearer oUurQFo7e5xjPoz1h3QPFUGVBLL8tEaXBquoz9oaans" 'https://api.crc.testing:6443/apis/user.openshift.io/v1/users/~'
I1007 14:03:45.905233 693344 round_trippers.go:443] GET https://api.crc.testing:6443/apis/user.openshift.io/v1/users/~ in 3107 milliseconds
I1007 14:03:45.905329 693344 round_trippers.go:449] Response Headers:
I1007 14:03:45.905665 693344 helpers.go:234] Connection error: Get https://api.crc.testing:6443/apis/user.openshift.io/v1/users/~: dial tcp 192.168.130.11:6443: connect: no route to host
F1007 14:03:45.905769 693344 helpers.go:115] Unable to connect to the server: dial tcp 192.168.130.11:6443: connect: no route to host

real	0m3.233s
user	0m0.152s
sys	0m0.038s
$ time odo version -v=9
I1007 14:05:06.924601 693547 preference.go:165] The path for preference file is /home/prkumar/.odo/preference.yaml
I1007 14:05:06.924638 693547 occlient.go:448] Trying to connect to server api.crc.testing:6443
I1007 14:05:07.925073 693547 occlient.go:451] unable to connect to server: dial tcp 192.168.130.11:6443: i/o timeout
odo v1.1.3 (44440eeac)

real	0m1.106s
user	0m0.138s
sys	0m0.038s
```
What I see is below: when the context points to a stopped docker-desktop (or any other stopped cluster) it fails fast. The CRC context is fine while the cluster is running, but it times out after I stop CRC. Interestingly enough, if I switch to the Minikube context immediately after running CRC I see the same problem, but if I start and then stop Minikube the problem goes away. This leads me to think there is some hyperkit/network cleanup that Minikube is doing but CRC is not.
```
deboer-mac:crc-macos-1.15.0-amd64 deboer$ kubectl config use-context docker-desktop
Switched to context "docker-desktop".
deboer-mac:crc-macos-1.15.0-amd64 deboer$ time kubectl get pods
The connection to the server kubernetes.docker.internal:6443 was refused - did you specify the right host or port?

real	0m0.062s
user	0m0.057s
sys	0m0.017s
deboer-mac:crc-macos-1.15.0-amd64 deboer$ ./crc start
...
Started the OpenShift cluster
WARN The cluster might report a degraded or error state. This is expected since several operators have been disabled to lower the resource usage. For more information, please consult the documentation
deboer-mac:crc-macos-1.15.0-amd64 deboer$ kubectl config use-context crc-admin
Switched to context "crc-admin".
deboer-mac:crc-macos-1.15.0-amd64 deboer$ time kubectl get pods
No resources found in default namespace.

real	0m2.165s
user	0m0.145s
sys	0m0.175s
deboer-mac:crc-macos-1.15.0-amd64 deboer$ ./crc stop
Stopping the OpenShift cluster, this may take a few minutes...
Stopped the OpenShift cluster
deboer-mac:crc-macos-1.15.0-amd64 deboer$ time kubectl get pods
Unable to connect to the server: dial tcp 192.168.64.2:6443: i/o timeout

real	0m30.209s
user	0m0.101s
sys	0m0.063s
```
Found this bug entry after running into the same issue on my Mac with CRC 1.20.0. Running `kubectl get pods` failed with "Unable to connect to the server: dial tcp 192.168.64.2:6443: i/o timeout" after stopping CRC and logging into another k8s cluster. Thanks to @deboer-tim's comment above, I found I could fix the issue as follows:

1. Determine the current context: `kubectl config current-context`. This was "sample-app/api-crc-testing:6443/kube:admin" for me.
2. Get the list of contexts and take note of the one you want to use: `kubectl config get-contexts`
3. Switch to that context: `kubectl config use-context context-name`. (Yup, `use-context`, not `set-context`, which does something different.)

After this, `kubectl get pods` worked as expected again.
I would like to look into this issue. Could someone please assign it to me?
Done
I can reproduce this issue. When I do `crc stop` and then try to access pods using `kubectl get pods`, I get these errors after some wait:
```
E1010 21:50:00.401494 159438 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": net/http: TLS handshake timeout
E1010 21:50:32.402863 159438 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": context deadline exceeded - error from a previous attempt: read tcp 127.0.0.1:35508->127.0.0.1:6443: read: connection reset by peer
E1010 21:51:04.403878 159438 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": context deadline exceeded - error from a previous attempt: read tcp 127.0.0.1:54090->127.0.0.1:6443: read: connection reset by peer
E1010 21:51:36.405070 159438 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": context deadline exceeded - error from a previous attempt: read tcp 127.0.0.1:34104->127.0.0.1:6443: read: connection reset by peer
E1010 21:52:08.406982 159438 memcache.go:265] couldn't get current server API group list: Get "https://api.crc.testing:6443/api?timeout=32s": context deadline exceeded - error from a previous attempt: read tcp 127.0.0.1:58892->127.0.0.1:6443: read: connection reset by peer
error: Get "https://api.crc.testing:6443/api?timeout=32s": context deadline exceeded - error from a previous attempt: read tcp 127.0.0.1:58892->127.0.0.1:6443: read: connection reset by peer
```
I think this issue is happening because `crc` is not cleaning up the `current-context` field in `~/.kube/config`. Here is my observation of the behavior of the `crc` and `minikube` start/stop commands with respect to kubeconfig:
CRC

* after `crc start`: `current-context: default/api-crc-testing:6443/kubeadmin`
* after `crc stop`: `current-context: default/api-crc-testing:6443/kubeadmin`

Minikube

* after `minikube start`: `current-context: minikube`
* after `minikube stop`: `current-context: ""`
It seems `crc` does not perform this cleanup in kubeconfig during the `crc stop` command. I do see code for cleaning up kubeconfig:
https://github.com/crc-org/crc/blob/5611baa4fc9614f838da088fe72f80a369a4fe9d/pkg/crc/machine/kubeconfig.go#L230
It gets invoked in the `crc delete` command here:
https://github.com/crc-org/crc/blob/5611baa4fc9614f838da088fe72f80a369a4fe9d/pkg/crc/machine/delete.go#L38
When I compare it with `minikube`, `minikube` seems to clean up kubeconfig in both the `stop` and `delete` commands.
I see these two ways to solve this issue:

* Make the behavior of `crc` consistent with `minikube`: also invoke the `cleanKubeconfig` method while stopping the cluster.
* While stopping the cluster, only set the `current-context` field in kubeconfig to `""`. Keep `Clusters`, `AuthInfos` and `Contexts` inside the kubeconfig.
If it's easy to regenerate `Clusters`, `AuthInfos` and `Contexts` on cluster start, we can go with the first option and remove everything, especially if the code for that already exists.
General information

Did you run `crc setup` before starting it (Yes/No)? Yes

CRC version

```
CodeReady Containers version: 1.15.0+e317bed
OpenShift version: 4.5.7 (embedded in binary)
```

CRC status

```
DEBU CodeReady Containers version: 1.15.0+e317bed
DEBU OpenShift version: 4.5.7 (embedded in binary)
CRC VM:          Stopped
OpenShift:       Stopped
Disk Usage:      0B of 0B (Inside the CRC VM)
Cache Usage:     12.8GB
Cache Directory: /Users/deboer/.crc/cache
```

CRC config

no output

Host Operating System

```
ProductName:    Mac OS X
ProductVersion: 10.15.6
BuildVersion:   19G2021
```
Steps to reproduce
Expected
If I connect to a remote OpenShift cluster or use other local Kube tools and then disconnect/stop, the Kube context is left pointing to a cluster that I can't connect to anymore, but it 'fails fast': tools that try to connect fail immediately.
e.g. after stopping minikube and running `kubectl get pods` it immediately responds with:

```
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```
I expect CRC to have the same behaviour.

Actual
After stopping CRC the Kube context is left pointing to a cluster (api-crc-testing or api.crc.testing) on a bridge network (192.168.*). For some reason clients can't tell this host doesn't exist anymore and connections to it don't fail fast, which eventually causes timeouts on the client side. This is bad enough with kubectl (20s timeout?), but odo has an even longer timeout (4min?), which makes it appear to hang and renders it unusable.