Closed. IBMRob closed this issue 11 months ago.
@IBMRob Can you try to ssh into the VM using https://github.com/code-ready/crc/wiki/Debugging-guide and use podman to pull any image, to determine whether this is a network issue inside the VM or on the cluster side?
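For reference, a minimal sketch of that check (the key path and port match the ssh command used later in this thread; the hello image is just an arbitrary public image):
$ ssh -i ~/.crc/machines/crc/id_ecdsa core@127.0.0.1 -p 2222
$ podman pull quay.io/podman/hello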
Inside the VM I was able to curl both quay.io and the Red Hat registry, so the VM itself appears to have network access, but when attempting to use podman we see the error shown below.
curl
curl -s -o /dev/null -w "%{http_code}" https://quay.io
200
curl -s -o /dev/null -w "%{http_code}" https://registry.access.redhat.com
301
podman
[core@crc-j2d48-master-0 ~]$ podman pull registry.redhat.io/redhat/certified-operator-index:v4.11
Trying to pull registry.redhat.io/redhat/certified-operator-index:v4.11...
Error: initializing source docker://registry.redhat.io/redhat/certified-operator-index:v4.11: pinging container registry registry.redhat.io: Get "https://registry.redhat.io/v2/": dial tcp: lookup registry.redhat.io on 192.168.127.1:53: read udp 192.168.127.2:46800->192.168.127.1:53: i/o timeout
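One way to narrow this down is to query the VM's user-stack resolver (192.168.127.1, taken from the error above) and a public resolver directly; the host utility is available in the VM, as shown later in this thread, and the second check assumes the VM can reach a public resolver at all:
$ host registry.redhat.io 192.168.127.1
$ host registry.redhat.io 8.8.8.8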
I did notice, when listing all pods, that it shows:
openshift-dns-operator dns-operator-dbb946c7b-t4c6r 1/2 CrashLoopBackOff 8 (28s ago) 25d
I wonder if this is causing networking issues within the cluster?
This looks like #2597, can you provide details as mentioned in https://github.com/code-ready/crc/issues/2597#issuecomment-884763233?
@IBMRob I get the same issue, but the DNS pods are running, so this should be unrelated. I've opened a similar issue: https://github.com/code-ready/crc/issues/3377.
oc get pods -n openshift-dns-operator
NAME READY STATUS RESTARTS AGE
dns-operator-85cf76d46-hwzqk 2/2 Running 0 11d
oc get pods -n openshift-dns
NAME READY STATUS RESTARTS AGE
dns-default-7m6jv 2/2 Running 0 12d
node-resolver-kkvnz 1/1 Running 0 12d
Hi @IBMRob - are you running on Apple M1 Max silicon? I didn't have issues with CRC on an Intel-based macOS machine; I only got this on the new Apple M1 Max.
@mhcastro - Yes, this only happens on my M1 Max version. I didn't have any problems on my previous Intel Mac.
Same as me.
By any chance, do you have podman-machine also running in parallel?
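(You can check for one with, e.g.:)
$ podman machine list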
No, I don't have podman-machine installed.
Hi @praveenkumar - any plans to fix this issue?
I've just tried this with the latest crc release (2.10.1) on an M1 MacBook and I don't see this issue. The openshift-marketplace pods are all running. Connecting to registry.redhat.io from within the VM "works" (it connects but fails with 'invalid username/password', which is expected).
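For reference, a check along these lines exercises DNS and TCP and only fails at the authentication step when no credentials are configured (the image name is just the example from earlier in this thread):
$ podman pull registry.redhat.io/redhat/certified-operator-index:v4.11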
I tried again.
$ crc version
CRC version: 2.10.1+7e7f6b2d
OpenShift version: 4.11.7
Podman version: 4.2.0
... and I still get the same issue. And this is what happens when I run crc setup and crc start.
$ crc setup
INFO Using bundle path /Users/xxx/.crc/cache/crc_vfkit_4.11.7_arm64.crcbundle
INFO Checking if running as non-root
INFO Checking if crc-admin-helper executable is cached
INFO Checking for obsolete admin-helper executable
INFO Checking if running on a supported CPU architecture
INFO Checking minimum RAM requirements
INFO Checking if crc executable symlink exists
INFO Creating symlink for crc executable
INFO Checking if running emulated on a M1 CPU
INFO Checking if vfkit is installed
INFO Checking if CRC bundle is extracted in '$HOME/.crc'
INFO Checking if /Users/xxx/.crc/cache/crc_vfkit_4.11.7_arm64.crcbundle exists
INFO Checking if old launchd config for tray and/or daemon exists
INFO Checking if crc daemon plist file is present and loaded
INFO Adding crc daemon plist file and loading it
Your system is correctly setup for using CRC. Use 'crc start' to start the instance
$ crc start
INFO Checking if running as non-root
INFO Checking if crc-admin-helper executable is cached
INFO Checking for obsolete admin-helper executable
INFO Checking if running on a supported CPU architecture
INFO Checking minimum RAM requirements
INFO Checking if crc executable symlink exists
INFO Checking if running emulated on a M1 CPU
INFO Checking if vfkit is installed
INFO Checking if old launchd config for tray and/or daemon exists
INFO Checking if crc daemon plist file is present and loaded
INFO Loading bundle: crc_vfkit_4.11.7_arm64...
CRC requires a pull secret to download content from Red Hat.
You can copy it from the Pull Secret section of https://console.redhat.com/openshift/create/local.
? Please enter the pull secret ************************
INFO Creating CRC VM for openshift 4.11.7...
INFO Generating new SSH key pair...
INFO Generating new password for the kubeadmin user
INFO Starting CRC VM for openshift 4.11.7...
INFO CRC instance is running with IP 127.0.0.1
INFO CRC VM is running
INFO Updating authorized keys...
INFO Configuring shared directories
INFO Check internal and public DNS query...
INFO Check DNS query from host...
INFO Verifying validity of the kubelet certificates...
INFO Starting kubelet service
INFO Waiting for kube-apiserver availability... [takes around 2min]
INFO Adding user's pull secret to the cluster...
INFO Updating SSH key to machine config resource...
INFO Waiting for user's pull secret part of instance disk...
INFO Changing the password for the kubeadmin user
INFO Updating cluster ID...
INFO Updating root CA cert to admin-kubeconfig-client-ca configmap...
INFO Starting openshift instance... [waiting for the cluster to stabilize]
INFO 4 operators are progressing: image-registry, network, openshift-controller-manager, service-ca
INFO 4 operators are progressing: image-registry, network, openshift-controller-manager, service-ca
INFO 4 operators are progressing: image-registry, network, openshift-controller-manager, service-ca
INFO 4 operators are progressing: image-registry, network, openshift-controller-manager, service-ca
INFO 4 operators are progressing: image-registry, network, openshift-controller-manager, service-ca
INFO 4 operators are progressing: image-registry, network, openshift-controller-manager, service-ca
INFO 2 operators are progressing: image-registry, network
INFO Operator image-registry is progressing
INFO All operators are available. Ensuring stability...
INFO Operators are stable (2/3)...
INFO Operators are stable (3/3)...
INFO Adding crc-admin and crc-developer contexts to kubeconfig...
Started the OpenShift cluster.
Are you also running a crc_vfkit_4.11.7_arm64 bundle?
Perhaps this also helps to understand the issue.
CRC is able to successfully pull images from "quay.io", but fails to pull from other registries, including "registry.redhat.io".
Here is evidence of a successful pull in the same CRC instance:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20d default-scheduler Successfully assigned openshift-multus/multus-98qmd to crc-m9jbq-master-0 by crc-m9jbq-bootstrap
Normal Pulling 20d kubelet Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:050edf40d065b60c642be113092d2ebc157fcab62345324398bc81673794ecf7af"
Normal Pulled 20d kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:050edf40d065b60c642be113092d2ebc157fcab6242342342c81673794ecf7af" in 2.7922208s
It is only working with quay.io.
What does registry.redhat.io resolve to in the VM? Try: host registry.redhat.io
Can't get into the VM to check.
oc get nodes
NAME STATUS ROLES AGE VERSION
crc-m9jbq-master-0 Ready master,worker 20d v1.24.0+3882f8f
$ oc debug node/crc-m9jbq-master-0
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/crc-m9jbq-master-0-debug ...
To use host binaries, run `chroot /host`
warning: Container container-00 is unable to start due to an error: Back-off pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b303c164dcb97cff2e317f1f9a297ba42916858e498e8xxxx671171b103915da"
So I ran the check from another running pod instead - e.g. the "multus" pod, which was able to pull from quay.io:
$ oc exec -it multus-98qmd -n openshift-multus -- /bin/bash
[root@crc-m9jbq-master-0 /]# host quay.io
quay.io has address 50.19.184.112
quay.io has address 52.5.230.17
quay.io has address 52.70.198.51
quay.io has address 52.205.244.25
quay.io has address 75.101.245.134
quay.io has address 50.16.37.250
[root@crc-m9jbq-master-0 /]# host registry.redhat.io
registry.redhat.io has address 92.122.161.165
Then pinged from the "multus" pod...
[root@crc-m9jbq-master-0 /]# ping quay.io
PING quay.io (52.5.230.17) 56(84) bytes of data.
64 bytes from 52.5.230.17 (52.5.230.17): icmp_seq=1 ttl=64 time=0.365 ms
64 bytes from 52.5.230.17 (52.5.230.17): icmp_seq=2 ttl=64 time=0.770 ms
64 bytes from 52.5.230.17 (52.5.230.17): icmp_seq=3 ttl=64 time=0.489 ms
[root@crc-m9jbq-master-0 /]# ping registry.redhat.io
PING registry.redhat.io (92.122.161.165) 56(84) bytes of data.
64 bytes from 92.122.161.165 (92.122.161.165): icmp_seq=1 ttl=64 time=0.312 ms
64 bytes from 92.122.161.165 (92.122.161.165): icmp_seq=2 ttl=64 time=1.69 ms
64 bytes from 92.122.161.165 (92.122.161.165): icmp_seq=3 ttl=64 time=0.324 ms
64 bytes from 92.122.161.165 (92.122.161.165): icmp_seq=4 ttl=64 time=3.74 ms
Progress.
This is how openshift-marketplace looked before the fix:
myuser@mymac ~ % oc get pods
NAME READY STATUS RESTARTS AGE
certified-operators-h2lr5 0/1 ImagePullBackOff 0 20d
certified-operators-zj64b 0/1 ImagePullBackOff 3 21d
community-operators-lwzds 0/1 ImagePullBackOff 0 20d
community-operators-tr5xs 0/1 ImagePullBackOff 3 21d
marketplace-operator-8485c7444b-95prb 1/1 Running 0 20d
redhat-marketplace-hgwv8 0/1 ImagePullBackOff 0 20d
redhat-marketplace-wqdpg 0/1 ImagePullBackOff 3 21d
redhat-operators-87h27 0/1 ImagePullBackOff 0 5m14s
redhat-operators-pf5gn 0/1 ImagePullBackOff 0 3m24s
$ crc delete -f
$ crc delete --clear-cache
$ crc cleanup
$ ps -ef | grep crc (kill any hanging crc pid)
$ crc setup
$ crc start
Collected the IPv4 nameserver from /etc/resolv.conf on my M1 Mac. SSH'ed into the VM, commented out any nameserver entry present in the file, and added the IPv4 collected from my Mac into the VM's /etc/resolv.conf.
ssh -i .crc/machines/crc/id_ecdsa core@127.0.0.1 -p 2222
sudo vi /etc/resolv.conf
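For illustration, the edited file ended up looking roughly like this (192.168.0.1 is a placeholder for whatever IPv4 nameserver your Mac's /etc/resolv.conf lists):
# /etc/resolv.conf inside the CRC VM, after the edit
# original user-stack resolver, commented out:
#nameserver 192.168.127.1
# placeholder: IPv4 nameserver copied from the Mac host
nameserver 192.168.0.1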
Then I deleted the redhat-operators pod in the openshift-marketplace namespace, and all the pods in this namespace started to be recreated. This is how it looks now. Note: the connection to the cluster is lost while the marketplace-operator restarts, but it automatically reconnects.
oc get pods -n openshift-marketplace
NAME READY STATUS RESTARTS AGE
certified-operators-crkvd 0/1 ContainerCreating 0 20m
certified-operators-zj64b 1/1 Running 0 21d
community-operators-r89sz 0/1 ContainerCreating 0 20m
community-operators-tr5xs 1/1 Running 0 21d
marketplace-operator-8485c7444b-95prb 1/1 Running 2 (22m ago) 20d
redhat-marketplace-68l7p 0/1 ContainerCreating 0 20m
redhat-marketplace-wqdpg 1/1 Running 0 21d
redhat-operators-87h27 1/1 Running 0 32m
redhat-operators-kxtfh 0/1 ContainerCreating 0 15m
But now I have a different error:
Warning FailedCreatePodSandBox 38s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_certified-operators-pkjtz_openshift-marketplace_0133d97d-ac62-49ed-9111-c6552cd7a58b_0(a5d98752bca56ff26909a2f5abe94b3c4987def2ae8009cc4909433d906f4885): error adding pod openshift-marketplace_certified-operators-pkjtz to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-marketplace/certified-operators-pkjtz/0133d97d-ac62-49ed-9111-c6552cd7a58b]: error getting pod: Get "https://[api-int.crc.testing]:6443/api/v1/namespaces/openshift-marketplace/pods/certified-operators-pkjtz?timeout=1m0s": dial tcp: lookup api-int.crc.testing on 192.168.0.1:53: read udp 192.168.127.2:34211->192.168.0.1:53: i/o timeout
I added api-int.crc.testing to my Mac's /etc/hosts and restarted the pods, but no success. It is awkward, though, as the other respective pod is running.
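For reference, the entry was along these lines (the 127.0.0.1 address matches the instance IP that crc start reported above):
127.0.0.1 api-int.crc.testing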
SSH'ed into the VM, commented out any nameserver entry present in the file, and added the IPv4 collected from my Mac into the VM's /etc/resolv.conf.
Do you know what was wrong with the DNS the VM had?
It didn't include the actual DNS entry of the hosting machine.
@cfergeau, but now I have this issue with the CNI network. So I am unsure whether my DNS change to the VM fixed the issue or introduced another problem.
@mhcastro The VM's /etc/resolv.conf has the user-mode network stack's nameserver, which is supposed to handle requests like api-int.crc.testing; you shouldn't remove those entries.
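A less destructive variant, sketched below, would keep the user-stack resolver first so that *.crc.testing names still resolve, and add the host's resolver only as a fallback (the glibc resolver only falls through to the second nameserver when the first one times out; the second address is a placeholder for your Mac's LAN resolver):
# keep the user-stack resolver for api-int.crc.testing etc.
nameserver 192.168.127.1
# placeholder: the Mac host's resolver as a fallback
nameserver 192.168.0.1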
Something is really wrong with the DNS setup in the CRC VM, but I couldn't find a way to permanently solve it. For quay.io it works just fine.
One additional piece of information:
From the CRC VM:
[core@crc-xxx-master-0 ~]$ host quay.io
quay.io has address 52.205.244.25
quay.io has address 3.230.30.87
quay.io has address 3.232.106.42
quay.io has address 18.215.228.51
quay.io has address 50.16.37.250
quay.io has address 52.70.198.51
[core@crc-xxx-master-0 ~]$ host registry.redhat.io
registry.redhat.io has address 92.122.161.165
From the host M1 Mac:
myuser@mym1mac ~ % host quay.io
quay.io has address 3.230.30.87
quay.io has address 3.232.106.42
quay.io has address 18.215.228.51
quay.io has address 50.16.37.250
quay.io has address 52.70.198.51
quay.io has address 52.205.244.25
quay.io has IPv6 address 2600:1f18:483:cf00:73b2:5198:e32b:d790
quay.io has IPv6 address 2600:1f18:483:cf00:f053:17f8:315:f7eb
quay.io has IPv6 address 2600:1f18:483:cf01:1e95:f72a:23b:23ff
quay.io has IPv6 address 2600:1f18:483:cf01:f33f:9173:c6d7:dcdd
quay.io has IPv6 address 2600:1f18:483:cf02:ab11:6617:79b0:cf33
quay.io has IPv6 address 2600:1f18:483:cf02:eec9:4ec2:8f41:f7a2
quay.io mail is handled by 1 us-smtp-inbound-1.mimecast.com.
myuser@mym1mac ~ % host registry.redhat.io
registry.redhat.io is an alias for registry.redhat.io.edgekey.net.
registry.redhat.io.edgekey.net is an alias for e14353.g.akamaiedge.net.
e14353.g.akamaiedge.net has address 92.122.161.165
It also fails for registry.docker.io.
From the CRC VM:
$ host registry.docker.io
registry.docker.io has address 44.205.64.79
registry.docker.io has address 3.216.34.172
registry.docker.io has address 34.205.13.154
From the host M1 Mac:
host registry.docker.io
registry.docker.io is an alias for registry-1.docker.io.
registry-1.docker.io has address 3.216.34.172
registry-1.docker.io has address 34.205.13.154
registry-1.docker.io has address 44.205.64.79
The CRC VM is not properly configured to handle aliases. It works for quay.io, which doesn't have an alias, and fails in the same way for both registry.redhat.io and registry.docker.io, which have aliases.
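A quick way to confirm the alias difference, from either the VM or the host, is to query the CNAME record explicitly:
$ host -t cname registry.redhat.io
$ host -t cname quay.io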
@mhcastro @IBMRob I had this same issue on M1 Max, and I resolved it by making sure that 8.8.8.8 was at the top of the stack of DNS server entries in my Network preferences. I didn't restart anything else and it started to pull from registry.redhat.io. I previously had 8.8.8.8 at the bottom of the stack, so it looks like one of the other internal IBM DNS servers (I see you are both from IBM) is getting in the way for some reason.
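For anyone who prefers the command line over Network preferences, macOS's built-in networksetup can do the same reordering; in this sketch, "Wi-Fi" and the second address are placeholders for your own network service name and internal DNS server:
$ networksetup -getdnsservers Wi-Fi
$ networksetup -setdnsservers Wi-Fi 8.8.8.8 10.0.0.53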
This can be closed. It is working now after upgrading to macOS Ventura 13.3 on the M1 chip, without any change to DNS.
oc get pods -n openshift-marketplace
NAME READY STATUS RESTARTS AGE
certified-operators-zj64b 1/1 Running 0 194d
community-operators-tr5xs 1/1 Running 0 194d
marketplace-operator-8485c7444b-95prb 1/1 Running 0 193d
redhat-marketplace-wqdpg 1/1 Running 0 194d
redhat-operators-r7zk5 1/1 Running 0 194d
@IBMRob I am closing it; please create a new one if the issue still persists with the latest version of OpenShift Local.
General information
Did you run crc setup before starting it (Yes/No)? Yes
CRC version
CRC status
CRC config
Host Operating System
Steps to reproduce
Check the pods in the openshift-marketplace project.
Expected
All pods to be running
Actual
All pods bar one are failing, as they are timing out trying to talk to Red Hat.
Looks like a networking issue, given:
Failed to pull image "registry.redhat.io/redhat/certified-operator-index:v4.11": rpc error: code = Unknown desc = Get "https://registry.redhat.io/auth/realms/rhcc/protocol/redhat-docker-v2/auth?scope=repository%3Aredhat%2Fcertified-operator-index%3Apull&service=docker-registry": dial tcp: lookup registry.redhat.io on 192.168.127.1:53: read udp 192.168.127.2:60694->192.168.127.1:53: i/o timeout
If I try to pull a random image, it also fails, i.e.:
Failed to pull image "nginx": rpc error: code = Unknown desc = pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.127.1:53: read udp 192.168.127.2:55975->192.168.127.1:53: i/o timeout
Logs
Before gathering the logs, try the following to see if that fixes your issue.
Please consider posting the output of crc start --log-level debug on http://gist.github.com/ and post the link in the issue.