ygao-armada opened this issue 2 months ago
Thanks for reporting @ygao-armada. We are looking into this issue and will get back with any information we find.
@sp1999 An update: I found that the issue is related to gpu-operator. It looks like if we install argocd before gpu-operator, the issue does not occur. I installed argocd with:
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
I installed gpu-operator following the instructions in: https://github.com/NVIDIA/gpu-operator/blob/release-23.9/scripts/install-gpu-operator-nvaie.sh
What happened: In an EKSA cluster on vSphere, we see a strange error. On a worker node, if we replace /etc/resolv.conf with the one from the argocd-server-xxx pod:
The nslookup command (nslookup argocd-redis) resolves the IP (10.96.221.1) first, then waits 10 seconds until it times out.
We can see that the IP (10.96.221.1) is correct:
And 10.96.192.10 is the CoreDNS IP:
Am I missing something?
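When a pod's resolv.conf is copied onto a node, the node's resolver also inherits the pod's `search` list and `options ndots`, which decide which fully qualified names get queried and in what order. The sketch below uses a hypothetical helper, `expand_queries`, that prints that order following the rules described in resolv.conf(5). It is only a diagnostic aid, not a diagnosis of this issue. The resolv.conf contents in the example are assumed, typical pod defaults for the argocd namespace combined with the CoreDNS IP mentioned above, because the actual file is not shown here. Individual tools, including some nslookup builds, may not follow these rules exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, in order, the names a glibc-style stub
# resolver would query for NAME given a resolv.conf file, following the
# search/ndots rules in resolv.conf(5). Real tools may deviate.
expand_queries() {
  local name="$1" conf="$2"
  local ndots=1 line opt d dots
  local -a search=()

  # Read the "search" and "options" lines (a missing final newline is handled).
  while IFS= read -r line || [[ -n "$line" ]]; do
    case "$line" in
      search[[:space:]]*)
        # A later "search" line replaces an earlier one.
        read -r -a search <<< "${line#search}"
        ;;
      options[[:space:]]*)
        for opt in ${line#options}; do
          if [[ "$opt" == ndots:* ]]; then
            ndots="${opt#ndots:}"
          fi
        done
        ;;
    esac
  done < "$conf"

  # A trailing dot marks the name as absolute, so no search list is used.
  if [[ "$name" == *. ]]; then
    echo "$name"
    return 0
  fi

  # Count the dots in the name.
  dots="${name//[^.]/}"
  dots="${#dots}"

  if (( dots >= ndots )); then
    # Enough dots: try the name as-is first, then each search domain.
    echo "$name."
    for d in "${search[@]}"; do echo "$name.$d."; done
  else
    # Too few dots: try each search domain first, the bare name last.
    for d in "${search[@]}"; do echo "$name.$d."; done
    echo "$name."
  fi
}

# Example with an assumed pod-style resolv.conf (not the one from this report).
conf="$(mktemp)"
cat > "$conf" <<'EOF'
search argocd.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.192.10
options ndots:5
EOF
expand_queries argocd-redis "$conf"
rm -f "$conf"
```

With `ndots:5`, the short name argocd-redis has fewer dots than the threshold, so the example prints argocd-redis.argocd.svc.cluster.local. first, then argocd-redis.svc.cluster.local., argocd-redis.cluster.local., and finally argocd-redis. Comparing this order with a packet capture on the worker node may help show which query is the one waiting for the timeout.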
What you expected to happen: The command "nslookup argocd-redis" should return without timing out.
How to reproduce it (as minimally and precisely as possible): Install Argo CD on an EKSA vSphere cluster, and follow the steps in the description above.
Anything else we need to know?:
Environment: