Closed: jmbravo closed this 4 months ago
Ok so it seems the workflow pod is not getting the docker credentials. The parent pod has a volume:
volumeMounts:
  - mountPath: /home/runner/.docker/
    name: docker-secret
    readOnly: true
volumes:
  - name: docker-secret
    secret:
      items:
        - key: .dockerconfigjson
          path: config.json
      secretName: regcred
But the workflow pod doesn't have it.
Is this normal?
Hey @jmbravo,
Please correct me if I'm wrong, but you should configure image pull secrets in this case. In container mode kubernetes, instead of using Docker and running docker pull, we run a pod. So if you are using a private image for your pod, you have to configure image pull secrets so that Kubernetes can pull the image properly. Can you please tell me how you are providing those credentials? If credentials are provided within a workflow, the hook will set the imagePullSecrets field.
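For reference, this is roughly what a dockerconfigjson pull secret looks like if you pre-create one yourself — a minimal sketch only, reusing the name regcred and namespace runner from this thread, with a placeholder instead of real credentials:

```yaml
# Sketch of a dockerconfigjson pull secret like the "regcred" one
# referenced in this thread. The data value is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: runner
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64 of your .docker/config.json>
```

The same secret is usually created with kubectl create secret docker-registry, which produces an equivalent object.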
Hey @nikola-jokic, thanks for your response.
I tried two things with no luck:
1 - Add

imagePullSecrets:
  - name: regcred

to my RunnerDeployment.

2 - Add

imagePullSecrets:
  - name: regcred
image:
  actionsRunnerImagePullSecrets:
    - regcred

to the Helm values.yaml.
What am I missing?
Where am I supposed to add the imagePullSecrets so my workflow pod gets them?
This is my complete RunnerDeployment YAML:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: arc-runner-cloudops-test
  namespace: runner
spec:
  template:
    metadata: {}
    spec:
      imagePullSecrets:
        - name: regcred
      tolerations:
        - key: node-pool
          effect: NoSchedule
          operator: Equal
          value: runner
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 10
      dnsConfig:
        nameservers:
          - 8.8.8.8
      containerMode: kubernetes
      serviceAccountName: default
      workVolumeClaimTemplate:
        storageClassName: "ebs-pool"
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
      volumeMounts:
        - mountPath: /home/runner/.docker/
          name: docker-secret
          readOnly: true
      volumes:
        - name: docker-secret
          secret:
            items:
              - key: .dockerconfigjson
                path: config.json
            secretName: regcred
      organization: mycompany
      group: amazon-github-runners-cloudops-test
      labels:
        - arc-runner-cloudops-test
      env:
        - name: ACTIONS_RUNNER_PRINT_LOG_TO_STDOUT
          value: "true"
        - name: DISABLE_RUNNER_UPDATE
          value: "true"
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "120"
      terminationGracePeriodSeconds: 180
      imagePullPolicy: IfNotPresent
Thank you!
Oh of course, happy to help! :relaxed:
With the current setup, the hook will not be able to see the image pull secrets you specified. There are two ways you can do this: provide the registry credentials in the container section of your workflow job, so the hook creates a pull secret and sets the imagePullSecrets field on the job pod, or use a hook extension to inject the imagePullSecrets into the job pod spec yourself.
Thanks, that makes sense!
However, if I understood you correctly, you mean adding the credentials in the GitHub workflow. I have done that, but I still get the same error:
jobs:
  Getting-data-from-RDS:
    runs-on: arc-runner-cloudops-test
    container:
      image: artifactory.mycompany.com/cloudops-test/ubuntu-sqlplus:1.0
      credentials:
        username: $ARTIFACTORY_USER_TEST
        password: $ARTIFACTORY_PASSWORD_TEST
Did you mean this? Sorry, I've been at this for a couple of days now and I'm a bit blind.
Yes, that is exactly what I meant. Is it not working? Can you please show the output of kubectl get pod $YOUR_JOB_POD -o yaml so we can see what is actually applied to it?
It seems I'm totally blind. I can see now the imagePullSecrets:
imagePullSecrets:
  - name: arc-runner-cloudops-test-7pqzd-868s8-secret-065e7127
But for some reason, I'm getting the same DNS error:
kubelet  Failed to pull image "artifactory.mycompany.com/test-cloudops-images/ubuntu-sqlplus:1.0": failed to pull and unpack image "artifactory.mycompany.com/test-cloudops-images/ubuntu-sqlplus:1.0": failed to resolve reference "artifactory.mycompany.com/test-cloudops-images/ubuntu-sqlplus:1.0": failed to do request: Head "https://artifactory.mycompany.com/v2/test-cloudops-images/ubuntu-sqlplus/manifests/1.0": dial tcp: lookup artifactory.mycompany.com on 10.77.252.41:53: no such host
I checked the base64 content of the secret and it's totally fine, so I don't know why job pods can't resolve my Artifactory URL but can resolve ECR or Docker Hub ones.
To be honest, I don't know whose IP that is. There's no pod or service with the IP 10.77.252.41.
I'm running out of ideas. I also ran the workflow with an ECR image with a sleep, got into the pod, and was able to resolve the Artifactory URL without any problem.
Thanks again!
Oh, that IP looks like an internal Kubernetes IP, since it is in a private range. I am not sure exactly why it is resolving to that IP, but this is definitely outside of the hook's control. The port is DNS, so it is probably trying to resolve the name and failing.
This is the first time I'm seeing this problem, and I'm curious why you added the DNS config to the deployment. If that is a requirement, then you are probably out of luck and will have to use a hook extension... But in that case, since you already added credentials to your workflows, your extension only needs to modify the DNS configuration, and that should eliminate the issue. Please let me know if that makes sense.
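A hook extension along those lines might look roughly like this — only a sketch, assuming the runner-container-hooks pod template mechanism, where a template file is mounted into the runner pod (e.g. from a ConfigMap) and the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable points at it, so the hook merges the extra fields into the job pod it creates:

```yaml
# Sketch of a hook extension template (assumed setup: file mounted into
# the runner pod and referenced by ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE).
# The hook merges these fields into the job pod spec.
apiVersion: v1
kind: PodTemplate
metadata:
  name: runner-pod-template
spec:
  dnsConfig:
    nameservers:
      - 8.8.8.8
```

The template name and the choice of 8.8.8.8 are taken from the dnsConfig already tried in the RunnerDeployment above.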
Actually, adding the DNS config to the RunnerDeployment is not necessary, since I had already added it to my CoreDNS Corefile before; I was desperate and tried it anyway. No luck at all.
Below is my CoreDNS Corefile. The Artifactory IP resolves on every pod but the workflow one. I am at an impasse.
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
artifactory.mycompany.com:53 {
    errors
    cache 30
    forward . 8.8.8.8
    reload
}
Anyway, thanks for your help, I'll keep trying!
Closing this. It was a conflict between our private DNS and the kubelet's, since the EKS nodes are in a VPC that has Direct Connect.
Thank you for your patience and support!
Hi,
I have changed the containerMode from dind to kubernetes.
The problem is that when I launch the workflow, the newly created pod cannot pull the image. Why could this happen?
It seems to be a DNS error. On the other hand, the "parent" pod has the regcred secret configured, and in dind mode it was pulling without problems.
What could have changed so that in kubernetes mode it can't pull?
Thanks!
Type     Reason   Age  From     Message
----     ------   ---  ----     -------
Normal   Pulling  24s  kubelet  Pulling image "artifactory.mycompany.com/cloudops-images/ubuntu-sqlplus:1.0"
Warning  Failed   24s  kubelet  Failed to pull image "artifactory.mycompany.com/cloudops-images/ubuntu-sqlplus:1.0": failed to pull and unpack image "artifactory.mycompany.com/cloudops-images/ubuntu-sqlplus:1.0": failed to resolve reference "artifactory.mycompany.com/cloudops-images/ubuntu-sqlplus:1.0": failed to do request: Head "https://artifactory.mycompany.com/v2/cloudops-images/ubuntu-sqlplus/manifests/1.0": dial tcp: lookup artifactory.mycompany.com on 10.77.252.41:53: no such host
Warning  Failed   24s  kubelet  Error: ErrImagePull
Normal   BackOff  24s  kubelet  Back-off pulling image "artifactory.mycompany.com/cloudops-images/ubuntu-sqlplus:1.0"
Warning  Failed   24s  kubelet  Error: ImagePullBackOff
Edit: my container registry has a public IP, so I need nameserver 8.8.8.8. I don't know why it's trying 10.77.252.41:53.