amenasse / antkube

Artifacts for my personal Kubernetes cluster

Can't pull image from private repository #5

Open amenasse opened 4 years ago

amenasse commented 4 years ago

Nodes can't pull images on deployment when the image reference points at the private registry (registry.dev.fullbacksystems.com).

I tried both the internal and the external DNS names.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fullback-website
  labels:
    app: fullback-website
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fullback-website
  template:
    metadata:
      labels:
        app: fullback-website
    spec:
      hostAliases:
      - ip: "192.168.0.107"
        hostnames:
        - "registry.dev.fullbacksystems.com"
        - "docker-registry.default"
      containers:
      - name: fullback-website
        image: registry.dev.fullbacksystems.com/fullbacksystems-website:8523786c
        imagePullPolicy: Always
        ports:
        - containerPort: 80

Using the external DNS name (registry.dev.fullbacksystems.com):

Aug 11 11:10:25 mico k3s[6292]: E0811 11:10:25.036322    6292 kuberuntime_image.go:50] Pull image "registry.dev.fullbacksystems.com/fullbacksystems-website:8523786c" failed: rpc error: code = NotFound desc = failed to pull and unpack image "registry.dev.

This is resolvable on the node:

$ nslookup registry.dev.fullbacksystems.com
Server:     127.0.0.53
Address:    127.0.0.53#53

Non-authoritative answer:
Name:   registry.dev.fullbacksystems.com
Address: 192.168.0.107

But not within the cluster via CoreDNS:

$ kubectl exec -i -t dnsutils -- nslookup registry.dev.fullbackystems.com
Server:     10.43.0.10
Address:    10.43.0.10#53

** server can't find registry.dev.fullbackystems.com: NXDOMAIN

command terminated with exit code 1

Using the internal DNS name (docker-registry.default.svc.cluster.local):

Failed to pull image "docker-registry.default.svc.cluster.local/fullbacksystems-website:8523786c": rpc error: code = Unknown desc = failed to pull and unpack image "docker-registry.default.svc.cluster.local/fullbacksystems-website:8523786c": failed to resolve reference "docker-registry.default.svc.cluster.local/fullbacksystems-website:8523786c": failed to do request: Head https://docker-registry.default.svc.cluster.local/v2/fullbacksystems-website/manifests/8523786c: dial tcp: lookup docker-registry.default.svc.cluster.local: no such host
amenasse commented 4 years ago

I can't resolve registry.dev.fullbacksystems.com from within the cluster because the systemd resolver is bound to loopback (127.0.0.53). Apparently this can't be changed:

https://unix.stackexchange.com/questions/445782/how-to-allow-systemd-resolved-to-listen-to-an-interface-other-than-loopback

amenasse commented 4 years ago

Possibly relevant:

https://coredns.io/plugins/k8s_external/

Possibly related issues:

https://github.com/rancher/k3s/issues/1640

https://github.com/rancher/k3s/issues/1581

https://github.com/rancher/k3s/issues/1863

amenasse commented 4 years ago

For now I've installed dnsmasq and disabled the systemd DNS stub resolver. Local name resolution is now handled by dnsmasq, which, unlike the systemd resolver, can be configured to listen on a specific interface (not just loopback).

I then modified the CoreDNS ConfigMap to use this resolver:

           fallthrough
         }
         prometheus :9153
-        forward . /etc/resolv.conf
+        forward . 192.168.0.107
         cache 30
         loop
         reload

This works :tada: but it needs to be done in a repeatable manner: restarting k3s reverts the CoreDNS ConfigMap to the old version. Creating a separate ticket for this.
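
A minimal sketch of how the edit above could be scripted, under assumptions: the helper name `patch_forward` is hypothetical, and it only rewrites the `forward . /etc/resolv.conf` line of a Corefile read from stdin. It does not stop k3s from restoring its own ConfigMap on restart.

```shell
#!/bin/sh
# Hypothetical helper (not part of k3s or CoreDNS): rewrite the CoreDNS
# "forward" upstream in a Corefile read from stdin.
# $1 is the upstream resolver address, e.g. the dnsmasq listener.
patch_forward() {
    upstream="$1"
    # Match only the "forward . /etc/resolv.conf" directive and keep its indentation.
    sed "s|^\([[:space:]]*\)forward \. /etc/resolv\.conf|\1forward . ${upstream}|"
}

# Example: patch a small Corefile fragment and print the result.
printf '        prometheus :9153\n        forward . /etc/resolv.conf\n        cache 30\n' \
    | patch_forward 192.168.0.107
```

Assuming the Corefile lives in the `coredns` ConfigMap in `kube-system` and is rendered as a multi-line block in the YAML output, something like `kubectl -n kube-system get configmap coredns -o yaml | patch_forward 192.168.0.107 | kubectl apply -f -` might apply it, but that pipeline is untested here.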

amenasse commented 4 years ago

Reproducing the pull directly with containerd (outside the cluster) and --debug gives what looks like the same failure. The extra debug output shows it's not a resolution error, though, but a 404 response from the registry:

$ sudo ctr  --debug image pull registry.dev.fullbacksystems.com/fullbacksystems-website:8523786

DEBU[0000] fetching                                      image="registry.dev.fullbacksystems.com/fullbacksystems-website:8523786"
DEBU[0000] resolving                                    
DEBU[0000] do request                                    request.headers=map[Accept:[application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, *]] request.method=HEAD url="https://registry.dev.fullbacksystems.com/v2/fullbacksystems-website/manifests/8523786"
DEBU[0000] fetch response received                       response.headers=map[Content-Length:[97] Content-Type:[application/json; charset=utf-8] Date:[Tue, 11 Aug 2020 08:06:55 GMT] Docker-Distribution-Api-Version:[registry/2.0] Vary:[Accept-Encoding] X-Content-Type-Options:[nosniff]] status="404 Not Found" url="https://registry.dev.fullbacksystems.com/v2/fullbacksystems-website/manifests/8523786"
ctr: failed to resolve reference "registry.dev.fullbacksystems.com/fullbacksystems-website:8523786": registry.dev.fullbacksystems.com/fullbacksystems-website:8523786 not found

On the registry:

$ kubectl logs docker-registry-7d6758cbcb-5xh2c 

10.42.0.6 - - [11/Aug/2020:11:06:20 +0000] "HEAD /v2/fullbacksystems-website/manifests/8523786 HTTP/1.1" 404 97 "" "containerd/1.2.12"
time="2020-08-11T11:06:20.822302582Z" level=error msg="response completed with error" err.code="manifest unknown" err.detail="unknown tag=8523786" err.message="manifest unknown" go.version=go1.11.2 http.request.host=registry.dev.fullbacksystems.com http.request.id=3cb0db45-49e4-4107-bbde-8e141f4280a0 http.request.method=HEAD http.request.remoteaddr=10.42.0.1 http.request.uri="/v2/fullbacksystems-website/manifests/8523786" http.request.useragent="containerd/1.2.12" http.response.contenttype="application/json; charset=utf-8" http.response.duration=1.538491ms http.response.status=404 http.response.written=97 vars.name=fullbacksystems-website vars.reference=8523786 
amenasse commented 4 years ago

This works, so I'm not sure why it's 404'ing via containerd:

$ curl -H "Accept: application/vnd.oci.image.manifest.v1+json"  -v https://registry.dev.fullbacksystems.com/v2/fb/fullback-systems-website/manifests/8523786c |jq
amenasse commented 4 years ago

Oh, there's a typo in the repository name: the repository is fullback-systems-website, not fullbacksystems-website.
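
A small sketch for catching this kind of mismatch earlier: it turns an image reference into the manifest URL that containerd requests, so the same URL can be checked with curl. The helper name `manifest_url` is hypothetical, and it assumes the reference has an explicit registry host and a tag (no digest).

```shell
#!/bin/sh
# Hypothetical helper: turn an image reference of the form
# HOST/REPOSITORY:TAG into the Registry HTTP API v2 manifest URL.
# Assumes an explicit registry host and a tag; digests are not handled.
manifest_url() {
    ref="$1"
    host="${ref%%/*}"    # everything before the first "/"
    rest="${ref#*/}"     # repository plus ":tag"
    repo="${rest%:*}"    # drop the trailing ":tag"
    tag="${rest##*:}"    # everything after the last ":"
    printf 'https://%s/v2/%s/manifests/%s\n' "$host" "$repo" "$tag"
}

# Example: print the URL for the corrected reference.
manifest_url registry.dev.fullbacksystems.com/fullback-systems-website:8523786c
```

Passing the printed URL to `curl -sI` with a suitable `Accept` header should return 404 when either the repository name or the tag does not exist in the registry; the registry's `/v2/_catalog` and `/v2/<name>/tags/list` endpoints list what is actually there.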

But now I'm getting a new error:

ctr: failed to extract layer sha256:13cb14c2acd34e45446a50af25cb05095a17624678dbafbcc9e26086547c1d74: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount550143959: archive/tar: invalid tar header: unknown

Works fine with podman:

$ podman image pull registry.dev.fullbacksystems.com/fullback-systems-website:8523786c
amenasse commented 4 years ago

Looks like the "failed to extract layer" error was caused by the image having been pushed with podman; re-pushing with docker and pulling again with ctr worked. Will investigate this separately. See #6