cnoe-io / idpbuilder

Spin up a complete internal developer platform with only Docker required as a dependency.
https://cloud-native.slack.com/archives/C05TN9WFN5S
Apache License 2.0

Feature: CoreDNS configuration for core packages #300

Closed: nabuskey closed this 3 months ago

nabuskey commented 3 months ago

Problem Statement

When interacting with core packages like argocd and gitea, the in-cluster DNS name and the external (local machine) DNS name may not match out of the box. In the ref implementation, we work around this by creating a CoreDNS configuration: https://github.com/cnoe-io/stacks/blob/main/ref-implementation/coredns/manifests/cm-coredns.yaml

I don't think we want to do this in external packages. I'd rather take care of this kind of thing in idpbuilder itself.

Possible Solution

Once the cluster is up, we should:

  1. Update the CoreDNS CM to use the import plugin: https://coredns.io/plugins/import/
  2. Enter name resolution information in the referenced file.
  3. Wait until the name resolves correctly. This should be a one-time operation: we should not keep reapplying it, in case another package overwrites the CM.
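
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not idpbuilder's implementation: the import path `/etc/coredns/custom/*.conf`, the function names, and the injectable `resolve` callable are all hypothetical, and the file behind the import path would still need to be mounted into the CoreDNS pod separately.

```python
import socket
import time
from typing import Callable, Optional

# Hypothetical path, chosen for illustration only.
IMPORT_LINE = "import /etc/coredns/custom/*.conf"


def add_import(corefile: str, import_line: str = IMPORT_LINE) -> str:
    """Insert an `import` directive into the first server block, at most once."""
    if import_line in corefile:
        return corefile  # already patched, so the operation stays idempotent
    out = []
    inserted = False
    for line in corefile.splitlines():
        out.append(line)
        # The first line ending in "{" opens the server block, e.g. ".:53 {".
        if not inserted and line.rstrip().endswith("{"):
            indent = " " * (len(line) - len(line.lstrip()) + 4)
            out.append(indent + import_line)
            inserted = True
    return "\n".join(out) + "\n"


def default_resolve(name: str) -> Optional[str]:
    """Return the first IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def wait_until_resolves(
    name: str,
    expected_ip: str,
    resolve: Callable[[str], Optional[str]] = default_resolve,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Poll until name resolves to expected_ip, giving up after `attempts` tries."""
    for _ in range(attempts):
        if resolve(name) == expected_ip:
            return True
        time.sleep(delay)
    return False
```

Running `add_import` a second time returns the Corefile unchanged, which matches the "one-time operation" intent: if another package later overwrites the CM, nothing here tries to fight it.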

Alternatives Considered

No response

cmoulliard commented 3 months ago

2. Enter name resolution information in the referenced file.

According to the doc page, the rewrite rule should have the form FROM -> TO, but in your example

        rewrite name cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local

the incoming names to be resolved that include cnoe.localtest.me will be rewritten to ingress-nginx-controller.ingress-nginx.svc.cluster.local. Are you sure that the rule is correct?

cmoulliard commented 3 months ago

Is there an easy way to test the rewrite rule after updating the CoreDNS CM?

cmoulliard commented 3 months ago

I did a test where I patched the coreDNS CM

Data
====
Corefile:
----
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    rewrite name cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}

I patched it in my idpbuilder cluster to include the rewrite rule and restarted the CoreDNS deployment, but the argocd server still complains with the same DNS resolution error when it has to create an Application for the git repository http://gitea.cnoe.localtest.me:8443/qshift/my-quarkus-app-job.git

Error creating argo app: application spec for my-quarkus-app-job-bootstrap is invalid: InvalidSpecError: repository not accessible: repositories not accessible: &Repository{Repo: "http://gitea.cnoe.localtest.me:8443/qshift/my-quarkus-app-job.git", Type: "", Name: "", Project: ""}: repo client error while testing repository: rpc error: code = Unknown desc = error testing repository connectivity: Get "http://gitea.cnoe.localtest.me:8443/qshift/my-quarkus-app-job.git/info/refs?service=git-upload-pack": dial tcp 127.0.0.1:8443: connect: connection refused

cmoulliard commented 3 months ago

If we add the log plugin to the CM, we can see the DNS queries logged when the gitea host FQDN is resolved:

[INFO] 10.244.0.8:46270 - 38264 "AAAA IN gitea.cnoe.localtest.me.svc.cluster.local. udp 70 false 1232" NXDOMAIN qr,aa,rd 152 0.000110333s
[INFO] 10.244.0.8:55605 - 47160 "AAAA IN gitea.cnoe.localtest.me.cluster.local. udp 66 false 1232" NXDOMAIN qr,aa,rd 148 0.000076375s
[INFO] 10.244.0.8:46066 - 48261 "A IN gitea.cnoe.localtest.me.dns.podman. udp 63 false 1232" NXDOMAIN qr,rd,ra 52 0.000522584s 
[INFO] 10.244.0.8:44104 - 28307 "AAAA IN gitea.cnoe.localtest.me. udp 52 false 1232" NOERROR qr,rd,ra 41 0.001008668s
[INFO] 10.244.0.8:51231 - 60626 "A IN gitea.cnoe.localtest.me. udp 52 false 1232" NOERROR qr,rd,ra 80 0.001741836s
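
The suffixed names in the log (for example `gitea.cnoe.localtest.me.svc.cluster.local.`) are consistent with the pod resolver walking its search list before trying the name as given. The sketch below is a simplified model of that behavior, assuming the Kubernetes default of `ndots:5`; the `candidate_names` helper and the search list in the usage note are illustrative, not read from the cluster.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Return the query names a stub resolver would try, in order.

    Simplified model: a name with fewer than `ndots` dots is tried with each
    search domain first and as an absolute name last; otherwise the absolute
    name is tried first. A trailing dot disables the search list entirely.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]
```

With a search list such as `["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]`, `gitea.cnoe.localtest.me` has only three dots, so the suffixed names are queried first (each returning NXDOMAIN) and the bare name is queried last.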

The same behavior can be observed with a DNS request from inside a pod:

kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
If you don't see a command prompt, try pressing enter.
...
dnstools# host gitea.cnoe.localtest.me
gitea.cnoe.localtest.me has address 127.0.0.1
Host gitea.cnoe.localtest.me not found: 3(NXDOMAIN)
dnstools#

So it seems that requests to gitea.cnoe.localtest.me are still resolved to the address 127.0.0.1, and the rewrite rule is perhaps (to be confirmed) not correct.
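
One possible explanation can be sketched as follows, assuming that `rewrite name FROM TO` without an explicit match type compares the whole query name exactly, as the rewrite plugin documentation describes. The `rewrite_name` function is a hypothetical model for illustration, not CoreDNS code, and it ignores case handling and the other match types.

```python
from typing import List, Tuple


def rewrite_name(qname: str, rules: List[Tuple[str, str]]) -> str:
    """Apply the first rule whose FROM equals the query name exactly."""
    for src, dst in rules:
        # Compare without the trailing root dot so "a.b." and "a.b" are equal.
        if qname.rstrip(".") == src.rstrip("."):
            return dst
    return qname  # no rule matched: the query passes through unchanged


RULES = [
    ("cnoe.localtest.me", "ingress-nginx-controller.ingress-nginx.svc.cluster.local"),
]
```

Under this model, `rewrite_name("cnoe.localtest.me", RULES)` is rewritten, while `rewrite_name("gitea.cnoe.localtest.me", RULES)` is returned unchanged and falls through to the upstream resolver, which would explain the 127.0.0.1 answer.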

If we now change the rewrite rule to:

rewrite name gitea.cnoe.localtest.me my-gitea-http.gitea.svc.cluster.local

then resolution works

[INFO] 10.244.0.43:58969 - 44638 "A IN gitea.cnoe.localtest.me.default.svc.cluster.local. udp 67 false 512" NXDOMAIN qr,aa,rd 160 0.000115584s
[INFO] 10.244.0.43:35271 - 16935 "A IN gitea.cnoe.localtest.me.cluster.local. udp 55 false 512" NXDOMAIN qr,aa,rd 148 0.000055458s

and 

dnstools# host gitea.cnoe.localtest.me
gitea.cnoe.localtest.me has address 10.244.0.14

If I scaffold a backstage template using the argocd:create-resources action, then the resolution looks good except that the port number used is not correct.

error testing repository connectivity: Get "http://gitea.cnoe.localtest.me:8443/qshift/my-quarkus-app-job.git/info/refs?service=git-upload-pack": dial tcp 10.244.0.14:8443: connect: connection refused

We can now also curl the gitea server from inside the cluster:

kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
curl -s http://gitea.cnoe.localtest.me:3000/api/v1/orgs
[{"id":2,"name":"qshift","full_name":"","email":"","avatar_url":"https://gitea.cnoe.localtest.me:8443/avatars/cd15cef2eff694839d401ca9984dfa4d","description":"","website":"","location":"","visibility":"public","repo_admin_change_team_access":false,"username":"qshift"}]

Question: Can we also rewrite the port 8443 to 3000?

Interesting link: https://coredns.io/2017/05/08/custom-dns-entries-for-kubernetes/

nabuskey commented 3 months ago

Port numbers cannot be rewritten as part of name resolution as far as I know.
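
A small sketch of why that is: name resolution only maps the host part of a URL to an address, while the port is taken from the URL itself. The `connection_target` helper and the stub resolver used with it are hypothetical names for illustration.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


def connection_target(
    url: str, resolve: Callable[[str], str]
) -> Tuple[str, Optional[int]]:
    """Return the (address, port) a client would dial for url.

    Only the hostname goes through `resolve`; the port never reaches DNS.
    """
    parts = urlsplit(url)
    return resolve(parts.hostname), parts.port
```

For example, with a resolver that answers `10.244.0.14`, the URL `http://gitea.cnoe.localtest.me:8443/qshift/my-quarkus-app-job.git` yields `("10.244.0.14", 8443)`: a rewrite rule can change the address but the client still dials port 8443.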

Both certificate verification and DNS resolution work for me with the following config.

CoreDNS:

$ kubectl get cm -n kube-system coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready

        rewrite name gitea.cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local
        rewrite name cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local

        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2024-06-18T19:24:58Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "35270"
  uid: bd822510-1a59-4595-ba61-dd5c0ac21aa0

$ kubectl get application -n argocd test -o yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: test
  namespace: argocd
spec:
  destination:
    namespace: default
    server: https://kubernetes.default.svc
  project: default
  source:
    path: path
    repoURL: https://gitea.cnoe.localtest.me:8443/giteaAdmin/test.git
    targetRevision: HEAD

I do have a self-signed cert issued as the default ingress TLS certificate. This feature is coming soon, but fundamentally it just comes down to:

  1. Create a self-signed cert.
  2. Create a TLS secret.
  3. Reference it in ingress-nginx, e.g. --default-ssl-certificate=default/foo-tls
  4. Update argocd-tls-certs-cm with the cert.

cmoulliard commented 3 months ago

So you confirm that the rewrite rule of the ref implementation was not correct and that the correct ones to use are:

        rewrite name gitea.cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local
        rewrite name cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local

cmoulliard commented 3 months ago

  1. Create a self signed cert

Did you add alt names as part of the generated certificate?

cmoulliard commented 3 months ago

If I scaffold a backstage template using the argocd:create-resources action, then the resolution looks good except that the port number used is not correct.

error testing repository connectivity: Get "http://gitea.cnoe.localtest.me:8443/qshift/my-quarkus-app-job.git/info/refs?service=git-upload-pack": dial tcp 10.244.0.14:8443: connect: connection refused

Are you sure that your example is working? I get a connection refused unless I change the port from 8443 to 3000 for internal (pod-to-pod) communication.

nabuskey commented 3 months ago

So you confirm that the rewrite rule of the ref implementation was not correct and that the correct ones to use are:

        rewrite name gitea.cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local
        rewrite name cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local

It is correct for the ref implementation because it uses path-based routing, which means everything is exposed under a single domain name. So we don't need gitea.cnoe.localtest.me

Did you add alt names as part of the generated certificate?

Yes, SANs are configured.

https://github.com/nabuskey/idpbuilder/blob/5065c7a452e9cfda3631dcb05d1cc3604d9eb1a2/pkg/util/tls.go#L89-L151

Are you sure that your example is working? I get a connection refused unless I change the port from 8443 to 3000 for internal (pod-to-pod) communication.

Yes. Yours is not working because you are pointing it at the gitea service. Point it at the ingress instead, like I did.

cmoulliard commented 3 months ago

I ran a new test without success, even though the DNS name resolves correctly:

   rewrite name gitea.cnoe.localtest.me my-gitea-http.gitea.svc.cluster.local

Test

kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
If you don't see a command prompt, try pressing enter.
dnstools# host gitea.cnoe.localtest.me
gitea.cnoe.localtest.me has address 10.244.0.13

But the curl command executed within the `dnstools` pod fails:

curl -k https://gitea.cnoe.localtest.me:8443/api/v1/orgs
curl: (7) Failed to connect to gitea.cnoe.localtest.me port 8443: Connection refused

nabuskey commented 3 months ago

I ran a new test without success, even though the DNS name resolves correctly:

   rewrite name gitea.cnoe.localtest.me my-gitea-http.gitea.svc.cluster.local

Test

kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
If you don't see a command prompt, try pressing enter.
dnstools# host gitea.cnoe.localtest.me
gitea.cnoe.localtest.me has address 10.244.0.13

But the curl command executed within the `dnstools` pod fails:

curl -k https://gitea.cnoe.localtest.me:8443/api/v1/orgs
curl: (7) Failed to connect to gitea.cnoe.localtest.me port 8443: Connection refused

Point it to the ingress, not the gitea service.

rewrite name gitea.cnoe.localtest.me ingress-nginx-controller.ingress-nginx.svc.cluster.local

cmoulliard commented 3 months ago

Sorry, that works. I had changed the CM manually, but argocd reverted it to the default :-) New test:

kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
If you don't see a command prompt, try pressing enter.
dnstools# host gitea.cnoe.localtest.me
gitea.cnoe.localtest.me has address 10.96.60.85
dnstools# curl -k https://gitea.cnoe.localtest.me:8443/api/v1/orgs
[]

and if I create an org

dnstools# curl -k https://gitea.cnoe.localtest.me:8443/api/v1/orgs
[{"id":2,"name":"qshift","full_name":"","email":"","avatar_url":"https://gitea.cnoe.localtest.me:8443/avatars/cd15cef2eff694839d401ca9984dfa4d","description":"","website":"","location":"","visibility":"public","repo_admin_change_team_access":false,"username":"qshift"}]

I ran a new test using my backstage template + argocd:create-resources action and, apart from the TLS issue, DNS resolution is working fine:

Get "https://gitea.cnoe.localtest.me:8443/q-shift/my-quarkus-app-job.git/info/refs?service=git-upload-pack": tls: failed to verify certificate: x509: certificate is valid for ingress.local, not gitea.cnoe.localtest.me
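
The x509 error above is a hostname check against the certificate's subject alternative names (SANs). The sketch below models that check in a simplified form, assuming a wildcard covers exactly one left-most label; `san_matches` and `cert_valid_for` are hypothetical helpers, and the SAN lists in the usage note are examples rather than values read from any certificate.

```python
from typing import Iterable


def san_matches(hostname: str, san: str) -> bool:
    """Return True if hostname is covered by a single DNS SAN entry."""
    host = hostname.lower().rstrip(".")
    pattern = san.lower().rstrip(".")
    if pattern.startswith("*."):
        # A wildcard stands in for exactly one left-most label.
        head, _, rest = host.partition(".")
        return bool(head) and rest == pattern[2:]
    return host == pattern


def cert_valid_for(hostname: str, sans: Iterable[str]) -> bool:
    """Return True if any SAN entry covers hostname."""
    return any(san_matches(hostname, san) for san in sans)
```

With `["ingress.local"]` as the SAN list, `gitea.cnoe.localtest.me` is rejected, which is the failure shown above. A certificate carrying `cnoe.localtest.me` and `*.cnoe.localtest.me` would cover both the bare domain and the gitea subdomain.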

@nabuskey

cmoulliard commented 3 months ago

  • Reference it in ingress-nginx. e.g. --default-ssl-certificate=default/foo-tls
  • Update argocd-tls-certs-cm with cert.

What did you define within the argocd-tls-certs-cm?

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-tls-certs-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  ????: |
    <CERTIFICATE/>

@nabuskey

cmoulliard commented 3 months ago

I think I did what is needed, but the argocd:create-resources backstage action complains: tls: failed to verify certificate: x509: certificate signed by unknown authority

nabuskey commented 3 months ago

Something like:

apiVersion: v1
data:
  cnoe.localtest.me: |
    -----BEGIN CERTIFICATE-----
    MIIBrzCCAVWgAwIBAgIRAIQj3n3aJFrHGMS0fVUJgiUwCgYIKoZIzj0EAwIwEjEQ
    MA4GA1UEChMHY25vZS5pbzAeFw0yNDA2MTkxNzM2MTlaFw0yNTA2MTkyMzM2MTla
    MBIxEDAOBgNVBAoTB2Nub2UuaW8wWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAARo
    u/zAIGhxqfP01ozg4MG8ISrGZMy1+6vGlv2V5rvBZF/lcvfJVTKYjz4bXi/rXbvy
    4eIjqMFqJ6r6aKpZNBLjo4GLMIGIMA4GA1UdDwEB/wQEAwIChDATBgNVHSUEDDAK
    BggrBgEFBQcDATAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBTYNT/ktulK/q9c
    ZteIu3lItJti/TAxBgNVHREEKjAoghFjbm9lLmxvY2FsdGVzdC5tZYITKi5jbm9l
    LmxvY2FsdGVzdC5tZTAKBggqhkjOPQQDAgNIADBFAiEAzNHjPgmT83gIJHdxS8Lp
    jrQfiyIfglYxsaB07iiefo8CICY9nIKRoQLz4GqJW+FCrrTFZHqeB9CirFhyHM+9
    EsCc
    -----END CERTIFICATE-----
  gitea.cnoe.localtest.me: |
    -----BEGIN CERTIFICATE-----
    MIIBrzCCAVWgAwIBAgIRAIQj3n3aJFrHGMS0fVUJgiUwCgYIKoZIzj0EAwIwEjEQ
    MA4GA1UEChMHY25vZS5pbzAeFw0yNDA2MTkxNzM2MTlaFw0yNTA2MTkyMzM2MTla
    MBIxEDAOBgNVBAoTB2Nub2UuaW8wWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAARo
    u/zAIGhxqfP01ozg4MG8ISrGZMy1+6vGlv2V5rvBZF/lcvfJVTKYjz4bXi/rXbvy
    4eIjqMFqJ6r6aKpZNBLjo4GLMIGIMA4GA1UdDwEB/wQEAwIChDATBgNVHSUEDDAK
    BggrBgEFBQcDATAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBTYNT/ktulK/q9c
    ZteIu3lItJti/TAxBgNVHREEKjAoghFjbm9lLmxvY2FsdGVzdC5tZYITKi5jbm9l
    LmxvY2FsdGVzdC5tZTAKBggqhkjOPQQDAgNIADBFAiEAzNHjPgmT83gIJHdxS8Lp
    jrQfiyIfglYxsaB07iiefo8CICY9nIKRoQLz4GqJW+FCrrTFZHqeB9CirFhyHM+9
    EsCc
    -----END CERTIFICATE-----
kind: ConfigMap

The certificate differs between installations, obviously.
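
Since the certificate changes per installation, the ConfigMap could be generated rather than written by hand. This is a minimal sketch assuming the layout shown above (one data key per hostname, each holding the PEM text); `tls_certs_configmap` is a hypothetical helper and the PEM string in the example is a placeholder, not a real certificate.

```python
import json
from typing import Dict, Iterable


def tls_certs_configmap(
    pem: str, hostnames: Iterable[str], namespace: str = "argocd"
) -> Dict[str, object]:
    """Build an argocd-tls-certs-cm manifest mapping each hostname to pem."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "argocd-tls-certs-cm", "namespace": namespace},
        "data": {host: pem for host in hostnames},
    }


if __name__ == "__main__":
    # Placeholder only; substitute the PEM of the certificate you generated.
    placeholder_pem = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
    manifest = tls_certs_configmap(
        placeholder_pem, ["cnoe.localtest.me", "gitea.cnoe.localtest.me"]
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.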

cmoulliard commented 3 months ago

The certificate differs between installations, obviously.

Is the certificate you generate signed by a well-known authority? Otherwise you will get, like me, a "signed by unknown authority" error. Be aware that you may also hit an error like this one: https://github.com/RoadieHQ/roadie-backstage-plugins/issues/1232