Australian-Imaging-Service / charts

Apache License 2.0

use postgres service fqdn #141

Closed: barrettMCW closed this 7 months ago

barrettMCW commented 7 months ago

I think this is related to #107. He seems to have run into the same issue as me, but I don't think he quite understood what the ExternalName service was doing. I suspect it's an RKE2 thing, but I cannot reach a service in my cluster with just the service name; I need the namespaced service domain name, like so: {{service}}.{{namespace}}.svc(.cluster.local)

This commit simply changes XNAT's Postgres configuration to use the FQDN of the ExternalName service: {{service}}.{{namespace}}.svc.cluster.local

It should be completely compatible with existing deployments. Thanks!
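For reference, the two name forms mentioned above can be sketched in a few lines of Python. This is a minimal illustration assuming the default cluster domain `cluster.local`; the function names and the example service name `xnat-postgresql` are hypothetical and not taken from the chart.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Return the fully qualified DNS name of a Kubernetes Service.

    Services are published as <service>.<namespace>.svc.<cluster-domain>.
    "cluster.local" is only the usual default; pass the real domain if it differs.
    """
    if not service or not namespace:
        raise ValueError("service and namespace must be non-empty")
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_short_form(service: str, namespace: str) -> str:
    """Return the namespaced form <service>.<namespace>.svc (no cluster domain)."""
    return f"{service}.{namespace}.svc"


if __name__ == "__main__":
    # Hypothetical service/namespace names, for illustration only.
    print(service_short_form("xnat-postgresql", "xnat-system"))
    print(service_fqdn("xnat-postgresql", "xnat-system"))
```

The cluster domain is a parameter rather than a hard-coded suffix because, as noted later in this thread, not every cluster uses `cluster.local`.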

fxusyd commented 7 months ago

The change is fine. But the root cause, that the service can only be reached by its FQDN, may lie in the cluster's DNS setup. Do you have kube-dns enabled? What does the pod's /etc/resolv.conf contain? It should list a few search domains to allow service discovery.

barrettMCW commented 7 months ago

Thanks! Here is /etc/resolv.conf:

search xnat-system.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
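To show why these search domains matter, here is a minimal Python sketch of how a stub resolver expands a name using the `search` list and `ndots` value above. It approximates glibc's behaviour (an assumption: musl, used by Alpine-based images, handles the search list differently), and the function name `candidate_queries` and the service name `xnat-postgresql` are hypothetical.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the names a stub resolver would query, in order.

    Approximates glibc's resolv.conf handling (other resolvers differ):
      * a trailing dot marks the name as absolute, so no search domain is used;
      * a name with at least `ndots` dots is tried as-is before the search list;
      * otherwise every search domain is appended first, and the bare name is tried last.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


if __name__ == "__main__":
    # Search list and ndots copied from the resolv.conf above.
    search = ["xnat-system.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for query in candidate_queries("xnat-postgresql", search, ndots=5):
        print(query)
```

With this search list, the first query for a bare name is already `xnat-postgresql.xnat-system.svc.cluster.local`, the FQDN in the pod's own namespace, which is why a short service name would normally be expected to resolve.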

I'm running CoreDNS with a fairly generic Corefile:

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus 0.0.0.0:9153
    # resolve *.mcw.edu with our institution's DNS
    hosts custom.hosts dev.lavlab.mcw.edu {
        # host file for dev stuff
        fallthrough
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

fxusyd commented 7 months ago

The config looks fine. What Kubernetes version is this? Could the existing CoreDNS pods be malfunctioning? Do you see similar behaviour with other services that are not created by the XNAT chart?

barrettMCW commented 7 months ago

Kubernetes version: v1.27.9+rke2r1. Restarting the pods doesn't change the behavior, and the behavior is consistent throughout the cluster. Before looking into this chart, I was under the impression that the namespaced form was the only way to reach services in Kubernetes.

dean-taylor commented 7 months ago

Please note that although cluster.local is the common default cluster domain, it is a configurable option when a cluster is set up, and not every cluster is guaranteed to use it.
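Because the domain can differ, code that needs an FQDN should not assume `cluster.local`. One hedged way to discover it at runtime is to read it back from the pod's search list. The Python sketch below assumes the kubelet-generated layout seen in the resolv.conf quoted earlier in this thread (`<namespace>.svc.<domain> svc.<domain> <domain>`); the helper names are hypothetical.

```python
SAMPLE_RESOLV_CONF = """\
search xnat-system.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
"""


def search_domains(resolv_conf: str) -> list[str]:
    """Return the domains on the last 'search' line of a resolv.conf text."""
    domains: list[str] = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]  # a later 'search' line replaces earlier ones
    return domains


def cluster_domain(resolv_conf: str, default: str = "cluster.local") -> str:
    """Guess the cluster domain from a pod's resolv.conf.

    Heuristic (assumption): look for three consecutive search entries shaped
    like "<namespace>.svc.<domain> svc.<domain> <domain>"; otherwise fall
    back to `default`.
    """
    search = search_domains(resolv_conf)
    for ns_entry, svc_entry, domain in zip(search, search[1:], search[2:]):
        if svc_entry == f"svc.{domain}" and ns_entry.endswith(f".{svc_entry}"):
            return domain
    return default


if __name__ == "__main__":
    print(cluster_domain(SAMPLE_RESOLV_CONF))
```

Matching the whole three-entry pattern, rather than just the first entry starting with `svc.`, keeps the guess correct even in a namespace that happens to be named `svc`.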

dean-taylor commented 7 months ago

Particular attention should also be paid to this doc: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/. The original reason for not putting the FQDN into the internal service reference was to rely on the standard Kubernetes DNS resolution process and so avoid conflicts with non-standard deployments.

dean-taylor commented 7 months ago

Recommendations:
1. Use a Kustomize overlay to force this change for your specific deployment.
2. There could be an argument for adding a configuration variable that overrides the database connection string, for instances where the database configuration is highly optimised, such as a single-writer, multi-reader cluster.
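Recommendation 2 could look roughly like the following Python sketch of the selection logic. It is an illustration only: it assumes a JDBC-style PostgreSQL URL, and the parameter names (`override`, `cluster_domain`) and example hosts stand in for hypothetical chart values, not existing ones.

```python
from typing import Optional


def postgres_jdbc_url(
    service: str,
    namespace: str,
    database: str,
    port: int = 5432,
    cluster_domain: Optional[str] = None,
    override: Optional[str] = None,
) -> str:
    """Choose the database URL from hypothetical chart values.

    * `override`: a full connection string that wins outright, e.g. for a
      single-writer, multi-reader database cluster.
    * No `cluster_domain`: use the short service name, leaving resolution
      to the normal Kubernetes DNS search list.
    * With `cluster_domain`: use the fully qualified service name.
    """
    if override:
        return override
    if cluster_domain is None:
        host = service
    else:
        host = f"{service}.{namespace}.svc.{cluster_domain}"
    return f"jdbc:postgresql://{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(postgres_jdbc_url("xnat-postgresql", "xnat-system", "xnat"))
    print(postgres_jdbc_url("xnat-postgresql", "xnat-system", "xnat", cluster_domain="cluster.local"))
```

Keeping the short name as the default preserves the chart's existing behaviour; the FQDN and the full override are opt-in for clusters that need them.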

barrettMCW commented 7 months ago

Thanks @dean-taylor! It took some time to learn a bit about DNS in Kubernetes. I'll modify our DNS to handle this. I appreciate the work y'all do!