Closed by barrettMCW 7 months ago
The change is fine, but the root cause, that the service can only be accessed by its FQDN, may lie in the cluster's DNS. Do you have kube-dns enabled? I wonder what the content of the pod's /etc/resolv.conf is; it should have a few search domains to allow service discovery.
Thanks! `/etc/resolv.conf`:

```
search xnat-system.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
```
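For reference, with `ndots:5` any name containing fewer than five dots is first tried against each search domain in order before being tried as an absolute name. A lookup for a short name such as `xnat-postgres` (a hypothetical service name, used here purely for illustration) would be attempted roughly as:

```
xnat-postgres.xnat-system.svc.cluster.local.
xnat-postgres.svc.cluster.local.
xnat-postgres.cluster.local.
xnat-postgres.
```

With search domains present and working, the bare service name should resolve via the first entry.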
I'm running CoreDNS with a pretty generic Corefile:

```
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus 0.0.0.0:9153
    # resolve *.mcw.edu with our institution's DNS
    hosts custom.hosts dev.lavlab.mcw.edu {
        # host file for dev stuff
        fallthrough
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```
The config looks fine. What version of Kubernetes is it? Maybe the existing CoreDNS pods are malfunctioning? Do you see similar behaviour on other services that are not created by the XNAT chart?
Kubernetes version: v1.27.9+rke2r1. Restarting the pods doesn't change the behavior, and the behavior is consistent throughout the cluster. Prior to looking into this chart I was under the impression that the namespaced method is the only way to reach services in Kubernetes.
Please note that the cluster domain `cluster.local` is common and the default; however, it is a configurable option when a cluster is set up, and not all clusters are guaranteed to use this domain.
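As an illustration, the cluster domain and cluster DNS address are typically set in the kubelet configuration. A minimal sketch, using the common defaults rather than values verified against this particular cluster:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDomain: cluster.local   # the suffix used for service FQDNs
clusterDNS:
  - 10.43.0.10                 # cluster DNS service IP (RKE2/k3s default range)
```

If a cluster was configured with a different `clusterDomain`, any hard-coded `cluster.local` suffix in a chart would break name resolution there.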
Particular attention should also be paid to this doc: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/. The original reason for not placing the FQDN into the internal service reference was to rely on Kubernetes' expected DNS resolution process and avoid conflicts with non-standard deployments.
Recommendations:
1. Use a Kubernetes Kustomize overlay to force this change for your specific deployment.
2. There could be an argument for adding an override configuration variable for the database connection string, for instances where the database configuration is highly optimised, such as a 1x-write/many-reads cluster.
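A minimal sketch of option 1, assuming the chart renders the database host into a ConfigMap; the ConfigMap name, key, and service name below are hypothetical and would need to be adjusted to match the actual rendered manifests:

```yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - base   # the rendered chart output
patches:
  - target:
      kind: ConfigMap
      name: xnat-config   # hypothetical name; match your deployment
    patch: |-
      - op: replace
        path: /data/datasourceUrl   # hypothetical key
        value: jdbc:postgresql://xnat-postgresql.xnat-system.svc.cluster.local:5432/xnat
```

This keeps the FQDN change local to one deployment instead of baking it into the chart for everyone.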
Thanks @dean-taylor! Took some time to learn a bit about DNS in Kubernetes. I'll modify our DNS to handle this. I appreciate the work y'all do!
I think this is related to #107; he seems to have run into the same issue as me, but I don't think he quite understood what the ExternalName was doing. I think it's an RKE2 thing, but I cannot reach a service in my cluster with just the service name; I need the namespaced service domain name, like so:

`{{service}}.{{namespace}}.svc(.cluster.local)`

This commit simply changes XNAT's postgres configuration to use the FQDN of the ExternalName service: `{{service}}.{{namespace}}.svc.cluster.local`
Should be completely compatible with existing deployments. Thanks!
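For context, an ExternalName Service publishes no cluster IP; it simply returns a CNAME to the configured target, so the name a client resolves must itself be fully resolvable. A sketch of the kind of object involved (the names and target below are illustrative, not copied from the chart):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: xnat-postgresql      # hypothetical service name
  namespace: xnat-system
spec:
  type: ExternalName
  externalName: db.example.com   # returned to clients as a CNAME
```

A client would then reach it as `xnat-postgresql.xnat-system.svc.cluster.local`, which is the FQDN form this commit switches the postgres configuration to.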