Kamaji assumes cluster dns domain (cluster.local)

MathiasPius commented 5 months ago

When setting up my cluster I made the choice of using a non-standard DNS domain, so instead of cluster.local I opted for local.kronform.pius.dev. I've learned to live with this by manually specifying the domain override in Helm charts that require it, but the Kamaji chart provides no way of doing so.

I wouldn't mind contributing some time to fix this in the Helm chart by providing a clusterDomain Helm value with a default and substituting it wherever needed, but it looks like the value is also hardcoded in a handful of Go modules, and I'm not entirely sure about the correct way of propagating that information down into the relevant functions.

prometherion commented 5 months ago

Hey @MathiasPius, if I understand correctly, you're pointing to the cluster.local suffix used to communicate with the Datastore endpoints, right?

I think we can solve this easily. Would you be open to working on it?

MathiasPius commented 5 months ago

That's exactly right.

I initially noticed the issue when following the Getting Started guide and having the etcd nodes fail to communicate with each other. Some troubleshooting revealed that they were attempting to communicate with the other etcd nodes on:

etcd-<ID>.etcd.kamaji-system.svc.cluster.local:2379

which of course did not work, since my DNS domain is local.kronform.pius.dev.
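As an aside, a component running inside a pod could in principle infer the domain instead of assuming it. The sketch below is only an illustration and not something Kamaji does, as far as I know: it relies on the assumption that the pod uses the default ClusterFirst DNS policy, where the kubelet writes search entries of the form <namespace>.svc.<domain>, svc.<domain> and <domain> into /etc/resolv.conf. The clusterDomainFromResolvConf helper is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// clusterDomainFromResolvConf scans resolv.conf content for a search
// entry of the form "svc.<domain>" and returns <domain>. It reports
// false when no such entry exists. Caveat: a namespace literally
// named "svc" would confuse this simple prefix match.
func clusterDomainFromResolvConf(content string) (string, bool) {
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 || fields[0] != "search" {
			continue
		}
		for _, entry := range fields[1:] {
			if !strings.HasPrefix(entry, "svc.") {
				continue
			}
			domain := strings.TrimSuffix(strings.TrimPrefix(entry, "svc."), ".")
			if domain != "" {
				return domain, true
			}
		}
	}
	return "", false
}

func main() {
	// Sample content as it might appear in a pod in the kamaji-system
	// namespace; the nameserver address is made up for the example.
	sample := "search kamaji-system.svc.local.kronform.pius.dev svc.local.kronform.pius.dev local.kronform.pius.dev\n" +
		"nameserver 10.96.0.10\n" +
		"options ndots:5\n"
	domain, ok := clusterDomainFromResolvConf(sample)
	fmt.Println(domain, ok)
}
```

An explicit setting is still the more predictable option, since pods with a custom dnsPolicy or dnsConfig may not carry these search entries at all.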

I am open to working on a fix as far as the Helm chart is concerned, but I'm not familiar enough with the code base to determine whether the assumption holds elsewhere, such as in the /deploy directory or here: https://github.com/clastix/kamaji/blob/a849a84fd07c0b62543d59a7a818ab2bdea232ac/internal/kubeadm/configuration.go#L52-L56

I was able to deploy an etcd cluster based on a combination of the built-in etcd deployment and Bitnami's etcd helm chart, and use the rest of the Kamaji ecosystem without issue, so I assume the helm chart is the only critical component.

clastix/kamaji #433