in0rdr commented 2 months ago

etcdctl can be run with the --insecure-skip-tls-verify flag to skip TLS verification of the etcd endpoint.

This is useful in some deployments, for instance, when the etcd cluster is external to Kubernetes and the Kubernetes endpoint name (e.g., etcd.kube-system.svc.cluster.local) does not match the names in the certificates of the external etcd cluster.

Checklist

[x] This PR contains a description of the changes I'm making
[x] I updated the version in Chart.yaml
- bumped the minor version
[x] I updated the changelog with an artifacthub.io/changes annotation in Chart.yaml, check the example in the documentation.
[x] I updated applicable README.md files using pre-commit run
[x] I documented any high-level concepts I'm introducing in docs/
[x] CI is currently green and this is ready for review
[x] I am ready to test changes after they are applied and released

4censord commented 2 months ago

Wouldn't it be better to just specify the correct hostname? Or is that not an option in your case?

in0rdr commented 2 months ago

Wouldn't it be better to just specify the correct hostname? Or is that not an option in your case?

Of course that would be better, but the etcd nodes are external to the cluster and their DNS names are not resolvable from within the Kubernetes cluster. The cluster-internal etcd service in the kube-system namespace does resolve to these external endpoints, though.

4censord commented 2 months ago

Hmm, how does the kubelet resolve/verify etcd then? Or does the kubelet just not verify either?

tongpu commented 2 months ago

This is useful in some deployments, for instance, when the etcd cluster is external to Kubernetes and the Kubernetes endpoint name (e.g., etcd.kube-system.svc.cluster.local) does not match the names in the certificates of the external etcd cluster.

AFAIK only the Kubernetes API servers talk directly to etcd, so only they need to have the correct hostname configured.

eyenx commented 2 months ago

Hmm, how does the kubelet resolve/verify etcd then? Or does the kubelet just not verify as well?

The IPs are hardcoded in the Endpoints CR. TLS skip-verify is only needed because Nutanix's implementation of the etcd cluster uses a certificate that is only valid for:

hostname (not resolvable within the cluster)
hostname.cluster.local (no idea why that would help)
*.cluster.local (does not help either)

The only ways I see to avoid setting TLS skip-verify are to make sure the hostnames are resolvable within the cluster, or to use the IPs directly (they are in the SAN of the cert, as we are using them for scraping etcd metrics for Prometheus with insecureSkipVerify: false, but with serverName: etcd.cluster.local).
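To illustrate the "connect by IP, verify against a name" idea from the Prometheus setup, here is a minimal Python sketch. It is not what the chart or Prometheus actually runs; the function names, the timeout, and the optional CA file parameter are assumptions for illustration, and the client certificates an etcd cluster usually requires are left out.

```python
import socket
import ssl
from typing import Optional


def build_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    # create_default_context() turns on certificate and hostname verification,
    # i.e. the equivalent of insecureSkipVerify: false.
    return ssl.create_default_context(cafile=ca_file)


def connect_by_ip(ip: str, port: int, server_name: str,
                  ca_file: Optional[str] = None) -> ssl.SSLSocket:
    """Dial an IP address but verify the certificate against server_name."""
    ctx = build_context(ca_file)
    raw = socket.create_connection((ip, port), timeout=5)
    try:
        # server_hostname is sent as SNI and is the name checked against the
        # certificate, independent of the address that was dialed.
        return ctx.wrap_socket(raw, server_hostname=server_name)
    except Exception:
        raw.close()
        raise
```

A hypothetical call would look like connect_by_ip("10.0.0.10", 2379, "etcd.cluster.local", "/path/to/ca.crt"), where the IP and CA path are placeholders.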

hairmare commented 2 months ago

The TLS skip-verify is only needed cause Nutanix's Implementation of ETCD

Should we also reflect this issue in upstream Nutanix so it gets unborked at some point?

4censord commented 2 months ago

ETCD Cluster uses a certificate that is only valid for

*.cluster.local (does not help as well)

Now I'm even more confused, because shouldn't that be valid for the internal service etcd.kube-system.svc.cluster.local?

tongpu commented 2 months ago

ETCD Cluster uses a certificate that is only valid for

*.cluster.local (does not help as well)

Now I'm even more confused, because shouldn't that be valid for the internal service etcd.kube-system.svc.cluster.local?

No, because a wildcard in a TLS certificate only matches a single label (one subdomain level), not a subdomain of a subdomain (of a subdomain), as is the case with service.namespace.svc.cluster.local.
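A rough Python sketch of that rule, assuming the simplified leftmost-label matching described in RFC 6125 (real TLS libraries apply additional checks; the function name is made up for this example):

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate name match: '*' covers exactly one label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard stands for exactly one label, so both names must have
    # the same number of labels.
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # The wildcard label must match a non-empty label; the rest must
        # be identical.
        return h_labels[0] != "" and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


for name in ("etcd.cluster.local", "etcd.kube-system.svc.cluster.local"):
    print(name, wildcard_matches("*.cluster.local", name))
```

With this rule, etcd.cluster.local matches *.cluster.local, while etcd.kube-system.svc.cluster.local does not, because it has three extra labels in front of cluster.local.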

4censord commented 2 months ago

Oh, so for TLS certs it differs from how wildcards work in DNS, because with DNS, *.cluster.local would also resolve service.namespace.svc.cluster.local. I did not know that.

eyenx commented 2 months ago

The TLS skip-verify is only needed cause Nutanix's Implementation of ETCD

Should we also reflect this issue in upstream Nutanix so it gets unborked at some point?

I'll discuss this with the customer tomorrow.

eyenx commented 2 months ago

This now needs a rebase.

in0rdr commented 2 months ago

rebased, I think I almost need to bump to 1.3.0 now

in0rdr commented 2 months ago

@eyenx you only keep the latest artifacthub annotation right?

eyenx commented 2 months ago

Yes, bump to 1.3.0 and just add your change to the artifacthub annotation.

adfinis / helm-charts

feat(kubernetes-etcd-backup): skip tls verify #1292
