coreos / etcd-operator

etcd operator creates/configures/manages etcd clusters atop Kubernetes
https://coreos.com/blog/introducing-the-etcd-operator.html
Apache License 2.0
1.75k stars 741 forks source link

Allow specifying Cluster Domain for clientURLs #2082

Closed mikkeloscar closed 5 years ago

mikkeloscar commented 5 years ago

This allows setting the cluster domain as a flag on the operator. The cluster domain is used as a suffix in the Client URLs for the etcd members.

The ability to set a custom cluster domain is desirable when running in clusters with a custom DNS configuration.

In our case we need this because we have decided to default the DNS config to use ndots:2 instead of the default ndots:5 setting which can result in up to 10 additional DNS queries for every single name lookup, which causes an unnecessary overhead for the DNS infrastructure and for applications.

Ref: https://github.com/kubernetes/dns/issues/159#issuecomment-336449931

An alternative way to fix this would be to allow specifying the DNS config of the pod spec: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-config

etcd-bot commented 5 years ago

Can one of the admins verify this patch?

etcd-bot commented 5 years ago

Can one of the admins verify this patch?

etcd-bot commented 5 years ago

Can one of the admins verify this patch?

hasbro17 commented 5 years ago

@etcd-bot ok to test

mikkeloscar commented 5 years ago

Any update on this?

hasbro17 commented 5 years ago

@mikkeloscar Sorry for the delay on this. Appending the ClusterDomain seems okay to me, however I'm wondering how this is fixed via the DNS config?

Also I'm considering if we should pass in the ClusterDomain suffix as a flag to the operator, vs specifying it as a field in the PodPolicy of the EtcdCluster CR i.e spec.pod.clusterDomainSuffix?

Currently the restore-operator also creates a new seed member for an existing EtcdCluster. And with this change it won't be setting the member's ClusterDomain suffix: https://github.com/coreos/etcd-operator/blob/master/pkg/controller/restore-operator/sync.go#L214-L223

So instead of passing in another flag to the restore-operator we can set it once in the EtcdCluster PodPolicy and use it across the two operators. WDYT?

mikkeloscar commented 5 years ago

@hasbro17 thanks for the feedback

@mikkeloscar Sorry for the delay on this. Appending the ClusterDomain seems okay to me, however I'm wondering how this is fixed via the DNS config?

In our setup we default to ndots:2 but if the user specifies another DNS config on the pod then we respect that, so you would be able to "opt-out" by specifying:

dnsConfig:
    options:
      - name: ndots
        value: "5"

Obviously you wouldn't benefit from reducing the number of useless DNS queries, which is why I think setting the suffix is better.

So instead of passing in another flag to the restore-operator we can set it once in the EtcdCluster PodPolicy and use it across the two operators. WDYT?

My motivation for setting this as a flag is that you would usually want to have the same clusterDomain for all etcd clusters within a single Kubernetes cluster, so it's simpler to set it centrally rather than configure it for each etcd cluster.

If you prefer the podPolicy solution then I could add it there instead. Otherwise I would add the flag to the restore-operator as well.

hasbro17 commented 5 years ago

@mikkeloscar Thanks for the explanation.

I would still lean towards setting the ClusterDomain in the EtcdCluster's PodPolicy just so the restore operator can reuse the existing EtcdCluster's setting. Plus making it part of the API would more clearly document this setting for each cluster.

PS can you also please add a line to the CHANGELOG to mention the new API.

mikkeloscar commented 5 years ago

@hasbro17 ok, I changed it now to use a ClusterDomain field on the PodPolicy.

mikkeloscar commented 5 years ago

@hasbro17 added the nil checks!

mikkeloscar commented 5 years ago

Can you help me understand why jenkins-ci is failing?

I can build it locally:

./hack/build/build
building operator...
building backup-operator...
building restore-operator...
hasbro17 commented 5 years ago

@etcd-bot retest this please

mikkeloscar commented 5 years ago

@hasbro17 thanks!

mikkeloscar commented 5 years ago

Friendly ping! :) If you need anything from me let me know!

hasbro17 commented 5 years ago

@mikkeloscar Just waiting on @hexfusion but I think we'll be able to merge this later today. Apologies for the delay but can you please rebase this again?

mikkeloscar commented 5 years ago

Apologies for the delay but can you please rebase this again?

Done!

hexfusion commented 5 years ago

@mikkeloscar one last thing can we make this a single commit something like below.

*: Allow specifying Cluster Domain for clientURLs

After this is done we can merge right away.

mikkeloscar commented 5 years ago

There you go!

aermakov-zalando commented 5 years ago

@hexfusion Is there an ETA on when this would make it into a release?