rothgar opened this issue 7 years ago
I suspect we need to clarify this in the Installer. If I understand correctly, you're referring to the two steps in the screenshots below.
I'd describe the purpose of these values as follows. Reasonable values (in order) might be:

- `kubernetes.example.com` - Kubernetes API will live here
- `tectonic.example.com` - Tectonic dashboard will live here, via ingress
- `node1.example.com`, `node2.example.com`, etc.

The resulting `apiserver.crt` should then contain a Subject Alternative Name section like:
X509v3 Subject Alternative Name:
DNS:kubernetes.example.com, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:10.3.0.1
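For reference, one quick way to double-check which names actually ended up in the generated cert (the path below is a guess; adjust it to wherever the installer placed `apiserver.crt`):

```sh
# Print the SAN extension of the apiserver cert; the path is hypothetical.
openssl x509 -noout -text -in /etc/kubernetes/ssl/apiserver.crt \
  | grep -A1 'Subject Alternative Name'
```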
It sounds like a hostname was provided for Controller DNS, which gets put in the apiserver cert SANs. The workers will try to register with the controller via that same Controller DNS name (over https), so I'm not sure where the mismatch occurs. Are workers able to resolve that hostname? Could you share an example of a hostname you might use and the corresponding server entry in a worker's `/etc/kubernetes/kubeconfig`?
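For illustration only (the hostname and port are placeholders), something like this on a worker would show whether the kubeconfig points at the Controller DNS name:

```sh
# The server entry should use the same Controller DNS name that was placed in the cert SANs.
grep 'server:' /etc/kubernetes/kubeconfig
# expected output along the lines of:
#     server: https://kubernetes.example.com:443
```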
Concerning merging the two names: they serve different purposes. This is admittedly unclear when creating a single-controller, single-worker cluster, but take the example above. One would not want to register the Controller DNS entry as `node1.example.com` and the Tectonic DNS entry as `node2.example.com`, even if those machines were indeed the controller and the worker (though that is valid). It is more prudent to use two dedicated DNS records for those purposes. If more controllers are added, one can add A records for their static IPs behind the Controller DNS name. If fanout to workers for ingress is insufficient or needs to tolerate more worker outages, there is flexibility to change those mappings.
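As a concrete sketch of that layout (all names and addresses here are made up), the two records might resolve like this:

```sh
# Controller DNS: one A record per controller's static IP
dig +short kubernetes.example.com
# 10.0.0.10
# 10.0.0.11

# Tectonic DNS: A records for the workers that should receive ingress traffic
dig +short tectonic.example.com
# 10.0.0.20
# 10.0.0.21
```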
It's on us to make the Installer instructions super clear, validate more inputs, and find ways to avoid possible mistakes. Feel free to share any phrasing you think would have been more helpful here. More than anything, I want the Installer to educate about what it's doing so it's not magic, so all feedback is appreciated.
Oh gosh this is long. Sorry.
You are correct about the steps that caused the problems. Fortunately I have used the scripts manually in the past, so I knew what was being done, but the steps were still unclear about what the certs were being used for. I assumed that in the define-workers step the MAC address and hostname were only being used for bootcfg group matching, but forgot that the kubelet certs are manually created in this step.
I think most helpful of all would be moving the nodes to use the bootstrapping API, like kubeadm does, so certificates wouldn't need to be defined per node prior to provisioning.
I think the second thing that is confusing is the fact that Tectonic currently only allows one controller, which means setting up a load balancer (with a CNAME) in front of the controller wouldn't be helpful.
One thing I think could be helpful (although possibly less secure) is to enforce an FQDN when creating the Kubernetes certificate, but to also automatically sign the certificate with the bare hostname/CNAME, so if the user puts in k8s.example.com the cert would be signed with k8s.example.com as well as k8s automatically.
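A rough sketch of what that could look like with openssl (file names and the CA layout are illustrative, not what the installer actually does):

```sh
# Sign the apiserver CSR with both the FQDN and the bare hostname as SANs.
cat > apiserver-san.cnf <<'EOF'
[ v3_req ]
subjectAltName = DNS:k8s.example.com, DNS:k8s, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP:10.3.0.1
EOF
openssl x509 -req -in apiserver.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
  -out apiserver.crt -days 365 -extensions v3_req -extfile apiserver-san.cnf
```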
hey @rothgar - thanks for your suggestions here and for filing this issue. We've noted your suggestions and I'll make sure to bring them up at our weekly planning meeting! Installing HA controller nodes is on our immediate roadmap, so that should help address your second point.
Relates to #50
Issue Report Template
Tectonic Version
1.5.1
Environment
What hardware/cloud provider/hypervisor is being used with Tectonic? Bare Metal
Expected Behavior
Cert signing for API server should be less prone to errors.
Actual Behavior
It's easy to end up with an API server cert that is not signed with the proper DNS aliases.
Reproduction Steps
Other Information
Users should only be prompted for machine names once to avoid this problem. When asked later for the MAC address, the hostname should already be filled in from the previous step.