cloudfoundry-attic / bosh-notes

Collection of proposals for BOSH
Apache License 2.0

Feature-request: DNS management via CPI #44

Open evanfarrar opened 6 years ago

evanfarrar commented 6 years ago

Typically DNS for BOSH-deployed software is seen as a pre-install concern, but I would like to propose that it be something that can be implemented in each CPI. CFAR and CFCR both have implicit requirements for configuring external DNS to meet their conventions (CFAR has always had this need; CFCR has just added this prerequisite).

AWS, GCP, Azure, and OpenStack all offer some form of DNSaaS, so this is nearly a universal cloud resource. Even for users of vSphere, or in regions that do not support DNS (China, GovCloud), there could be significant benefit in stating the DNS requirements explicitly in the sample deployment manifest for a specific SemVer of a BOSH release, rather than having to correlate documentation and software to infer them.

If this could be configured via manifests and cloud configs, we could also begin to implement BOSH releases that depend much more on runtime modification of DNS records. For example, Let's Encrypt's wildcard certs only work with DNS-based validation, where a DNS entry containing a challenge response must be created. This challenge must be renewed every three weeks, so doing this only during bootstrapping is of limited benefit.
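To make the challenge-response entry concrete, here is a minimal sketch of how such a record is derived under the ACME DNS-01 rules (RFC 8555): a TXT record at `_acme-challenge.<domain>` holding the unpadded base64url SHA-256 digest of the key authorization. The function name and the token and thumbprint arguments are illustrative placeholders, not part of BOSH or of any CPI.

```python
import base64
import hashlib
from typing import Tuple


def dns01_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for an ACME DNS-01 challenge.

    Hypothetical helper: token comes from the ACME server, and
    account_thumbprint is the thumbprint of the ACME account key.
    """
    # A wildcard such as *.example.com is validated at the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # ACME uses base64url without trailing "=" padding.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{base}", value
```

Whatever creates this entry has to be able to write to the zone each time the challenge is repeated, which is why a one-time bootstrapping step does not cover this case.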

Some counterpoints / risks I see:

cppforlife commented 6 years ago

@evanfarrar I suspected that if we venture into LB management territory we would have to do something about DNS. Could you describe the different requirements a bit more? Given your experience with bbl, would you mind creating a PR into this repo that describes some of the DNS implementation details per IaaS?

evanfarrar commented 6 years ago

Definitely, I can flesh this out more, but some brief thoughts:

  1. I've mainly dealt with three public IaaSes so far (AWS, Azure, GCP). The domain terminology seems fairly settled: you create a zone, which gets assigned to one or more DNS servers; these are returned in the response to the create call. Given an instantiated zone, you may create entries in it. Usually an entry accepts every field that a DNS record does: domain, type, value, and TTL. IaaSes also offer shortcuts that set the value dynamically to an entity from the same vendor: a VM ID or an LB ID. However, VMs and LBs can also have static IPs in all three IaaSes, so using the ID-based shortcut is not strictly necessary (but it is cheaper).
  2. DNSaaS offerings on public IaaSes intentionally load balance the DNS servers they return after a zone is created: they have n DNS servers and put your zone on only some 3 of the n. For this reason, it is essential to know the identity of a zone and to be careful not to recreate it if the "manifest" (or whatever describes it) is re-applied. Accidentally destroying and recreating a zone means that your entries get "balanced" onto new DNS servers, so the parent domain's NS records may delegate to servers that no longer have entries for your zone.
  3. Most IaaSes do not complain if you set up identical zones, so it is very hard to tell which one is actually propagating to the root DNS servers. Furthermore, the API facilities for searching existing zones and entries are rather limited. We are currently trying to implement features like "set up a zone for X, since your api.cf.X will be set up in it", and it is hard on every IaaS.
  4. For the above two reasons I suggest leaving zones out of scope. Make users set up the zone, check that it is propagating, and then identify it somehow to BOSH. BOSH just creates entries in that zone, which is much simpler to troubleshoot and lower risk than accidentally breaking propagation on a production zone. Similarly, we can advise users to grant IAM permissions only for entries and not for zones.
  5. Setting up DNS is hard, dangerous, and infrequent. A validly propagating zone is a valuable thing, so Platform Engineers are reluctant to move their DNS zones or delegate them between IaaSes. For this reason, DNS would be much more likely than anything else to encourage Platform Engineers to be "multi cpi", and it would not be proper to assume that a BOSH deployment on one IaaS has its DNSaaS in the same IaaS.
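
A rough sketch of what points 1, 2, and 4 could look like together, assuming an interface that only touches entries in a zone the user has already created and identified. Every name here (`Record`, `ZoneEntries`, `delegation_matches`) is hypothetical, and the in-memory store stands in for a real DNSaaS API. Keying entries by (name, type) is a simplification: real providers allow several values per record set.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    name: str       # fully qualified domain, e.g. "api.cf.example.com"
    type: str       # "A", "CNAME", "TXT", ...
    value: str
    ttl: int = 300  # seconds


def _norm(host: str) -> str:
    """Lowercase a host name and drop any trailing dot."""
    return host.rstrip(".").lower()


class ZoneEntries:
    """Manages entries only; the zone itself is created by the user."""

    def __init__(self, zone_id: str, zone_name: str) -> None:
        self.zone_id = zone_id
        self.zone_name = _norm(zone_name)
        self._records: Dict[Tuple[str, str], Record] = {}

    def upsert(self, record: Record) -> None:
        name = _norm(record.name)
        if name != self.zone_name and not name.endswith("." + self.zone_name):
            raise ValueError(f"{record.name} is outside zone {self.zone_name}")
        # Keyed by (name, type), so re-applying the same manifest changes nothing.
        self._records[(name, record.type)] = record

    def delete(self, name: str, type: str) -> None:
        self._records.pop((_norm(name), type), None)

    def records(self) -> List[Record]:
        return list(self._records.values())


def delegation_matches(parent_ns: Iterable[str], zone_ns: Iterable[str]) -> bool:
    """True when the parent's NS records name exactly the zone's servers."""
    return {_norm(h) for h in parent_ns} == {_norm(h) for h in zone_ns}
```

There is deliberately no method that creates or deletes a zone, so nothing here can trigger the rebalancing problem from point 2. `delegation_matches` is the kind of check a user could run before handing the zone ID to BOSH.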

evanfarrar commented 6 years ago

Counterpoint to my own suggestion: maybe there are just too many vendors in this space to hope to support them all. Look at how many DNS vendors Caddy supports: https://caddyserver.com/docs/automatic-https#enabling-the-dns-challenge