desec-io / desec-stack

Backbone of the deSEC Free Secure DNS Hosting Service
https://desec.io/
MIT License

API: Query for DNS record propagation #577

Open s-hamann opened 2 years ago

s-hamann commented 2 years ago

For some applications (ACME challenges, TLSA records, ...), it is interesting to know whether a record that was just added/updated/removed/... has fully propagated to the authoritative nameservers. Just querying the authoritative nameservers is not sufficient, due to the anycast network: some frontend servers may already have the data, others may not, but I can only query those that are closest to me.

Particularly when the network is slow to update, it would be useful to have some way to find out if all servers have/publish the same information. Since I do not believe this can be done in DNS, I suggest adding it to the API.

I don't have a clear idea of what this should look like. It could be an additional JSON field that is returned for all RRsets, though I'm not sure how to indicate the propagation status of a record removal. Another option might be to provide a list of pending changes.
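For illustration, here is a minimal client-side sketch of what is possible today (this is not part of desec-stack; it assumes the dnspython library and deSEC's public nameserver names): query each authoritative nameserver directly for the zone's SOA serial and check that they all agree. As noted above, with anycast this only ever sees the instance closest to the client, which is exactly the limitation this issue is about.

import dns.message   # dnspython
import dns.query
import dns.rdatatype
import dns.resolver

NAMESERVERS = ["ns1.desec.io.", "ns2.desec.org."]

def soa_serials(zone):
    """Return the set of SOA serials observed on the reachable frontends."""
    serials = set()
    for ns in NAMESERVERS:
        for a in dns.resolver.resolve(ns, "A"):            # NS hostname -> IPv4
            query = dns.message.make_query(zone, dns.rdatatype.SOA)
            response = dns.query.udp(query, a.address, timeout=5)
            for rrset in response.answer:
                if rrset.rdtype == dns.rdatatype.SOA:
                    serials.add(rrset[0].serial)
    return serials

# The frontends we can reach agree iff exactly one serial is observed:
# len(soa_serials("example.dedyn.io.")) == 1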

peterthomassen commented 2 years ago

Replication is done on a per-domain (per-zone) level, not per record set. The proper object to add this to would therefore be the domain object, and it could indicate whether all secondaries have a current copy. (Or perhaps indicate the last time a current copy was seen everywhere; comparing that to the last time the domain was changed would reflect replication freshness. Have to think about it more.)

In any case, putting this at the domain object will also cover the "RRset deleted" case.
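A rough sketch of how a client could consume such a field if it were added (the field names "last_replicated" and "last_changed" below are purely hypothetical placeholders for this idea, not part of the current API; the token-based Authorization header follows the API's existing convention, and the requests library is assumed):

import requests

API = "https://desec.io/api/v1"
TOKEN = "..."  # a deSEC API token

def replication_is_current(domain):
    # Fetch the domain object and compare two (hypothetical) timestamps: the
    # last time a current copy was seen on all secondaries vs. the last time
    # the domain was changed.
    r = requests.get(
        f"{API}/domains/{domain}/",
        headers={"Authorization": f"Token {TOKEN}"},
        timeout=10,
    )
    r.raise_for_status()
    data = r.json()
    # ISO 8601 timestamps with identical formatting compare chronologically.
    return data["last_replicated"] >= data["last_changed"]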

nils-wisiol commented 2 years ago

Depending on the replication mechanism, this information is very hard to obtain, and it conflicts with the replication plans that we recently introduced (#571). I hence suggest we close this issue as "won't fix".

That being said, desec.io (not desec-stack) could publish information on how long DNS updates (typically) take. I believe currently we have some 99% of updates done in <1min.

peterthomassen commented 2 years ago

#571 doesn't contradict this. But it would indeed be problematic if the nodes we replicate to continue with additional replication to second-layer nodes whose IP addresses we don't have. To me, it's not clear yet that this will be the case in our situation, so I'd like to keep this open for now.

For context: We are planning a cooperation with pch.net who will run some nodes for us, and they'll likely do some internal replication. The question will be how we can determine when replication has finished on their side. I'd hope that there would be some way to do that, not only for the purposes of this issue, but generally -- we should have insight into that.

cluck commented 11 months ago

For me as a reader of this bug it is unclear if the requirement is:

  1. to measure the instant when all authoritative nameservers (be they reachable via anycast or unicast) start serving a specific updated record,
  2. to compute the worst-case instant by which any hypothetical (caching) querier behind a caching resolver is guaranteed to receive the updated record,
  3. to compute the worst-case instant when any hypothetical (caching) querier behind a caching resolver, resolving a given DNS record that technically depends on the updated record, starts to see the effect of the updated record (e.g. think of changing the A record for www2, to which the CNAME record for www points, or the many records, including glue records, involved in a zone delegation, and other records that depend upon other records, e.g. DNAME, SRV... and then some DNSSEC records),
  4. to compute the worst-case instant when any hypothetical (caching) querier behind a caching resolver, resolving any of the DNS records depending on the updated record, starts to see the effect of the updated record,
  5. any of the above, also watching updates to the corresponding PTR records.

In my experience, most unforeseen disruptions happen because planning should have considered at least case 3, but effectively modeled "just" case 1. In my opinion, cases 1 and 2 are of purely academic value, misleading in practice, and shouldn't be offered at all. On the other hand, values for case 3+ are often underestimated by a large margin (e.g. many records have TTLs in the range of days) and the mistake is only discovered "when it's too late". Thus, I think, the calculation should actually support predicting the impact of changes before they're committed and receive a prominent place in the UX.

peterthomassen commented 11 months ago

For me as a reader of this bug it is unclear if the requirement is:

  1. to measure the instant when all authoritative nameservers (be they reachable via anycast or unicast) start serving a specific updated record,

This. It's important e.g. for ACME clients to know when it's safe to tell the server that the ACME challenge can be found in the DNS. There might be other reasons why a user might want to know which version of their zone is served in which region.
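As a stopgap, an ACME client can already approximate this by querying each authoritative nameserver directly for the challenge record before notifying the ACME server. A minimal sketch, again assuming dnspython and deSEC's public nameserver names, and again subject to the anycast caveat from the original report:

import time

import dns.message   # dnspython
import dns.query
import dns.rdatatype
import dns.resolver

NAMESERVERS = ["ns1.desec.io.", "ns2.desec.org."]

def challenge_visible(name, expected):
    """True if every reachable authoritative frontend serves the expected TXT value."""
    for ns in NAMESERVERS:
        ip = next(iter(dns.resolver.resolve(ns, "A"))).address
        query = dns.message.make_query(name, dns.rdatatype.TXT)
        response = dns.query.udp(query, ip, timeout=5)
        values = {
            b"".join(rdata.strings).decode()
            for rrset in response.answer
            if rrset.rdtype == dns.rdatatype.TXT
            for rdata in rrset
        }
        if expected not in values:
            return False
    return True

# Poll before asking the ACME server to validate, with some upper bound:
# while not challenge_visible("_acme-challenge.example.dedyn.io.", txt_value):
#     time.sleep(5)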

  2. to compute the worst-case instant by which any hypothetical (caching) querier behind a caching resolver is guaranteed to receive the updated record, ...

This has nothing to do with caching resolvers (and neither do we); authoritative DNS ends with putting the zones in place.

values for case 3+ are often underestimated by a large margin (e.g. many records have TTLs in the range of days) and the mistake is only discovered "when it's too late". Thus, I think, the calculation should actually support predicting the impact of changes before they're committed and receive a prominent place in the UX.

That may be a good idea, but it's a different issue.

peterthomassen commented 11 months ago

In fact, calculating 3+ is only possible based on information about the propagation status on the authoritative servers. So, implementing the feature discussed here is a prerequisite for your feature.

cluck commented 11 months ago

The propagation status is publicly available in DNS. The status required to calculate the delays for a planned change is in DNS before and up until the change is pushed and propagated.

The SOA record declares the timings until all potentially anycasted servers are returning coherent answers again. In the case of true multi-master hosting, the SOAs will have different serial numbers and then each zone copy must be considered in parallel (this is perfectly legal, and Microsoft Active Directory-integrated DNS is a prominent case).

ACME is indeed a very practical use case, and actually the most concrete one; I see it causing confusion very often.

Let's assume for a moment that the day we're changing this record is also the day we were unhappy with our previous hoster and decided to migrate our zone to a cool new one. The aforementioned resolver (e.g. a Let's Encrypt verifier) has cached our old NS records and our A record just moments before. Now we're pushing new (delegation and) NS records and a changed A record on our new authoritative servers. The aforementioned resolver is then asked again about our A record, which it realizes has expired. It still holds valid cached NS records though, because our previous DNS hoster and the parent zone had chosen to serve them with a high TTL. Thus, the resolver will start recursive resolution, but using the cached records it will shortcut to querying our previous provider's nameservers, possibly obtaining a technically valid response, but not the one we expected.

Example

$ dig +trace www.desec.io A
.                       21098   IN      NS      i.root-servers.net.
.                       21098   IN      NS      c.root-servers.net.
io.                     172800  IN      NS      b0.nic.io.
io.                     172800  IN      NS      a2.nic.io.
desec.io.               3600    IN      NS      ns1.desec.io.
desec.io.               3600    IN      NS      ns2.desec.org.
desec.io.               900     IN      A       88.99.64.5
www.desec.io.           3600    IN      CNAME   desec.io.

So I get that www.desec.io resolves to 88.99.64.5, valid for 15 minutes.

Let's say, I want to migrate desec.io to another DNS hoster and update the A record.

Let's see how quick the zone can be migrated, i.e. how long the delegation records persist in caches:

$ dig desec.io NS @c0.nic.io.
;; AUTHORITY SECTION:
desec.io.               3600    IN      NS      ns1.desec.io.

;; ADDITIONAL SECTION:
ns1.desec.io.           3600    IN      A       45.54.76.1

That's one hour.

Now let's consider how long the desec.io servers themselves think these records should be cached:

$ dig desec.io NS @ns1.desec.io.
;; ANSWER SECTION:
desec.io.               300     IN      NS      ns1.desec.io.

;; ADDITIONAL SECTION:
ns1.desec.io.           900     IN      A       45.54.76.1

The informational NS record indicates 5 minutes (while the parent zone requires one hour authoritatively).

So, just from the zone data, we would expect the update to propagate in less than 15 minutes. But in fact, as this example shows, propagation can't be guaranteed in less than 75 minutes (900 seconds for the A record plus 3600 seconds for the NS delegation).
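To spell out that arithmetic (the values are taken from the dig output above and are a snapshot, not constants):

# A resolver may hold the old delegation for up to its full TTL and, just
# before that expires, still fetch the old A record from the previous
# provider and cache it for another full A-record TTL.
a_ttl = 900                # desec.io. A record TTL (the record being changed)
ns_delegation_ttl = 3600   # desec.io. NS TTL as served by the .io parent servers

worst_case_seconds = a_ttl + ns_delegation_ttl
print(worst_case_seconds / 60)   # 75.0 minutes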

peterthomassen commented 11 months ago

The SOA record declares the timings until all potentially anycasted servers are returning coherent answers again.

No.

The aforementioned resolver (e.g. a Let's Encrypt verifier) has cached our old NS records and our A record

How do you know that? (I would be surprised if Let's Encrypt's challenge fetching follows standard TTL rules.)

The informational NS record indicates 5 minutes (while the parent zone requires one hour authoritatively).

The parent is not authoritative for the NS records, as indicated by the absence of the AA bit in the response:

$ dig NS desec.io. @a0.nic.io | grep "flags: "
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 3

Only the answer from the child has the AA bit:

$ dig NS desec.io. @ns2.desec.org | grep "flags: "
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 3