each / draft-aname

work on a draft to standardize ANAME/ALIAS records to allow CNAME-like records at the zone apex

Missing target records should be treated like resolution failure #54

Closed gibson042 closed 5 years ago

gibson042 commented 5 years ago

We treat missing address records (i.e. NXDOMAIN or NODATA) the same as successfully resolving to a set of zero address records, and as distinct from "failure", which covers error responses such as SERVFAIL or REFUSED.

This is both undesirable for customers of DNS service providers (whose active sites will occasionally be inaccessible to some clients for $SOA_MINIMUM seconds), and operationally cumbersome because resolvers are not in a good position to synthesize the necessary SOA records for NXDOMAIN/NODATA responses (e.g., example.com. ANAME example.invalid. alongside example.com. A 192.0.2.1).
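To make the distinction concrete, here is a minimal Python sketch of the two models. The names `Outcome`, `classify`, `answer`, and the `empty_is_failure` flag are hypothetical and do not come from the draft; the sketch also assumes that a failed target lookup leaves the sibling address records in place.

```python
from enum import Enum


class Outcome(Enum):
    ADDRESSES = "addresses"  # target resolved to one or more A/AAAA records
    EMPTY = "empty"          # NXDOMAIN or NODATA
    FAILURE = "failure"      # SERVFAIL, REFUSED, or any other error rcode


def classify(rcode, addresses):
    """Classify the result of looking up the ANAME target."""
    if rcode not in ("NOERROR", "NXDOMAIN"):
        return Outcome.FAILURE
    if rcode == "NXDOMAIN" or not addresses:
        return Outcome.EMPTY
    return Outcome.ADDRESSES


def answer(outcome, target_addrs, sibling_addrs, empty_is_failure):
    """Pick the address records to serve at the ANAME owner.

    empty_is_failure=False models the behavior described above (an empty
    result is served as-is); True models the proposal in this issue.
    Assumption: on failure the sibling records are kept.
    """
    if outcome is Outcome.ADDRESSES:
        return list(target_addrs)
    if outcome is Outcome.FAILURE or empty_is_failure:
        return list(sibling_addrs)
    return []


# example.com. ANAME example.invalid. alongside example.com. A 192.0.2.1
outcome = classify("NXDOMAIN", [])
current = answer(outcome, [], ["192.0.2.1"], empty_is_failure=False)
proposed = answer(outcome, [], ["192.0.2.1"], empty_is_failure=True)
```

With the example records, the first model serves an empty answer while the proposed one keeps serving the sibling A record.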

fanf2 commented 5 years ago

The reason for this is to behave as much like CNAME as possible. I know Oracle/Dyn does not follow CNAME semantics so closely, and there are perhaps other reasons for relaxing the "like CNAME" requirement. But I think this should be discussed on the wg list because it's important.

A resolver should not have any problem obtaining SOA records (or it can return an RFC 2308 type 3 response). And lack of address records will be a NOERROR/NODATA answer, because there's a sibling ANAME.

Habbie commented 5 years ago

> This is both undesirable for customers of DNS service providers (whose active sites will occasionally be inaccessible to some clients for $SOA_MINIMUM seconds)

If they do not want that, they should not pick target names that will authoritatively have no A/AAAA records - or am I missing something?

gibson042 commented 5 years ago

Customers of DNS service providers generally do not have control over records at an ALIAS target, and if a temporary misconfiguration removes them then the difference between these two operational models is stark (accepting the empty answer means caching the problem for a usually quite long SOA MINIMUM value, whereas treating as an error means the problem is resolved as soon as the upstream misconfiguration is corrected).

Habbie commented 5 years ago

I don't think this is a problem that we can, or should, realistically guard against, but one terrible idea comes to mind. We currently have text that limits the TTL of a 'chased' record to that of the ANAME itself. We could, but I'd rather not go any stronger than MAY here, do the same for the negative TTL of a nodata response, perhaps?

gibson042 commented 5 years ago

I quite like that workaround, provided it is implemented as a cap on the SOA TTL rather than MINIMUM so DNSSEC still works (and SOA TTL reduction to cap out at MINIMUM is already specified by RFC 2308, so this is just a further limit of the same kind). I'd prefer a SHOULD, but can live with MAY if people feel strongly about preserving long-lived negative caching in such cases.

But I'd also like to see at least a MAY allowing ANAME-chasing software to use siblings in place of empty results, because I really do feel like that behavior is best for users of the DNS.
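A rough Python sketch of the cap discussed above, assuming the RFC 2308 rule that the negative caching TTL is the lesser of the SOA record's TTL and its MINIMUM field. The function name `negative_ttl` and the optional `aname_ttl` parameter are hypothetical. Only the TTL is lowered, and the SOA RDATA (including MINIMUM) is left untouched.

```python
def negative_ttl(soa_ttl, soa_minimum, aname_ttl=None):
    """Return the TTL to use for a negative (NODATA) answer.

    RFC 2308: the negative TTL is min(SOA TTL, SOA MINIMUM).
    Proposed here: when the name has a sibling ANAME, additionally cap
    the result at the ANAME TTL. The MINIMUM field itself is not rewritten.
    """
    ttl = min(soa_ttl, soa_minimum)
    if aname_ttl is not None:
        ttl = min(ttl, aname_ttl)
    return ttl


# SOA TTL 3600, MINIMUM 1800, sibling ANAME TTL 300
capped = negative_ttl(3600, 1800, aname_ttl=300)
uncapped = negative_ttl(3600, 1800)
```

In this example the uncapped negative answer would be cached for 1800 seconds, whereas the capped one expires after 300.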

matje commented 5 years ago

Jan Včelák shared how their (NS1) ANAME-like logic works, and it is quite different from Dyn's behavior. I think we need to relax the definition of how sibling address records should be replaced.

matje commented 5 years ago

Revisiting this issue, I think in the discussion it is unclear at what moment "we treat NXDOMAIN or NODATA distinct from error responses such as SERVFAIL or REFUSED". There are multiple cases where this can happen:

  1. ANAME substitution process. In fact the latest text (the upcoming -04) already covers what to do with an empty response. Step 4 of the process says:

    If one or more address records are found, replace the owner of the target address records with the owner of the ANAME record. Set the TTL to the minimum of the ANAME TTL, the TTL of each intermediate record, and the TTL of the target address records. Drop any RRSIG records.

I believe this is the desired behavior for Oracle/Dyn (correct me if I am wrong @gibson042). Note that the bold text actually makes step 3 redundant, because an error response (usually) means a response with an empty answer section.

  2. Address query resolution (i.e. a resolver requesting an A or AAAA response). If such a query results in an error response, we treat that as distinct from an empty response. An empty response means that the authoritative name server does not have an answer for this request. It would not be harmful if the resolver leaves this as is and does not treat it like a resolution failure. It treats the response the same as a positive response: either it forwards the (empty) response to the client, or it tries to act upon the ANAME and chase down the target address records (basically doing the substitution process). Either way, at this point it should also not try to replace a positive response with an empty response.
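As an illustration only, the quoted step 4 could be sketched in Python roughly as follows. The `RR` type and the `substitute` function are hypothetical names, and the sketch assumes a single minimum TTL is applied to all substituted records and that the sibling records are kept when no target address records are found.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RR:
    owner: str
    rtype: str
    ttl: int
    rdata: str


def substitute(aname, intermediates, target_rrs, siblings):
    """Apply step 4 of the substitution process to one ANAME record."""
    # Drop RRSIGs (and anything else that is not an address record).
    addrs = [rr for rr in target_rrs if rr.rtype in ("A", "AAAA")]
    if not addrs:
        # Nothing found: never replace a nonempty RRset with nothing.
        return list(siblings)
    # Minimum of the ANAME TTL, each intermediate TTL, and the target TTLs.
    ttl = min([aname.ttl]
              + [rr.ttl for rr in intermediates]
              + [rr.ttl for rr in addrs])
    # Rewrite the owner to the ANAME owner and apply the reduced TTL.
    return [replace(rr, owner=aname.owner, ttl=ttl) for rr in addrs]


aname = RR("example.com.", "ANAME", 300, "target.example.net.")
siblings = [RR("example.com.", "A", 300, "192.0.2.1")]
target = [
    RR("target.example.net.", "A", 600, "198.51.100.7"),
    RR("target.example.net.", "RRSIG", 600, "(signature data)"),
]
found = substitute(aname, [], target, siblings)
empty = substitute(aname, [], [], siblings)
```

Here `found` holds the target address re-owned to example.com. with TTL 300, while `empty` is simply the unchanged sibling RRset.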

I hope this is clarifying and is in line with how @gibson042 and @fanf2 think things should work. If not, I am happy to bring this topic to the list.

matje commented 5 years ago

Ping @gibson042 @fanf2 ?

gibson042 commented 5 years ago

I don't think I understand the difference between 1 and 2, but the current text looks good to me in that the ANAME substitution process never removes a nonempty RRSet by replacing it with nothing.

matje commented 5 years ago

1 is the ANAME resolution process, or in other words, the procedure for looking up the target address records. This may happen at provisioning, inside the authoritative name server, or at the resolver.

2 is the query lookup process, where a resolver makes a DNS request to the authoritative server.

gibson042 commented 5 years ago

How does ANAME come into play for case 2? If the authoritative doesn't respond or responds empty, then there is no ANAME and therefore no new behavior introduced. Likewise if it responds with only address records. Only if the response includes an ANAME could the resolver do anything new, but the new behavior seems to be entirely defined by case 1 (i.e., attempt to resolve the target records, and if any are found then substitute them into the response).

matje commented 5 years ago

> How does ANAME come into play for case 2? If the authoritative doesn't respond or responds empty, then there is no ANAME and therefore no new behavior introduced. Likewise if it responds with only address records. Only if the response includes an ANAME could the resolver do anything new, but the new behavior seems to be entirely defined by case 1 (i.e., attempt to resolve the target records, and if any are found then substitute them into the response).

It comes into play because the authoritative may have done an ANAME target lookup before the request came in. The following can happen:

matje commented 5 years ago

Closing issue as per discussion above, feel free to reopen if you think this is an unsolved issue.