alex-nicat / ietf-dprive-phase2-requirements

Repo to work on the DPRIVE phase 2 requirements draft

Input from Brian Dickson #11

Closed: jlivingood closed this issue 4 years ago

jlivingood commented 4 years ago

Simple notion up front: add an EDNS flag or OPT option code to signal ADoT support (or a bit field of all the supported transport types) on a given IP address (analogous to the RA flag), and possibly include the canonical name(s) of the server at that IP, if the query includes EDNS. This limits downgrade attacks to on-path adversaries who block port 853 and modify port 53 traffic. Maybe SIG(0) or TSIG can further prevent downgrades by on-path attackers who would modify UDP/TCP port 53 traffic?
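The comment does not define a wire format for this signal. A minimal sketch, assuming the signal is carried as an ordinary EDNS option (OPTION-CODE, OPTION-LENGTH, OPTION-DATA, as in the OPT RR), with a 16-bit transport bit field followed by zero or more uncompressed server names. The option code is borrowed from the local/experimental range, and `TRANSPORT_SIGNAL_OPTION_CODE` and the `TRANSPORT_*` bit values are hypothetical, not assigned code points.

```python
import struct

# Hypothetical: no option code is assigned for this signal, so this sketch
# borrows one from the EDNS0 "Local/Experimental Use" range (65001-65534).
TRANSPORT_SIGNAL_OPTION_CODE = 65001

# Hypothetical bit assignments for the supported-transport bit field.
TRANSPORT_DO53 = 0x01
TRANSPORT_DOT = 0x02
TRANSPORT_DOH = 0x04


def encode_name(name: str) -> bytes:
    """Encode a domain name in uncompressed DNS wire format."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)


def decode_name(data: bytes, offset: int) -> tuple[str, int]:
    """Decode an uncompressed name; return (name, offset just past it)."""
    labels = []
    while True:
        length = data[offset]
        offset += 1
        if length == 0:
            return ".".join(labels) + ".", offset
        labels.append(data[offset:offset + length].decode("ascii"))
        offset += length


def encode_option(transports: int, server_names: list[str]) -> bytes:
    """Build one EDNS option: OPTION-CODE, OPTION-LENGTH, then OPTION-DATA."""
    data = struct.pack("!H", transports)
    data += b"".join(encode_name(n) for n in server_names)
    return struct.pack("!HH", TRANSPORT_SIGNAL_OPTION_CODE, len(data)) + data


def decode_option(wire: bytes) -> tuple[int, list[str]]:
    """Parse the option back into (transport bit field, server names)."""
    code, length = struct.unpack("!HH", wire[:4])
    if code != TRANSPORT_SIGNAL_OPTION_CODE:
        raise ValueError("not a transport-signal option")
    data = wire[4:4 + length]
    (transports,) = struct.unpack("!H", data[:2])
    names, offset = [], 2
    while offset < len(data):
        name, offset = decode_name(data, offset)
        names.append(name)
    return transports, names
```

For example, `encode_option(TRANSPORT_DO53 | TRANSPORT_DOT, ["ns1.example.net"])` produces a 23-byte option that decodes back to `(3, ["ns1.example.net."])`. A bit field keeps the option to two bytes when no names are included, which matters for UDP responses.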

Certificates for DoT and DoH should really be published as DANE/TLSA records. We're literally talking about DNS here, so it would be particularly shortsighted not to standardize on this. It also potentially removes any hard dependency on a particular CA support matrix, even if the authoritative server happens to use a CA-issued certificate. Being able to validate without unilaterally trusting some particular set of CAs is a feature, not a bug.
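As an illustration of what DANE validation of a DoT certificate involves, here is a minimal sketch of matching a presented certificate against TLSA records at `_853._tcp.<server name>`. It assumes the TLSA RRset has already been DNSSEC-validated and that the caller has extracted the DER certificate and its SubjectPublicKeyInfo; it only handles certificate usage 3 (DANE-EE), which needs no CA chain at all. `TLSARecord`, `tlsa_owner_name`, and `dane_ee_matches` are illustrative names, not an existing library API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TLSARecord:
    usage: int          # 3 = DANE-EE (match the end-entity certificate directly)
    selector: int       # 0 = full certificate, 1 = SubjectPublicKeyInfo
    matching_type: int  # 0 = exact match, 1 = SHA-256, 2 = SHA-512
    data: bytes


def tlsa_owner_name(host: str, port: int = 853) -> str:
    """TLSA records are published at _<port>._tcp.<host>."""
    return f"_{port}._tcp.{host.rstrip('.')}."


def digest(matching_type: int, blob: bytes) -> bytes:
    """Apply the TLSA matching type to the selected bytes."""
    if matching_type == 0:
        return blob
    if matching_type == 1:
        return hashlib.sha256(blob).digest()
    if matching_type == 2:
        return hashlib.sha512(blob).digest()
    raise ValueError(f"unknown matching type {matching_type}")


def dane_ee_matches(records: list[TLSARecord], cert_der: bytes, spki_der: bytes) -> bool:
    """True if any DANE-EE (usage 3) record matches the presented certificate."""
    for rr in records:
        if rr.usage != 3:
            continue  # PKIX-TA, PKIX-EE and DANE-TA need chain handling; not covered
        if rr.selector == 0:
            selected = cert_der
        elif rr.selector == 1:
            selected = spki_der
        else:
            continue
        if rr.matching_type not in (0, 1, 2):
            continue
        if digest(rr.matching_type, selected) == rr.data:
            return True
    return False
```

With a `3 1 1` record (DANE-EE, SPKI, SHA-256), the server can rotate its certificate without changing the TLSA record as long as the key stays the same, and no CA trust store is consulted.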

State exhaustion protection: optionally use TCP cookies during the handshake, regardless of whether TLS is used. This restricts the cost of TLS setup, which requires an established TCP connection, to legitimate TCP connections only. Servers can limit the number of connections doing TLS handshakes, and prioritize those. Then, aggressive timers on those prioritized connections protect against deliberate DoS via "slow transaction" attacks. Connections that have completed the handshake can be resumed, so giving them lower priority and dropping them aggressively has comparatively low impact (since session resumption is cheaper than redoing the full handshake). Experimentation is needed to establish recommendations for all the appropriate values (timeouts, number of sessions to support in each state, etc.).
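The two-tier policy described above can be sketched as follows: a capped, short-timeout pool for connections in the TLS handshake, and a separate pool for completed connections that are evicted least-recently-active first when over their cap. This is a minimal sketch assuming TCP cookie verification has already happened before `admit` is called (it is not implemented here); `ConnectionPolicy` and all numeric defaults are hypothetical placeholders, since the comment notes the real values need experimentation.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    HANDSHAKING = "handshaking"  # TCP established, TLS handshake in progress
    ESTABLISHED = "established"  # TLS up; session can be resumed if dropped


@dataclass
class Conn:
    conn_id: int
    state: State
    last_activity: float


@dataclass
class ConnectionPolicy:
    # Placeholder values; real ones would come from experimentation.
    max_handshaking: int = 100
    max_established: int = 1000
    handshake_timeout: float = 2.0  # aggressive timer against "slow transaction" attacks
    idle_timeout: float = 5.0
    conns: dict[int, Conn] = field(default_factory=dict)

    def count(self, state: State) -> int:
        return sum(1 for c in self.conns.values() if c.state is state)

    def expire(self, now: float) -> None:
        """Drop connections whose state-specific timer has run out."""
        for cid, c in list(self.conns.items()):
            limit = self.handshake_timeout if c.state is State.HANDSHAKING else self.idle_timeout
            if now - c.last_activity > limit:
                del self.conns[cid]

    def admit(self, conn_id: int, now: float) -> bool:
        """Admit a (cookie-verified) TCP connection into the TLS handshake pool."""
        self.expire(now)
        if self.count(State.HANDSHAKING) >= self.max_handshaking:
            return False
        self.conns[conn_id] = Conn(conn_id, State.HANDSHAKING, now)
        return True

    def handshake_done(self, conn_id: int, now: float) -> None:
        """Move to the lower-priority pool; evict least recently active ones over the cap."""
        conn = self.conns[conn_id]
        conn.state = State.ESTABLISHED
        conn.last_activity = now
        established = sorted(
            (c for c in self.conns.values() if c.state is State.ESTABLISHED),
            key=lambda c: c.last_activity,
        )
        for victim in established[: max(0, len(established) - self.max_established)]:
            del self.conns[victim.conn_id]
```

Evicting established connections first is the cheap option here: a dropped client can resume its TLS session, whereas a stalled handshake ties up the scarcer handshake slots.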

The rest of the ideas mostly relate to the question "when is using DoT really optional, vs strongly desirable?".

I have a few suggestions along that line, as well as a few additional observations.

First, there is a strong correlation between the first time any client queries a given name, the visibility of the corresponding R2A query (presuming no ADoT), and the subsequent client data traffic (even with HTTPS). The R2A (recursive-to-authoritative) query is a strong indicator that the name isn't cached (and may never have been cached). This correlation may not be that important per se, but because it is somewhat distinctive, its implications are worth examining when deciding where ADoT is "strongly desirable".

In contrast, very popular domain names (and qtypes) are likely to be served mostly from the cache. If cache refresh (as once proposed in the HAMMER draft) is done for popular cache entries, there would no longer be any identifiable client query triggering the R2A lookup, and in particular no timing side channel for identifying even a single DNS client.
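A minimal sketch of the refresh decision, in the spirit of HAMMER but not taken from it: refresh an entry when it has been queried often enough and little of its TTL remains. The function is meant to be called from a periodic background sweep over the cache rather than on the client query path, so the resulting R2A query is not tied to any single client query. `CacheEntry`, `should_refresh`, `min_hits`, and `refresh_fraction` are hypothetical names and thresholds.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    expires_at: float    # absolute expiry time, in seconds
    original_ttl: float  # TTL received from the authoritative server
    hits: int = 0        # client lookups served from this entry


def should_refresh(entry: CacheEntry, now: float,
                   min_hits: int = 10, refresh_fraction: float = 0.1) -> bool:
    """Refresh popular entries shortly before they expire."""
    remaining = entry.expires_at - now
    if remaining <= 0:
        return False  # already expired: the next lookup is a cold-cache miss
    return entry.hits >= min_hits and remaining <= refresh_fraction * entry.original_ttl
```

Unpopular entries are simply allowed to expire, so their next lookup is a genuine cold-cache query, which is exactly the case the following paragraphs suggest protecting with ADoT.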

What I'm suggesting is that using ADoT predominantly (or even exclusively) for "cold cache" lookups, and doing "cache refresh" lookups without ADoT, is one way to minimize TLS overhead. That overhead costs the authoritative server (mostly CPU) and penalizes the recursive (mostly latency, and maybe some CPU). Refresh without TLS would have lower latency, IMHO.

IMHO, the same "cold cache" rationale also applies to ECS (EDNS Client Subnet): for the first instance of a particular ECS parameter value, use ADoT; for refreshes, ADoT is optional.
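The cold-cache rule, including its ECS extension, reduces to a lookup on a cache key that includes the ECS prefix: a key never seen before goes over ADoT, and a refresh of a known key may go over plain port 53. A minimal sketch, assuming the recursive tracks previously seen keys in a set; `Transport`, `cache_key`, and `choose_transport` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Transport(Enum):
    ADOT = "adot"  # DNS over TLS to the authoritative server (port 853)
    DO53 = "do53"  # plain UDP/TCP port 53


def cache_key(qname: str, qtype: str, ecs_prefix: Optional[str]) -> tuple[str, str, Optional[str]]:
    """Normalize (qname, qtype, ECS prefix); a new ECS value is a new key."""
    return (qname.lower().rstrip("."), qtype.upper(), ecs_prefix)


def choose_transport(key: tuple[str, str, Optional[str]], seen: set,
                     authority_supports_adot: bool) -> Transport:
    if authority_supports_adot and key not in seen:
        return Transport.ADOT  # cold cache or first ECS value: query reveals a client
    return Transport.DO53      # refresh of a known key: ADoT optional, skipped here
```

Because the ECS prefix is part of the key, the first query for `example.com/A` from a new client subnet is treated as cold even if the same name is already cached for other subnets.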

The other question (observation?) is that ADoT is a poor substitute for DNSSEC. That is, unsigned zones might seem like an attractive reason to force DoT for transport, but doing so would basically amount to "security theater". It would also discourage, rather than encourage, use of DNSSEC validation. It would be justifiable for an authoritative operator to respond REFUSED to ADoT "cache refresh" queries for unsigned domains, at least during periods of heavy load. Or maybe some new error code or EDE (Extended DNS Error) could signal "retry over a non-TLS connection".
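A minimal sketch of the authoritative-side load-shedding rule described above. It assumes two things the protocol does not currently provide: an `is_refresh` indication from the recursive (today an authoritative server cannot tell a refresh from a cold-cache query), and a normalized `load` figure between 0.0 and 1.0. The function name and the `high_load` threshold are hypothetical; the only real protocol value used is RCODE 5 (REFUSED).

```python
from typing import Optional

RCODE_REFUSED = 5  # standard DNS RCODE for REFUSED


def adot_admission(over_tls: bool, zone_signed: bool, is_refresh: bool,
                   load: float, high_load: float = 0.8) -> Optional[int]:
    """Return RCODE_REFUSED to shed an ADoT query, or None to answer normally.

    is_refresh is hypothetical: it assumes the recursive signals that the
    query is a cache refresh. load is a hypothetical 0.0-1.0 utilization.
    """
    if over_tls and is_refresh and not zone_signed and load >= high_load:
        return RCODE_REFUSED
    return None
```

Only the narrow combination of TLS transport, cache refresh, unsigned zone, and high load is shed; cold-cache queries and signed zones are always answered.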

It remains an open (and perhaps interesting) question what the relative overhead of DNSSEC validation is compared with TLS transport. It might be interesting to compare the two across crypto algorithms (for negotiated TLS connections and for DNSSEC zones) and across various TTL values.

Validating DNSSEC-signed zones, while refreshing popular domains without ADoT, would potentially be the "sweet spot" for security and privacy: security from cryptographically signed zone data, and privacy from the aforementioned "cold cache" versus "cache refresh" distinction.

This might be where EDNS signaling directly between the recursive and the authoritative server would let the authoritative server make an informed decision to reply over UDP instead of over the ADoT connection if/when it experiences significantly increased load. The recursive could always try ADoT first, but receive signal(s) from the authoritative server to limit ADoT to "cold cache" queries only, in response to load problems.
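On the recursive side, reacting to such a load signal could look like the following minimal sketch: by default always try ADoT, and after the authoritative server signals load, restrict ADoT to cold-cache queries for a hold-down period. The signal itself, `AuthorityState`, `on_response`, `use_adot`, and the 60-second `hold_down` are all hypothetical; no such EDNS signal is currently defined.

```python
from dataclasses import dataclass


@dataclass
class AuthorityState:
    # Until this time, use ADoT only for cold-cache queries to this server.
    cold_only_until: float = 0.0


def on_response(state: AuthorityState, load_signal: bool, now: float,
                hold_down: float = 60.0) -> None:
    """Record a (hypothetical) load signal from the authoritative server."""
    if load_signal:
        state.cold_only_until = now + hold_down


def use_adot(state: AuthorityState, cold_cache: bool, now: float) -> bool:
    """Decide whether this query to the authoritative server goes over ADoT."""
    if now < state.cold_only_until:
        return cold_cache  # under load: ADoT for cold-cache queries only
    return True            # default: always try ADoT
```

Each new load signal extends the hold-down, so the recursive stays in "cold cache only" mode for as long as the authoritative server keeps reporting load, then reverts to ADoT for everything.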

jlivingood commented 4 years ago

Will add to issues for IETF-106 discussion