IDEA: MTU handling

bemasc commented 1 year ago

I think we can use "authenticated rejection" (#6) to solve the MTU problem:

Each network chooses a "minimum intended end-to-end MTU" (minMTU), which must be at least 1280.
Consider two ASes, A and B. Suppose A is about to initiate an IKEv2 handshake to B (A is client, B is server).
First, A checks its Path MTU to B. If this is not known "a priori", A MUST measure it, and MAY do so using PMTUD on the path from ACS_A to ACS_B. We call this value MTU[A->B].
If MTU[A->B] - 24 >= minMTU[A], A can offer RISAV to B in transport mode. If MTU[A->B] - 73 >= minMTU[A], it can also offer tunnel mode.
When B receives an IKEv2 CHILD_SA_INIT message, it performs the same check from its perspective, using MTU[B->A]. It accepts only SAs that are compatible with minMTU[B], and rejects the others. If none are allowable, they are all rejected.
The inter-AS MTU value is sent to all ASBRs as part of the SA, and used as the initial PMTU estimate.
If an ASBR receives an ICMP Packet Too Big response whose echoed (i.e. inner) header matches the SA, it adjusts (i.e. reduces) its local MTU estimate.
If an ASBR's local MTU falls below minMTU, it should inform the ACS. The ACS can then reduce its estimate of MTU[X->Y] and use IKEv2 to terminate this SA if that value is too small.
(OPTIONAL?) The ASBR applies TCP MSS clamping based on the current estimated inner MTU.
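The offer and accept checks in the steps above can be sketched in a few lines. This is only an illustration, assuming the IPv6 overhead figures quoted in the proposal (24 bytes for transport mode, 73 bytes for tunnel mode); the function and constant names are hypothetical and not part of any RISAV implementation.

```python
# Per-packet overhead figures taken from the proposal text (IPv6 numbers).
TRANSPORT_OVERHEAD = 24  # bytes added in transport mode
TUNNEL_OVERHEAD = 73     # bytes added in tunnel mode
MIN_ALLOWED_MIN_MTU = 1280  # the proposal requires minMTU >= 1280


def offerable_modes(path_mtu: int, min_mtu: int) -> list[str]:
    """Return the modes whose overhead still leaves at least min_mtu bytes."""
    if min_mtu < MIN_ALLOWED_MIN_MTU:
        raise ValueError("minMTU must be at least 1280")
    modes = []
    if path_mtu - TRANSPORT_OVERHEAD >= min_mtu:
        modes.append("transport")
    if path_mtu - TUNNEL_OVERHEAD >= min_mtu:
        modes.append("tunnel")
    return modes


def accepted_modes(offered: list[str], path_mtu: int, min_mtu: int) -> list[str]:
    """Responder side: keep only the offered modes that pass the local check."""
    allowed = set(offerable_modes(path_mtu, min_mtu))
    return [mode for mode in offered if mode in allowed]
```

For example, with MTU[A->B] = 1320 and minMTU[A] = 1280, only transport mode would be offered, because 1320 - 73 is below 1280.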

In transport mode, ICMP Packet Too Big responses are forwarded through the ASBR, stripping the echoed RISAV-AH header. This allows PMTUD to work as usual for the endpoints.

In tunnel mode, each ASBR maintains a PMTU estimate for each SA, which is initialized to the MTU[X->Y] value used during the handshake. Packets exceeding this size are dropped and produce a Packet Too Big response from the ASBR. If the ASBR receives a Packet Too Big response for its own IPsec packets, it reduces its local MTU estimate for this SA. (This can happen if the initial MTU estimate is wrong for this path, or the path MTU changes.) ASBRs MAY also run their own PMTUD for their SAs.
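A rough sketch of the per-SA state described for tunnel mode follows. It assumes the estimate applies to the outer (IPsec) packet, so an inner packet is compared against the estimate minus the 73-byte tunnel overhead quoted earlier; that reading, and the `TunnelSA` class itself, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TUNNEL_OVERHEAD = 73  # bytes, IPv6 figure from the proposal


@dataclass
class TunnelSA:
    """Per-SA state held by one ASBR (hypothetical structure)."""
    pmtu: int  # estimate for outer packets; initialized to MTU[X->Y]

    def inner_mtu(self) -> int:
        """Largest inner packet that still fits after encapsulation."""
        return self.pmtu - TUNNEL_OVERHEAD

    def handle_inner_packet(self, size: int) -> Tuple[str, Optional[int]]:
        """Forward the packet, or drop it and report the MTU for a PTB reply."""
        if size > self.inner_mtu():
            return ("drop", self.inner_mtu())
        return ("forward", None)

    def handle_ptb_for_outer(self, reported_mtu: int) -> None:
        """A PTB for our own IPsec packet may only lower the estimate."""
        if reported_mtu < self.pmtu:
            self.pmtu = reported_mtu
```

Ignoring PTB values above the current estimate keeps the estimate monotonically decreasing; raising it again would need separate probing, such as the optional ASBR-run PMTUD.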

This arrangement ensures that AS pairs with a consistent inter-AS MTU never reduce the end-to-end MTU below the value that is intended by either AS. If the MTU is variable or heterogeneous, this arrangement ensures that PMTUD continues to work correctly for endpoints. If the MTU on any path falls below the required minimum, RISAV will be disabled within ~1 second. The addition of MSS clamping ensures that non-PMTUD-capable TCP clients don't attempt to use large packets that will not work.
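As an illustration of the MSS clamping step, the clamp can be computed from the inner MTU by subtracting the fixed IP and TCP header sizes (40 + 20 bytes for IPv6, 20 + 20 bytes for IPv4, ignoring options and extension headers). The helper name below is hypothetical, and the proposal does not specify how the ASBR would apply the clamp to SYN segments.

```python
IPV6_HEADER = 40  # fixed IPv6 header, no extension headers
IPV4_HEADER = 20  # IPv4 header without options
TCP_HEADER = 20   # TCP header without options


def clamp_mss(advertised_mss: int, inner_mtu: int, ipv6: bool = True) -> int:
    """Return the MSS to write into a SYN: never above what inner_mtu allows."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    max_mss = inner_mtu - ip_header - TCP_HEADER
    return min(advertised_mss, max_mss)
```

With an inner MTU of 1427 bytes over IPv6, an advertised MSS of 1460 would be reduced to 1367, while an advertised MSS of 1200 would pass through unchanged.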

My biggest concern about this approach is that it enables some downgrade attacks:

A. A transit provider could simply reduce the actual MTU to 1280 in order to cause RISAV to be disabled automatically.
B. An off-path attacker could send a Packet Too Big response that contains a packet from the other AS.

Attack A is probably acceptable for now (i.e. out of scope). Attack B is more concerning, and I'm not sure how to mitigate it.

BasilGuo commented 1 year ago

Sorry, but RFC 7383 has discussed this fragmentation problem in IKEv2. As Section 2.5.2 says, PMTUD should start after IKE_SA_INIT, not CHILD_SA_INIT, which I couldn't find in the IKEv2 protocol (RFC 7296).

RFC 7383 says that in most cases only the IKE_AUTH phase needs the PMTU probe. And shall we distinguish IPv4 and IPv6 for the MTU? The 1280-byte limit is the minimum MTU for IPv6. For IPv4, the minMTU is 576 bytes, as described in RFC 7383, Section 2.5.1.

BasilGuo commented 1 year ago

I also found that there is an active draft for IKEv2 MTU detection.

Well, in the data plane, we don't modify the AH/ESP header if we replace RISAV-AH with the standard AH (#24), so we can use PMTUD directly or rely on the traditional IPsec configurations. In other words, I think RISAV is still a standard IPsec in the data plane.

In transport mode, if we still use the original AH and AH's ICV field to carry the packet tag, no more fields are added.
In tunnel mode, the ESP tunnel format is also the same as the original ESP.

And here is what Cisco recommends for the MTU configuration of GRE + IPsec under IPv4 fragmentation, which I think reflects current best practice: pmtud-ipfrag.

bemasc commented 1 year ago

> And shall we distinguish IPv4 and IPv6 for the MTU?

Yes, I was just using IPv6 numbers for simplicity.

> RFC 7383 has discussed this fragmentation problem in IKEv2

Interesting, thanks. We should definitely consider RFC 7383 Section 2.5.2 when writing about PMTU discovery. However, I think the requirements here are different, because I am proposing to use the resulting MTU estimate on the data plane. For example, RFC 7383 recommends using very approximate PMTUD, because only a small amount of data is transferred, but in this case the discovered MTU will apply to a large amount of data.

> I think RISAV is still a standard IPsec in the data plane.

Yes, this MTU logic doesn't alter the ordinary operation of the data plane (although the proposed ICMP rewriting for AH is novel).

> And here is what Cisco recommends for the MTU

Thanks, that reminds me about TCP MSS. I've edited the proposal to include MSS clamping.

bemasc commented 1 year ago

I mentioned earlier that Packet Too Big (PTB) ICMP messages are easily forged by an off-path attacker. During normal PMTUD, PTB forgery is prevented by the entropy of the original packet, which must be echoed in the response. If the PTB message doesn't match a packet that was recently sent, it is ignored. The ASBR cannot apply this defense, because it has no memory of the packet that was sent (which may not have even passed through this ASBR, if ECMP clustering or multicore implementation is in use).
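The endpoint-side defense described here, accepting a PTB only when its echoed bytes match a recently sent packet, might look roughly like the following. It is a simplified sketch: the `RecentPackets` class, its capacity, and the fixed prefix length are all assumptions, and real stacks match on connection state rather than raw byte prefixes.

```python
from collections import deque


class RecentPackets:
    """Remembers the leading bytes of recently sent packets (hypothetical helper)."""

    def __init__(self, capacity: int = 1024, prefix_len: int = 48):
        # The oldest entries are evicted automatically once capacity is reached.
        self._seen = deque(maxlen=capacity)
        self._prefix_len = prefix_len

    def record(self, packet: bytes) -> None:
        """Store the prefix of a packet as it is sent."""
        self._seen.append(packet[: self._prefix_len])

    def ptb_is_plausible(self, echoed: bytes) -> bool:
        """Accept a PTB only if its echoed prefix matches a remembered packet."""
        return echoed[: self._prefix_len] in self._seen
```

An ASBR cannot keep this kind of table cheaply, which is the gap the comment points out: the packet may have left through a different ASBR or core.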

Maybe a better solution is to say: if an ASBR receives a PTB response indicating a PMTU that is less than the current MTU estimate, it performs its own explicit PMTUD. This is not vulnerable to off-path attacks. However, it is somewhat unusual: we want the "inter-AS" PMTU, without regard to the packet handling inside the target AS. Therefore, the measurement proceeds as follows:

The sending ASBR runs a traceroute to the target IP listed in the PTB response. This traceroute starts at TTL=1 and stops once it receives a response from an IP address inside the target AS. This is presumed to be the receiving ASBR.
The sending ASBR performs PMTUD to this receiving ASBR.

This PMTU value is the one used to update the current PMTU estimate, etc.
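The stopping rule for the traceroute step can be sketched as a pure function over the hop addresses returned by the probes. How the sending ASBR learns the target AS's prefixes is not specified in this thread, so `target_prefixes` is an assumed input, and the function name is hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def first_hop_in_target_as(
    hops: Iterable[Optional[str]],
    target_prefixes: Iterable[str],
) -> Optional[str]:
    """Return the first responding hop inside the target AS, or None.

    hops is ordered by increasing TTL; None marks a hop that did not reply.
    """
    nets = [ipaddress.ip_network(prefix) for prefix in target_prefixes]
    for hop in hops:
        if hop is None:
            continue  # no reply at this TTL, keep probing
        addr = ipaddress.ip_address(hop)
        if any(addr in net for net in nets):
            return hop  # presumed to be the receiving ASBR
    return None
```

The returned address would then be the target of the sending ASBR's own PMTUD run.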

BasilGuo commented 1 year ago

When I send a traceroute packet, I may get a private IP address. I tried to traceroute google.com with an online mtr, and the result shows that the first 4 responding IPs are all private IPs. Of course, private IPs must be used inside the local network. I don't know whether this is particular to CERNET or the same in most ISPs' networks. Maybe I should also compile mtr on my local machine :joy:.

bemasc commented 1 year ago

In this case, the sending ASBR is using a public source IP address, so the entire traceroute will return public IP addresses.

bemasc / risav

IDEA: MTU handling #25