DTLS client hello is not understood if it's fragmented

leduyquang753 commented 1 year ago

Summary

If the DTLS client hello message is fragmented, Mbed TLS fails to process the message and proceed with the handshake.

System information

Mbed TLS version: 3.4.0
Operating system and version: Windows 11
Configuration: Default
Compiler and options: Visual Studio 2022 17.5.5 – MSVC 19.35.32217.1

Expected behavior

Mbed TLS is able to reassemble the client hello message.

Actual behavior

Mbed TLS treats each client hello message fragment as a separate, complete client hello message and rejects each one as a "bad client hello message".

Steps to reproduce

Run the sample dtls_server program.
Use the s_client command from OpenSSL to try to connect to the server: openssl s_client -connect 127.0.0.1:4433 -dtls1_2. This creates a fragmented client hello message.

gilles-peskine-arm commented 1 year ago

See also the previous discussion about fragmentation support in https://github.com/Mbed-TLS/mbedtls/issues/1840. I'm not sure what the current situation is exactly, but I do remember that since 2018 we've only implemented fragmentation partially.

@tom-cosgrove-arm I'd classify this as [enhancement], not [bug], since this is a feature we don't fully support. Arguably it's a [bug] that we don't seem to document this clearly, though.

tom-cosgrove-arm commented 1 year ago

@gilles-peskine-arm This is an area of the code I don't know at all, but when I looked at it there appeared to be (at least some) code handling fragmentation. A not-fully-implemented feature is, IMO, a bug :) (And #1840 was almost 5 years ago!)

gilles-peskine-arm commented 1 year ago

Well, it's not a bug in the sense that the feature is supposed to be implemented, but is not working in some scenarios. Nobody ever wrote (or more accurately merged) code to support fragmentation when receiving a handshake message (I think that's the part that's not implemented — I was on the crypto subteam when it happened and didn't follow this very closely, @mpg was on the TLS team and likely knows better).

mpg commented 1 year ago

Fragmentation is supposed to be supported with DTLS. @gilles-peskine-arm I think you might be confusing this with TLS, where we indeed have not even attempted to receive fragmented messages so far. For DTLS, receiving fragmented messages was a requirement from the start.

There is probably something special about ClientHello, I'd have to dig in the code to remember. I can have a look tomorrow.

mpg commented 1 year ago

Ok, I think there are two things here.

First, unlike other functions that parse a handshake message, ssl_parse_client_hello() does not call mbedtls_ssl_read_record() (which, contrary to what the name suggests, also takes care of handshake reassembly) but instead does its own parsing. This is for historical reasons: it used to be needed to support parsing of SSLv2 ClientHellos (which were useful to parse even though we never supported SSLv2, as some clients that supported better versions were still sending v2 ClientHellos for some time). Fortunately, that support was removed in Mbed TLS 3.0 (that was the option MBEDTLS_SSL_SRV_SUPPORT_SSLV2_CLIENT_HELLO).

With this option gone, I think we can and should use mbedtls_ssl_read_record() in ssl_parse_client_hello() - actually I had created an issue about it two years ago: https://github.com/Mbed-TLS/mbedtls/issues/4224

The second point, however, might be deeper: we don't want to make DoS attacks significantly easier. I'm not saying this is the case, merely that we need to think about it. There's something special about a DTLS ClientHello: the client hasn't demonstrated reachability yet. As a result, (a) a well-behaved server should not send back lots of data, or it can be used as a DoS amplifier (that's why RFC 6347 has a cookie mechanism), and (b) the server itself doesn't want to allocate per-client resources too early, in order to avoid being DoSed itself (that's why the cookie mechanism was made stateless).

Now if we start supporting reassembly of ClientHello, that means allocating and keeping state on the server when we receive the first fragment, until the next fragments have arrived, or some timer expires. Previously we were only keeping state once the client had demonstrated reachability. Doesn't that make the server more vulnerable to DoS attacks?

Steps to reproduce

1. Run the sample `dtls_server` program.

2. Use the `s_client` command from OpenSSL to try to connect to the server: `openssl s_client -connect 127.0.0.1:4433 -dtls1_2`. This creates a fragmented client hello message.

Thanks for sharing reproduction information. However, I'm a bit surprised: we do have handshakes with openssl s_client in our test suites (look for DTLS.*openssl cli in tests/ssl-opt.sh) and they do succeed. What version of OpenSSL are you using, on what platform? Any non-default configuration or option?

leduyquang753 commented 1 year ago

What version of OpenSSL are you using, on what platform? Any non-default configuration or option?

I'm using OpenSSL 3.1.0 on Windows with default configurations.

mpg commented 1 year ago

I installed OpenSSL 3.1.0 on my Linux machine, and running openssl s_client -connect 127.0.0.1:4433 -dtls1_2 against dtls_server works like a charm. The initial ClientHello is 247 bytes and the second one (after HelloVerifyRequest) is 279. These look small enough to be unlikely to be fragmented in any environment.

Can you make a trace of the connection using Wireshark (or similar) and share the .pcapng file? Here's mine. openssl-3.1-dtls-1.2.pcapng.zip

Alternatively, instead of dtls_server can you use ssl_server2 dtls=1 debug_level=5 and share the full output?

leduyquang753 commented 1 year ago

Alternatively, instead of dtls_server can you use ssl_server2 dtls=1 debug_level=5 and share the full output?

output.txt

mpg commented 1 year ago

Thanks! However, I'm not seeing anything here suggesting that the handshake failed due to the ClientHello being fragmented.

At a high level, what I see is that the server receives a ClientHello, processes it successfully, decides to send a HelloVerifyRequest in response, and then closes the connection as it should. What should then happen is that the client sends a second ClientHello (echoing the cookie from the HelloVerifyRequest), which will be seen as a new connection by the server, and this time the handshake should succeed. However in this log it looks like the client never sends that second ClientHello. (Unless this is not the full log.)

Looking in more detail at the ClientHello that was sent, the handshake header is 01 00 00 b4 00 00 00 00 00 00 00 b4 meaning:

handshake message type 0x01 (ClientHello)
message length 0xb4
message sequence 0x00
fragment offset 0x00
fragment length 0xb4

Since message length == fragment length (both 0xb4), this confirms the ClientHello was not fragmented.
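For anyone who wants to decode such headers from a debug log themselves, here is a minimal sketch of the check described above. It assumes the 12-byte DTLS 1.2 handshake header layout (1-byte type, 3-byte length, 2-byte message sequence, 3-byte fragment offset, 3-byte fragment length). The struct and function names are hypothetical, made up for illustration; they are not part of the Mbed TLS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTLS_HS_HEADER_LEN 12

/* Hypothetical struct for illustration only; not an Mbed TLS type. */
typedef struct {
    uint8_t  msg_type;        /* handshake message type, 1 = ClientHello */
    uint32_t length;          /* 24-bit length of the whole message */
    uint16_t message_seq;     /* handshake message sequence number */
    uint32_t fragment_offset; /* 24-bit offset of this fragment */
    uint32_t fragment_length; /* 24-bit length of this fragment */
} dtls_hs_header;

/* Read a 24-bit big-endian integer. */
static uint32_t read_u24(const uint8_t *p)
{
    return ((uint32_t) p[0] << 16) | ((uint32_t) p[1] << 8) | (uint32_t) p[2];
}

/* Parse the handshake header. Returns 0 on success, -1 if buf is too short. */
int dtls_hs_header_parse(const uint8_t *buf, size_t len, dtls_hs_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < DTLS_HS_HEADER_LEN) {
        return -1;
    }
    out->msg_type        = buf[0];
    out->length          = read_u24(buf + 1);
    out->message_seq     = (uint16_t) ((buf[4] << 8) | buf[5]);
    out->fragment_offset = read_u24(buf + 6);
    out->fragment_length = read_u24(buf + 9);
    return 0;
}

/* A message is unfragmented only if a single fragment covers all of it. */
int dtls_hs_is_fragmented(const dtls_hs_header *h)
{
    assert(h != NULL);
    return !(h->fragment_offset == 0 && h->fragment_length == h->length);
}
```

Feeding it the 12 bytes 01 00 00 b4 00 00 00 00 00 00 00 b4 gives length and fragment_length both equal to 0xb4, so dtls_hs_is_fragmented() returns 0 for this ClientHello.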

What makes you think the handshake failures you are observing have to do with the ClientHello being fragmented?

leduyquang753 commented 1 year ago

Perhaps I pressed Ctrl+C too soon (I pressed so as to not get too much output due to reconnection attempts). Here is the output with the issue: output.txt

mpg commented 1 year ago

Thank you!

This time indeed I can confirm that the second ClientHello (with the cookie) is fragmented: its header is 01 00 00 d4 00 01 00 00 00 00 00 cb so we have fragment length = 0xcb < 0xd4 = message length. So indeed what makes the handshake fail is that we can't handle fragmented ClientHellos.

I'm still quite surprised that OpenSSL feels the need to fragment this ClientHello: it's only 212 bytes, so including the handshake header (12 bytes) and record header (13 bytes), that would be a total UDP payload of 237 bytes, which looks small enough not to need fragmentation. And indeed in my test with OpenSSL 3.1.0 on Linux, 237 bytes are sent in a single UDP datagram rather than fragmented.

To be clear: I acknowledge that this is a bug, I'm just trying to understand why it starts manifesting now (since we had this limitation from the very beginning), and under what circumstances we run into it. This might affect how we prioritize it.

mpg commented 1 year ago

One thought: it might be that the first time OpenSSL sends its ClientHello-with-cookie, the packet gets lost somehow, and then it starts fragmenting when re-transmitting. This wouldn't show in the server's output, so a wireshark/tcpdump/similar trace would help clarify if that's the case or not.

leduyquang753 commented 1 year ago

Here is the Wireshark packet capture: Packets.zip

mpg commented 1 year ago

Thank you!

One thought: it might be that the first time OpenSSL sends its ClientHello-with-cookie, the packet gets lost somehow, and then it starts fragmenting when re-transmitting. This wouldn't show in the server's output, so a wireshark/tcpdump/similar trace would help clarify if that's the case or not.

So, the capture rules that out. The ClientHello with the cookie is fragmented the first time it is sent (212 = 203 + 9 bytes). (Then one second later it is retransmitted, fragmented the same way, but that's just because our server didn't respond with a ServerHello, so that's quite normal.)
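To make concrete what reassembling these two fragments would involve, here is a rough sketch of a reassembly buffer that tracks which bytes of the message have arrived. This is not how Mbed TLS does it internally; the type, the function names and the 512-byte cap are all assumptions chosen for illustration. It also shows the per-client state that a server would have to hold between fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MSG_LEN 512 /* arbitrary cap for this sketch */

/* Hypothetical reassembly state for one handshake message. */
typedef struct {
    uint8_t data[MAX_MSG_LEN]; /* reassembled message body */
    uint8_t seen[MAX_MSG_LEN]; /* seen[i] is 1 once byte i has arrived */
    size_t  total_len;         /* message length from the handshake header */
    size_t  received;          /* number of distinct bytes received so far */
} reassembly_buf;

/* Prepare to reassemble a message of total_len bytes.
 * Returns 0 on success, -1 if the message exceeds the cap. */
int reassembly_init(reassembly_buf *r, size_t total_len)
{
    assert(r != NULL);
    if (total_len > MAX_MSG_LEN) {
        return -1;
    }
    memset(r, 0, sizeof(*r));
    r->total_len = total_len;
    return 0;
}

/* Add one fragment. Returns 1 once the message is complete, 0 if bytes are
 * still missing, -1 if the fragment does not fit inside the message. */
int reassembly_add(reassembly_buf *r, size_t offset,
                   const uint8_t *frag, size_t frag_len)
{
    assert(r != NULL && (frag != NULL || frag_len == 0));
    if (offset > r->total_len || frag_len > r->total_len - offset) {
        return -1;
    }
    for (size_t i = 0; i < frag_len; i++) {
        /* Count each byte once, so retransmitted or overlapping
         * fragments do not inflate the total. */
        if (!r->seen[offset + i]) {
            r->seen[offset + i] = 1;
            r->received++;
        }
        r->data[offset + i] = frag[i];
    }
    return r->received == r->total_len;
}
```

With the values from this capture, reassembly_init(&r, 212) followed by a fragment at offset 0 of length 203 returns 0, and the second fragment at offset 203 of length 9 returns 1. A retransmission of the first fragment in between changes nothing.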

I don't have access to a Windows machine myself, but a colleague of mine (thank you @minosgalanakis !) tried reproducing the issue and couldn't: the handshake just works (no ClientHello is fragmented).

mpg commented 1 year ago

Note: the Wireshark trace also shows that the first fragment (203 bytes) results in a UDP datagram of 236 bytes and an IPv4 packet of 256 bytes. Since this is a nice round number, perhaps it's the limit your version of OpenSSL is trying to fit in, but it's unclear why that happens on your machine and not mine or my colleague's.
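The size arithmetic can be spelled out as a small sketch. It assumes a 12-byte DTLS handshake header, a 13-byte DTLS 1.2 record header, an 8-byte UDP header and a 20-byte IPv4 header without options; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum {
    DTLS_HS_HEADER_LEN       = 12, /* DTLS handshake header */
    DTLS12_RECORD_HEADER_LEN = 13, /* DTLS 1.2 record header */
    UDP_HEADER_LEN           = 8,
    IPV4_HEADER_LEN          = 20  /* assuming no IP options */
};

/* UDP payload needed to carry hs_body_len bytes of handshake body
 * in a single record. */
size_t udp_payload_len(size_t hs_body_len)
{
    /* DTLS handshake lengths are 24-bit values. */
    assert(hs_body_len <= 0xFFFFFF);
    return hs_body_len + DTLS_HS_HEADER_LEN + DTLS12_RECORD_HEADER_LEN;
}

/* Size of the UDP datagram, including the UDP header. */
size_t udp_datagram_len(size_t hs_body_len)
{
    return udp_payload_len(hs_body_len) + UDP_HEADER_LEN;
}

/* Size of the IPv4 packet, including the IPv4 header. */
size_t ipv4_packet_len(size_t hs_body_len)
{
    return udp_datagram_len(hs_body_len) + IPV4_HEADER_LEN;
}
```

For the 203-byte first fragment this gives 203 + 12 + 13 = 228 bytes of UDP payload, 236 bytes for the UDP datagram and 256 bytes for the IPv4 packet, matching the trace. For the unfragmented 212-byte ClientHello, udp_payload_len(212) gives the 237 bytes mentioned earlier.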

gilles-peskine-arm commented 1 year ago

The network settings might influence how OpenSSL breaks outgoing packets, especially with UDP. Maybe a low MTU setting, either in a configuration or from some form of automatic discovery?

wpitcairn commented 3 weeks ago

Was there any resolution for this fragmentation of the ClientHello with cookie? I am seeing the exact same fragmentation with the command: openssl s_client -dtls1_2 -connect 192.168.0.129:10001 connecting to a port of the dtls_server running in the ESP-IDF environment. The server is running fine, but mbedtls_ssl_handshake reports -0x7300, which is MBEDTLS_ERR_SSL_DECODE_ERROR. My Wireshark capture shows the same sequence.

What was the best way to cater for any potential fragmentation?
