eclipse-californium / californium

CoAP/DTLS Java Implementation
https://www.eclipse.org/californium/
Other
730 stars 367 forks source link

Malformed ClientHello message can lead to WARN logs in DTLSConnector #2271

Closed cyril2maq closed 1 month ago

cyril2maq commented 3 months ago

We encounter an issue in our environment : a (seemingly) malformed ClientHello message leads to a WARN log. Sadly, I do not have the hand on the exact message, the only clue from the trace log is that the DTLSRecord length seems low (54 bytes) with regards to other working ClientHello messages:

==[ DTLS Record ]==============================================
Content Type: Handshake (22)
Peer address: /xxxxx
Version: 254, 253
Epoch: 0
Sequence Number: 0
Length: 54 bytes
Fragment:
fragment is not decrypted yet
===============================================================

This message triggers an IllegalArgumentException in javax.crypto.Mac.update called by ClientHello.updateForCookie (l.368) called by this CookieGenerator.generateCookie (l.175) :

https://github.com/eclipse-californium/californium/blob/dbbd4bb0bbfb0e0777b0bfb70450b31feff155f0/scandium-core/src/main/java/org/eclipse/californium/scandium/CookieGenerator.java#L175

This leads to the WARN message "Processing new CLIENT_HELLO from peer [{}] failed!" in DTLSConnector class (l.2732).

I believe this IllegalArgumentException should be catched by this CookieGenerator.generateCookie method as it states it would "@throws GeneralSecurityException if the cookie cannot be computed".

This way, it would just log a DEBUG message from DTLSConnector, which seems more appropriate for this use case.

boaks commented 3 months ago

Which version are you running?

For me this looks similar to the cause of #2255, which is fixed with PR #2256 included in Californium v3.12.1.

boaks commented 3 months ago

The idea of l.2732 WARN is to detect implementation errors. Maybe this is already fixed with PR #2256. Otherwise I would try to add more logging to see, which messages are causing that.

cyril2maq commented 3 months ago

You are right, we are using 3.12.0, I did miss the #2255 issue before posting this one. I agree #2256 have high chances to solve it. I think we can close this issue and when we'll be in production with the 3.12.1 version, if the problem is still there, I will reach out. Thanks a lot.

boaks commented 2 months ago

@cyril2maq

Have you been able to test this with 3.12.1?

cyril2maq commented 2 months ago

We did deploy this version 3 weeks ago and did not get any occurence since; so as you did first analyze, it was most likely the same root cause as #2255 Thanks again.

boaks commented 2 months ago

Thanks for your feedback.