eclipse-californium / californium

CoAP/DTLS Java Implementation
https://www.eclipse.org/californium/
Other
720 stars 364 forks source link

Issue with running Contiki-NG with latest release of Leshan (2.0.0-M15) #2255

Open joakimeriksson opened 4 days ago

joakimeriksson commented 4 days ago

DTLS Client Hello issue

The Contiki-NG LwM2M with DTLS can no longer join the Leshan server. It seems to be related to DTLS implementation (Scandium). Problem occurs already at the Client hello stage.

Error description

Contiki-NG LwM2M IPSO Objects example When the client tries to join the server using DTLS there is an issue at the client hello stage:

2024-06-26 15:44:40,517 DTLSConnector        [WARN] Processing new CLIENT_HELLO from peer [[fd00:0:0:0:302:304:506:708]:5684] failed! PEER=[fd00:0:0:0:302:304:506:708]:5684 
java.lang.IllegalArgumentException: Bad arguments
    at java.base/javax.crypto.Mac.update(Mac.java:528)
    at org.eclipse.californium.scandium.dtls.ClientHello.updateForCookie(ClientHello.java:371)
    at org.eclipse.californium.scandium.CookieGenerator.generateCookie(CookieGenerator.java:175)
    at org.eclipse.californium.scandium.CookieGenerator.generateCookie(CookieGenerator.java:196)
    at org.eclipse.californium.scandium.DTLSConnector.processNewClientHello(DTLSConnector.java:2677)
    at org.eclipse.californium.scandium.DTLSConnector.access$1500(DTLSConnector.java:262)
    at org.eclipse.californium.scandium.DTLSConnector$16.run(DTLSConnector.java:2087)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1589)

It might be related to an update in the Scandium DTLS implementation that seems to have a change in the implementation of the client hello (including that specific line).

Information ahead to speed up the processing of issues - how to repeat the problem:

Likely possible to reproduce with any client using PSK and DTLS with our settings in contiki-ng's lwm2m example. Run the leshan demo server locally then do the following in contiki-ng.

>contiki-ng/examples/lwm2m-ipso-objects
make MAKE_WITH_DTLS=1 MAKE_COAP_DTLS_WITH_PSK=1 MAKE_COAP_DTLS_WITH_CLIENT=1 MAKE_COAP_DTLS_KEYSTORE=MAKE_COAP_DTLS_KEYSTORE_SIMPLE
sudo ./build/native/example-ipso-objects.native
...

This will cause the problem.

The MAC call code update seems to have been changed by this commit and I assume that is in the current Leshan version (2.0.0-M15). https://github.com/eclipse-californium/californium/commit/332b8a37c7e509801ce180b56465b26123593f5b (but I can not spot any specific issues - just that this commit changed the code around where the error happens).

jvermillard commented 4 days ago

@joakimeriksson do you have a dump of the packet exchange, especially the CLIENT_HELLO from the client? It would be easier to reproduce

joakimeriksson commented 4 days ago

I will try to fix that later today. It is trivially "repeatable" by running the Contiki example also (with Leshan). But I will capture some pcap files for you - if that is what would be most useful?

jvermillard commented 4 days ago

Yes, the main goal is to avoid having to install/build/run Contiki and try to unit-test this bug

boaks commented 4 days ago

Thanks for reporting!

Problem occurs already at the Client hello stage.

As jvermillard already wrote, captures would be very helpful.

I guess, commit

332b8a37c 2024-02-28 Reduce cookie data for better DTLS 1.3 compatibility.

introduced that issue, but without capture, it's hard too see, why that fails.

boaks commented 4 days ago

Just to mention:

Any plans / interest to update to the current develop branch of eclipse/tinydtls?

The contiki-ng fork of tinydtls seems to be not maintained. That fork is vulnerable by a couple of CVEs, which are fixed on the Eclipse origin fork.

joakimeriksson commented 4 days ago

We are, in fact, replacing the TinyDTLS with mbedTLS at the moment, and that one triggered the problem first, so we assumed it was that - but then it was exactly the same with TinyDTLS.

boaks commented 4 days ago

In general, there are some interoperability-tests for openssl, mbedtls, gnutls and tinydtls. But it seems, that these tests didn't cover your case. Therefore a capture would be great.

boaks commented 4 days ago

Maybe I found the bug. See PR #2256 . Are you able to verify the fix on your side?

(I'm currently too busy with other stuff. I'm not sure, if I find some time before the 2. week in July to really work on this. Afterward I will add some unit tests and try to reproduce the bug and verify the fix.)

joakimeriksson commented 4 days ago

I can try to move this into the leshan demo server and see if it fix the issue.

boaks commented 4 days ago

(I adapted my prios and added some unit tests. Without the fix, the new unit test fails with the above exception, with the fix it passes the test.)

joakimeriksson commented 4 days ago

Here is a capture with the final packet here being the first DTLS package.

reading from file mycap.pcap, link-type RAW (Raw IP), snapshot length 262144
15:28:37.313836 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 6) fe80::302:304:506:708 > ff02::1a: [icmp6 sum ok] ICMP6, RPL, (CLR)DODAG Information Solicitation
    0x0000:  6000 0000 0006 3a40 fe80 0000 0000 0000  `.....:@........
    0x0010:  0302 0304 0506 0708 ff02 0000 0000 0000  ................
    0x0020:  0000 0000 0000 001a 9b00 550d 0000       ..........U...
15:28:52.516646 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 6) fe80::302:304:506:708 > ff02::1a: [icmp6 sum ok] ICMP6, RPL, (CLR)DODAG Information Solicitation
    0x0000:  6000 0000 0006 3a40 fe80 0000 0000 0000  `.....:@........
    0x0010:  0302 0304 0506 0708 ff02 0000 0000 0000  ................
    0x0020:  0000 0000 0000 001a 9b00 550d 0000       ..........U...
15:28:58.525507 IP6 (hlim 64, next-header UDP (17) payload length: 77) fd00::302:304:506:708.5684 > jfserver.5684: [udp sum ok] UDP, length 69
    0x0000:  6000 0000 004d 1140 fd00 0000 0000 0000  `....M.@........
    0x0010:  0302 0304 0506 0708 fd00 0000 0000 0000  ................
    0x0020:  0000 0000 0000 0001 1634 1634 004d 3287  .........4.4.M2.
    0x0030:  16fe fd00 0000 0000 0000 0000 3801 0000  ............8...
    0x0040:  2c00 0000 0000 0000 2cfe fd35 1b66 40a5  ,.......,..5.f@.
    0x0050:  d8ca 09f4 f8ce 6ab7 217d 3d0a 3b22 9425  ......j.!}=.;".%
    0x0060:  11cd daf6 1902 5fb0 85c0 8a00 0000 04c0  ......_.........
    0x0070:  a900 ff01 00                             .....

mycap.pcap.zip

Will try to build with the update DTLS and put that into leshan to confirm if it works later also!

boaks commented 4 days ago

Thanks. The ClientHello is without any Extensions. So the 2 bytes for the compression methods length instead of 1 bytes is wrong and so the calculated tail-length exceeds the message length. PR #2256 will fix it. Waiting for your feedback and then the PR will be merged soon.

joakimeriksson commented 3 days ago

Tried building on a regular linux (ubuntu / java 19) but failed for some reason on the element connector. Do you now if there are any specific requirements?


-------------------------------------------------------------------------------
Test set: org.eclipse.californium.elements.util.NamedThreadFactoryTest
-------------------------------------------------------------------------------
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.019 s <<< FAILURE! - in org.eclipse.californium.elements.util.NamedThreadFactoryTest
testDaemonThreadGroup(org.eclipse.californium.elements.util.NamedThreadFactoryTest)  Time elapsed: 0.006 s  <<< FAILURE!
java.lang.AssertionError: group is not destroyed
    at org.eclipse.californium.elements.util.NamedThreadFactoryTest.testDaemonThreadGroup(NamedThreadFactoryTest.java:48)
´´´
(mvn clean install)
boaks commented 3 days ago

I haven't tried java 19 so far.

joakimeriksson commented 3 days ago

Do you have a docker setup? With all the correct versions of JDK, maven, etc? Or a description of the required build-environment? (I can do a retry over the weekend...) Or if you just send me the built-files.

boaks commented 3 days ago

I will merge it and trigger a SNAPSHOT build in the afternoon.

boaks commented 3 days ago

3.13.0-SNAPSHOT including PR #2256 is available.

The failing unit test indicates, that a "daemon thread group" isn't destroyed with the last thread. Not sure, maybe intended by java 19, maybe a timing question. The test was added some years ago to test and prevent from resource/heap leaks. If you run it with java 19 and you detect such a leak, it would be very welcome, if you could report that.

My plan is still to switch to Californium 4 with java 8 as minimum requirement starting in summer. If java 19 comes with some extra requirements, I would prefer to move the support then to Californium version 4.