eclipse-californium / californium

CoAP/DTLS Java Implementation
https://www.eclipse.org/californium/

generated mid [3303] already in use, cannot register Exchange[R14] #1964

Closed rahulagarwal02 closed 2 years ago

rahulagarwal02 commented 2 years ago

I am using the Californium CoAP library as a server. The client is a Qt CoAP client. The data rate is around 10 messages/sec. The client receives messages for almost 1.5 hours, after which messages stop coming. The logs show the exception below:

2022-03-21T11:02:28,516 WARN [CoapServer(main)#1] {BaseCoapStack.java:115} - error send response CON-2.05 MID= 3303, Token=10F6A0C91DAF, OptionSet={"Observe":18992, "Content-Format":"text/plain"}, "{"fhr1":-32768,"fhr2":-3".. 116 bytes
java.lang.IllegalArgumentException: generated mid [3303] already in use, cannot register Exchange[R14]
    at org.eclipse.californium.core.network.InMemoryMessageExchangeStore.registerWithMessageId(InMemoryMessageExchangeStore.java:245) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.InMemoryMessageExchangeStore.registerOutboundResponse(InMemoryMessageExchangeStore.java:409) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.UdpMatcher.sendResponse(UdpMatcher.java:157) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.CoapEndpoint$OutboxImpl.sendResponse(CoapEndpoint.java:1088) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.BaseCoapStack$StackBottomAdapter.sendResponse(BaseCoapStack.java:236) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.ReliabilityLayer.sendResponse(ReliabilityLayer.java:153) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.BlockwiseLayer.sendResponse(BlockwiseLayer.java:624) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.ObserveLayer.sendResponse(ObserveLayer.java:123) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.AbstractLayer.sendResponse(AbstractLayer.java:74) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.ExchangeCleanupLayer.sendResponse(ExchangeCleanupLayer.java:83) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.BaseCoapStack$StackTopAdapter.sendResponse(BaseCoapStack.java:194) ~[californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.stack.BaseCoapStack.sendResponse(BaseCoapStack.java:110) [californium-core-2.6.3.jar:?]
    at org.eclipse.californium.core.network.CoapEndpoint$9.run(CoapEndpoint.java:912) [californium-core-2.6.3.jar:?]
    at org.eclipse.californium.elements.util.SerialExecutor$1.run(SerialExecutor.java:289) [element-connector-2.6.3.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_242-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_242-internal]
    at java.lang.Thread.run(Unknown Source) [?:1.8.0_242-internal]

What could be the issue?

boaks commented 2 years ago

Just in case you're still working on this: I merged PR #1986. If you update to the current master, you may need to adapt the "Californium???3.properties" and set the new

COAP.STRICT_EMPTY_MESSAGE_FORMAT=false

(or just add that line to the already created "Californium???3.properties".)

In case CON works and you are no longer working on this, please close the issue.

I tried to reproduce it, but so far I haven't been successful. You also haven't yet confirmed that this occurs with the Plugtest-Server. So I'm not sure whether it is an issue in your implementation of that resource or really a Californium issue.

rahulagarwal02 commented 2 years ago

We tried with the CON request, but the issue is still not resolved. We followed your instructions for the plugtest server. We are now able to run the plugtest server and receive data on the client side. The test is in progress; we will let you know the result.

boaks commented 2 years ago

It's really hard to say what is going wrong. Looking forward to the plugtest-server result.

rahulagarwal02 commented 2 years ago

This issue occurs only when there are multiple clients and the clients register observers on multiple endpoints. With a single client, or with 3 or fewer endpoints, we have not faced this issue. So we need to look into what could go wrong when there are more clients and endpoints.

boaks commented 2 years ago

That points more in the direction of your implementation.

Let me explain: MIDs are scoped to a single endpoint, so whether you have one client or several should not matter. But if you start to reuse a Message (here, I guess, a Response), you break Californium's API.

Really looking forward to your results. Just to mention: PR #1987 now prevents an application from resending a Message.
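
To picture the error from the log, here is a minimal toy model of a store that refuses to register a MID while an earlier exchange still owns it. This is not Californium code: the class `MidStoreSketch` and its methods are hypothetical, and the real `InMemoryMessageExchangeStore` is more involved. It only illustrates why a MID that is still registered cannot be registered a second time.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a message exchange store (hypothetical, NOT Californium's implementation).
 */
final class MidStoreSketch {
    // MID -> description of the exchange that currently owns it
    private final Map<Integer, String> exchangesByMid = new HashMap<>();

    /** Registers an exchange under a MID and rejects a MID that is still in use. */
    void register(int mid, String exchange) {
        String previous = exchangesByMid.putIfAbsent(mid, exchange);
        if (previous != null) {
            throw new IllegalArgumentException(
                    "generated mid [" + mid + "] already in use, cannot register " + exchange);
        }
    }

    /** Frees a MID once its exchange has completed. */
    void complete(int mid) {
        exchangesByMid.remove(mid);
    }

    public static void main(String[] args) {
        MidStoreSketch store = new MidStoreSketch();
        store.register(3303, "Exchange[R13]");
        try {
            // Same MID while the first exchange is still pending: rejected.
            store.register(3303, "Exchange[R14]");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        store.complete(3303);
        store.register(3303, "Exchange[R14]"); // accepted once the MID has been freed
    }
}
```

In this model, anything that keeps an old MID registered (or presents it again before it is freed) would trigger the exception, which is one reason to create a fresh Response for every notification.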

boaks commented 2 years ago

I released 3.5.0 today. If you now want to test with the plugtest-server, it's easier, and the server itself works "out-of-the-box" for your test (OK, "close to out-of-the-box", because of the "non-compliant empty-messages"). You only need to adapt your client to use "coap:///obs".

Download the server via the link above. It now supports the CLI option "--notify-interval".

java -jar cf-plugtest-server-3.5.0.jar --notify-interval=100[ms]

Stop it with CTRL-C, then edit "CaliforniumPlugtest3.properties" and change

# Process empty messages strictly according RFC7252, 4.1 as format error.
# Disable to ignore additional data as tokens or options.
# Default: true
COAP.STRICT_EMPTY_MESSAGE_FORMAT=true

to false.

Looking forward to your results with the plugtest-server.

rahulagarwal02 commented 2 years ago

OK, we will try this. We tried with the plugtest-server, but the issue did not occur with one endpoint. We are trying to increase the number of endpoints in the plugtest-server to see whether it can be reproduced.

rahulagarwal02 commented 2 years ago

Any reason to start with a NON observe request? That is also answered with a NON response, which has the same MID as the last CON notify. At least that is a first idea of what may cause the trouble in Californium. Maybe, until that gets analyzed (and fixed, if wrong), you can try to use a CON request to start the observation?

(Additionally, the empty ACKs include tokens, which doesn't make sense. Maybe worth reporting that there.)

I have some good news. Using a CON request to start the observation fixed the issue we were observing. The last time we tried, it looks like there was some problem with the setup. We tried a couple more times and it's working. Thanks a lot for your support.

boaks commented 2 years ago

Thanks for all your tests.

If you want to close this issue, please do so. If you want to continue searching for the root cause, please try to reproduce it with the plugtest-server, a NON request, and CON notifies.

rahulagarwal02 commented 2 years ago

I will close this issue for now. We will let you know if we can reproduce the issue with the plugtest-server.

boaks commented 2 years ago

Thanks! I appreciate your patience.

Just one remark:

The Qt client uses malformed ACKs. (Or do you use an old Californium client? Years ago, Californium also sent such malformed empty ACKs.)

RFC 7252 - 4.1. Messages and Endpoints

An Empty message has the Code field set to 0.00. The Token Length field MUST be set to 0 and bytes of data MUST NOT be present after the Message ID field. If there are any bytes, they MUST be processed as a message format error.

But the Qt client adds a token to an empty ACK.

[Image: wireshark-ACK]

The problem is that if Californium relaxes the processing and ignores the "MUST be processed as a message format error", someone may consider it non-compliant. Therefore I fixed that in the last few days and added a configuration option (STRICT_EMPTY_MESSAGE_FORMAT) for relaxed backwards compatibility.

Some weeks ago, when I tried to contact Qt about offering a binary download for their client, the only reply I got was a question about my customer number. If you have such a customer number, maybe you could contact them and report that compliance issue.
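
The difference between strict and relaxed processing can be sketched as follows. This is a minimal illustration based only on the RFC 7252 header layout (first byte: version, type, and a 4-bit Token Length; second byte: Code; then the 2-byte Message ID), not Californium's actual parser. The class `EmptyMessageCheck`, its methods, and the sample bytes are hypothetical.

```java
/** Hypothetical sketch of strict vs. relaxed handling of CoAP Empty messages. */
final class EmptyMessageCheck {

    /** True if the datagram has a full 4-byte CoAP header and Code 0.00. */
    static boolean hasEmptyCode(byte[] datagram) {
        return datagram.length >= 4 && (datagram[1] & 0xFF) == 0;
    }

    /** True if an Empty message obeys RFC 7252, 4.1: TKL is 0 and nothing follows the Message ID. */
    static boolean isStrictlyEmpty(byte[] datagram) {
        if (!hasEmptyCode(datagram)) {
            return false;
        }
        int tokenLength = datagram[0] & 0x0F; // low nibble of the first byte
        return tokenLength == 0 && datagram.length == 4;
    }

    /** Strict mode rejects extra bytes as a format error; relaxed mode ignores them. */
    static boolean accept(byte[] datagram, boolean strict) {
        return strict ? isStrictlyEmpty(datagram) : hasEmptyCode(datagram);
    }

    public static void main(String[] args) {
        // Sample bytes for illustration only: ACK (type 2), Code 0.00, MID 0x0CE7.
        byte[] compliantAck = {0x60, 0x00, 0x0C, (byte) 0xE7};
        // Same ACK, but with TKL = 6 and a 6-byte token appended.
        byte[] ackWithToken = {0x66, 0x00, 0x0C, (byte) 0xE7,
                0x10, (byte) 0xF6, (byte) 0xA0, (byte) 0xC9, 0x1D, (byte) 0xAF};

        System.out.println(accept(compliantAck, true));
        System.out.println(accept(ackWithToken, true));
        System.out.println(accept(ackWithToken, false));
    }
}
```

With strict checking, the ACK that carries a token is rejected; with the relaxed setting, the additional bytes are simply ignored.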

ack.pcap.gz

rahulagarwal02 commented 2 years ago

Sure. I will report this issue to Qt. Thanks

boaks commented 2 years ago

I spent some time on a redesign of the server-side observe handling. I still didn't find anything that would have caused your issue, unless your CoapResource does something different.

See PR #2014 for the redesign. I am considering merging it next week. Currently I have no schedule for 3.6, maybe end of May or end of June. So, depending on your interest and time, your feedback is welcome, especially if it breaks something.

One hint: if you want to retest your issue, you may set

# Base MID for multicast requests.
# Default: 65000
COAP.MULTICAST_BASE_MID=1000

That would cause the server to reuse the non-multicast MIDs after only 1000 messages and should therefore speed up your test.
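
As a rough back-of-the-envelope check of why lowering COAP.MULTICAST_BASE_MID shortens the test, the sketch below estimates how long it takes to use every MID below the base once. It assumes, purely for illustration, that each message consumes one MID and that all MIDs from 0 to MULTICAST_BASE_MID - 1 are handed out before any is reused; Californium's real MID assignment may differ. The rate of 10 messages/sec is the one reported at the top of this issue, and the class and method names are hypothetical.

```java
/** Hypothetical estimate of the time until non-multicast MIDs are reused. */
final class MidWrapEstimate {

    /** Seconds needed to consume every MID in [0, multicastBaseMid) once at a constant rate. */
    static double secondsUntilReuse(int multicastBaseMid, double messagesPerSecond) {
        if (multicastBaseMid <= 0 || messagesPerSecond <= 0) {
            throw new IllegalArgumentException("base MID and rate must be positive");
        }
        return multicastBaseMid / messagesPerSecond;
    }

    public static void main(String[] args) {
        double rate = 10.0; // messages per second, as reported in this issue
        System.out.println("default (65000): " + secondsUntilReuse(65000, rate) + " s");
        System.out.println("reduced (1000):  " + secondsUntilReuse(1000, rate) + " s");
    }
}
```

Under these assumptions, the default of 65000 gives 6500 seconds (about 1.8 hours) at 10 messages/sec, while 1000 gives 100 seconds.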

boaks commented 2 years ago

Any news on this topic?

I scheduled release 3.6.0 for this week, Thursday. So, if there is something left from this issue, let us know.

rahulagarwal02 commented 2 years ago

Thanks for reaching out. We used a CON request to start the observation, which fixed the issue. After this change we did not face any issues on our device. We are sticking with this fix for now and haven't tried anything else.