calimero-project / calimero-core

Core library for KNX network access and management
Other
130 stars 64 forks source link

Throws `KNXTimeoutException` due to missing confirmation although the confirmation was received #120

Open dallmair opened 1 year ago

dallmair commented 1 year ago

We're not entirely sure yet and still digging through the code (both calimero and ours), but it seems there is a synchronization issue in calimero-core 2.5.1. It happens only occasionally and definitely depends on timing.

Please see attached log file for a DEBUG log in HTML format that includes all messages that led up to the problem in an instance last week. We use ManagementClientImpl over a Tunneling connection via a certified IP interface to a DIY device. After a few successful message exchanges, the log contains:

  1. 00:17:20.449 calimero sends T_ACK message to the device ✔️
  2. 00:17:20.453 calimero sends T_DATA_CONNECTED message to the device ✔️
  3. 00:17:20.470 + 00:17:20.471 calimero receives confirmation of the T_ACK message ✔️
  4. 00:17:20.514 calimero receives confirmation of the T_DATA_CONNECTED message ✔️
  5. 00:17:20.526 calimero receives T_ACK in response of message sent in step 2 ✔️
  6. 00:17:20.549 calimero receives T_DATA_CONNECTED (some response message from device) ✔️
  7. 00:17:23.549 device repeats the message from step 6 as it did not get an answer within 3 seconds ⚠️ Should not have happened, but just a consequence of the underlying issue, see next
  8. 00:17:24.467 calimero throws a KNXTimeoutException saying "no confirmation reply received for" the message sent in step 2 ❌ This is strange as it logged reception of the confirmation right there in step 4.

To sum it up, calimero writes a log entry about having successfully received the confirmation, but then throws an exception because it thinks it did not receive the confirmation.

Or are we reading the log file incorrectly, missing something else, or just don't get it...? It would be great if you could take a look and help us pinpoint the root cause. Many thanks in advance!

updater-2023-04-21_00-17-15.501-short.zip

dallmair commented 1 year ago

The only thing that looks a bit suspicious to me is the implementation of ClientConnection::doExtraBlockingModes: waitForStateChange already returns whether the state change happened or not, but doExtraBlockingModes ignores that result and compares the internalState field again. That seems odd to me, but I have no idea whether this is actually in the relevant code path for the issue reported here.

bmalinowsky commented 1 year ago

Thank you for the log file. I'll try to find out what's wrong.

I don't know how easy it is for you to run with changed source code, but setting the log output to TRACE, and adding some log output, e.g., array mismatch, would help. I think the .con is currently not accepted.