goToMain / libosdp

Implementation of IEC 60839-11-5 OSDP (Open Supervised Device Protocol); provides a C library with support for C++, Rust and Python3
https://libosdp.sidcha.dev
Apache License 2.0
130 stars 69 forks source link

Secure Channel gets disconnected at seemingly random intervals #106

Closed tim661811 closed 1 year ago

tim661811 commented 1 year ago

Describe the bug We use the libosdp library to communicate to Peripheral Devices, so we act as the Control Panel. We use different kinds of OSDP readers (Idesco and STiD readers). The issue we're having is that the secure channel gets disconnected at seemingly random intervals (could be twice a minute, but sometimes it's also once every 15 mins). After the Secure Channel is lost, it gets rebuild within 1 second.

Expected behavior Secure connection should not be lost

Observed behavior In case of the Idesco reader, all of a sudden libosdp sends an unsecure POLL event which the PD rejects with a NAK image

The STiD behavior is slightly different, but the problem seems to occur in the same place. A secure POLL event is sent to the PD by libosdp and then rejected by the PD because the CRC is wrong image

A Comprehensive Log file OSDP sniffer output for Idesco situation: image

OSDP sniffer output for STiD situation: image CRC should've been image

sidcha commented 1 year ago

@tim661811, Thanks for creating this comprehensive bug report. Can you also provide the logs provided by libOSDP with log level set to DEBUG and packet trace enabled? I cannot make out a whole lot of details form the screenshot you've shared.

The CP would send a POLL in plain text only if has already lost secure channel with the PD. So we need to backtrack and look at why it is loosing the secure channel.

tim661811 commented 1 year ago

Hereby the requested logs, I let both readers run for a few minutes and piped the results to a file.

idesco_reader.log stid_reader.log

sidcha commented 1 year ago

Some log messages does not make sense with the control flow in code (order). Do you have some downstream modifications to the CP state machine? If you do, can you revert them and run on master with those readers and tell me if you can still reproduce this issue? (and of course if you can reproduce it on master, create log files as you have done now).

tim661811 commented 1 year ago

We did indeed alter the order of the state machine because STiD readers had an issue where the capability request could only be performed in Secure Channel mode. I reverted this change and ran libosdp again with the Idesco reader.

Resulting log: idesco_reader.log

PS: we already had the issue "SC disconnect" issue before we made the change for STiD readers.

sidcha commented 1 year ago

The important part is this:

osdp: CP: WARN : RS485-1: Response timeout for CMD(60)
osdp: CP: WARN : RS485-1: PD replied with NAK(6) for CMD(60)

I should add timestamps to these logs soon, but it appears the CP marked that PD as offline (because the PD didn't respond within the timeout) and came back after sleeping for a while, between those two log lines. When coming back from sleep, the CP starts setting up a new SC session (and rightly so) but it appears the PD hasn't decarded the previous SC session yet so it responds with NAK(6).

This is a bug in the PD firmware (please report it to the vendor) as it should discard secure channel after a period of inactivity (of course, it cannot keep the session endlessly active). To workaround this problem for the time being, you can increase the response timeout to some higher value and debug why the response from the PD is taking that long (ie., find where time is spent most).

sidcha commented 1 year ago

STiD readers had an issue where the capability request could only be performed in Secure Channel mode

This is also wrong and a catch-22. The capability report is what the CP uses to know that a PD supports secure channel; so we cannot make that request in a secure channel. Please report this one too :)

tim661811 commented 1 year ago

Okay, interesting that two different manufacturers have the same issue. I'm gonna play around with the response timeout and see if that resolves the issue for now, and then get back to the manufacturers.

The capability request was indeed flagged as an issue and has since been resolved in their newer firmware version, but at the time we couldn't wait for the fix, so we had to change the order of the libosdp state machine for now.

Thanks for the help!

MarCovy commented 1 year ago

Hello, @tim661811, what is your program/application for analyzing the OSDP protocol? The screenshots from your first post look great! Thanks for the help!

tim661811 commented 1 year ago

Hello, @tim661811, what is your program/application for analyzing the OSDP protocol? The screenshots from your first post look great! Thanks for the help!

I used a tool called OSDPPro which is a tool made by OSDP reader manufacturer STiD. Unfortunately, I don't think this is a publicly available tool.

MarCovy commented 1 year ago

@tim661811 yes, that's right, this is not public tool :)