goToMain / libosdp

Implementation of IEC 60839-11-5 OSDP (Open Supervised Device Protocol); provides a C library with support for C++, Rust and Python3
https://libosdp.sidcha.dev
Apache License 2.0

OSDP_PD_SC_TIMEOUT_MS Value? #68

Closed dman66 closed 3 years ago

dman66 commented 3 years ago

Describe the bug: Programmed 2 OSDP readers on OSDP Port 1.

Readers assigned addresses 0 and 2. Reader address 0 is not connected. Reader address 2 is PD based on libosdp.

The CP sends CMD_POLL messages to address 2 every 50-70ms, and then sends a CMD_ID to address 0. Whenever the CMD_ID is sent, the gap between CMD_POLL messages to address 2 exceeds OSDP_PD_SC_TIMEOUT_MS (400ms).

Expected behavior: The CMD_POLL restarts at approximately 480ms, so the connection should not drop.

Observed behavior: Since the delay (480ms) is more than OSDP_PD_SC_TIMEOUT_MS, we call sc_deactivate().

https://pastebin.com/mCn2gNe7
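For readers following along, the check being described can be sketched roughly as below. This is a minimal illustration, not LibOSDP's actual code: the struct layout and the `pd_on_command` helper are hypothetical, and only the names `sc_tstamp`, `sc_deactivate` and `OSDP_PD_SC_TIMEOUT_MS` come from this issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OSDP_PD_SC_TIMEOUT_MS 400 /* value discussed in this issue */

/* Hypothetical, simplified PD state; not LibOSDP's real struct. */
struct pd_state {
    int64_t sc_tstamp; /* ms timestamp of the last command seen on the secure channel */
    bool sc_active;    /* true while the secure channel is up */
};

void sc_deactivate(struct pd_state *pd)
{
    pd->sc_active = false;
}

/*
 * Hypothetical hook run when a command addressed to this PD arrives.
 * Drops the secure channel if the previous command is too old,
 * otherwise refreshes the timestamp. Returns the resulting SC state.
 */
bool pd_on_command(struct pd_state *pd, int64_t now_ms)
{
    assert(pd != NULL);
    if (pd->sc_active && (now_ms - pd->sc_tstamp) > OSDP_PD_SC_TIMEOUT_MS) {
        sc_deactivate(pd);
    }
    if (pd->sc_active) {
        pd->sc_tstamp = now_ms;
    }
    return pd->sc_active;
}
```

With polls arriving every 50-70ms the channel stays up; a single 480ms gap is enough to trip the check.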

A Comprehensive Log file

Added some timestamps when we update sc_tstamp to gather data. Extended OSDP_PD_SC_TIMEOUT_MS to 1000ms for the test:

https://pastebin.com/DL4qxgtU

Where does the value of OSDP_PD_SC_TIMEOUT_MS come from? I have OSDP Spec v2.1.6 and I don't see this number.

This is probably a CP issue in that the CMD_POLL messages shouldn't stop, but can we extend the OSDP_PD_SC_TIMEOUT_MS value and not violate the spec?

Tyco iSTAR Edge G2 with firmware 6.8.5.22814

sidcha commented 3 years ago

Hi @dman66, the value of OSDP_PD_SC_TIMEOUT_MS is not defined by OSDP but is a LibOSDP set limit. It was derived as 2 times the value of OSDP_RESP_TOUT_MS which is defined by OSDP to be 200ms (section 5.7 Timing). The spec also mentions that the PD should respond within 3ms typically and if it cannot, it should send REPLY_BUSY.

It looks like you are daisy chaining both PDs, so let us examine how a LibOSDP CP and PD would behave in your setup. The CP polls the PDs once every 50ms (4 times in a 200ms window); since PD[0] is offline, it would lock that channel for 200ms before timing out. After this timeout the CP moves on to PD[2], whose SC has gone a little over 200ms without a command by then. That is well within the 400ms limit for receiving a poll command, so everything works as expected.

In your case it looks like your CP is polling at roughly the same frequency, which means the 400ms SC timeout is still a very good limit for the second PD to see a POLL command. The only scenario that can cause the problem you are describing is the CP waiting 400+ms for the first PD to respond (since it is 480ms by the time it lands on PD[2]), which is an OSDP spec violation. Can you confirm this theory?
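The arithmetic behind this reasoning can be written out as a small sketch. The model is an assumption made for illustration: the gap seen by the online PD is taken to be the time the CP spends waiting on the offline PD plus one normal poll interval. The helper names `poll_gap_ms` and `sc_survives` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define OSDP_RESP_TOUT_MS     200                     /* OSDP reply timeout quoted above */
#define OSDP_PD_SC_TIMEOUT_MS (2 * OSDP_RESP_TOUT_MS) /* LibOSDP limit at the time: 400ms */

/*
 * Assumed model: gap between two polls to the online PD equals the time
 * the CP waits on the offline PD plus its normal poll interval.
 */
int poll_gap_ms(int cp_wait_on_offline_ms, int poll_interval_ms)
{
    assert(cp_wait_on_offline_ms >= 0 && poll_interval_ms >= 0);
    return cp_wait_on_offline_ms + poll_interval_ms;
}

/* The secure channel survives as long as the gap does not exceed the limit. */
bool sc_survives(int gap_ms)
{
    return gap_ms <= OSDP_PD_SC_TIMEOUT_MS;
}
```

Under this model a spec-compliant CP (200ms wait, 50ms poll interval) produces a 250ms gap, which survives, while the reported 480ms gap does not.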

The reasons I’m reluctant to relax the time check here are:

If it is hard to communicate with Tyco support and get this sorted out, I suggest you carry a downstream patch that changes the value in src/osdp_config.h.in to whatever works for this CP. Another option is to not daisy chain the PDs and instead give each PD its own RS485 line, so an offline PD cannot hog the bus.
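A downstream patch of this kind could be as small as the fragment below. The exact form of the existing line in src/osdp_config.h.in is an assumption here, and 1000 is simply the value used for the test log earlier in this thread; pick whatever your CP needs.

```c
/* Downstream-only change in src/osdp_config.h.in (surrounding lines not shown). */
#define OSDP_PD_SC_TIMEOUT_MS (1000) /* was 400; 1000 is the value used in the test log above */
```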

dman66 commented 3 years ago

Yes, we've been reporting these issues to Tyco as they come up... but yes, so far no responses.

Question:

> since PD[0] is offline, it would lock that channel for 200ms before timing out

What if you had 2 offline devices? It seems like the expectation that the CP could stop polling the other devices for 200ms would break the OSDP_PD_SC_TIMEOUT_MS timeout?

Moving each PD to its own RS485 bus is what we are presently doing, but there are only so many ports available.

sidcha commented 3 years ago

> Question:
>
> since PD[0] is offline, it would lock that channel for 200ms before timing out
>
> What if you had 2 offline devices? It seems like the expectation that the CP could stop polling the other devices for 200ms would break the OSDP_PD_SC_TIMEOUT_MS timeout?

That is correct. We really have no option but to wait until the timeout has passed, as the PD is still allowed to respond within that window. And increasing OSDP_PD_SC_TIMEOUT_MS to a higher value would result in the same failure once a higher number of PDs are offline.
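To make the scaling concrete, here is a rough sketch of how many offline PDs a given SC timeout can tolerate. It assumes the CP waits the full 200ms on each offline PD and then polls the online PD one poll interval later; both helper names are hypothetical.

```c
#include <assert.h>

#define OSDP_RESP_TOUT_MS 200 /* per-PD reply timeout quoted earlier in the thread */

/* Assumed worst-case gap seen by an online PD sharing the bus with offline PDs. */
int worst_case_gap_ms(int offline_pds, int poll_interval_ms)
{
    assert(offline_pds >= 0 && poll_interval_ms >= 0);
    return offline_pds * OSDP_RESP_TOUT_MS + poll_interval_ms;
}

/* Largest number of offline PDs for which the gap stays within sc_timeout_ms. */
int max_offline_pds(int sc_timeout_ms, int poll_interval_ms)
{
    assert(poll_interval_ms >= 0 && sc_timeout_ms >= poll_interval_ms);
    return (sc_timeout_ms - poll_interval_ms) / OSDP_RESP_TOUT_MS;
}
```

With a 50ms poll interval, a 400ms timeout tolerates one offline PD (two give a 450ms gap), and doubling the timeout to 800ms only moves the limit to three.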

What we can do is expose a new CP method which the app can use to pass a bitmask of enabled PDs that LibOSDP will refresh. This way if the app knows that a certain subset of PDs will be offline for maintenance, it can retain the remaining PDs in operational condition.
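A rough sketch of what such a bitmask could look like on the CP side follows. Nothing here is an existing LibOSDP API: `struct cp_ctx`, `cp_set_enabled_pds`, `cp_pd_is_enabled` and `cp_next_pd` are all hypothetical names used only to illustrate the idea of skipping PDs the app has marked as disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CP_MAX_PDS 32 /* one bit per PD in a 32-bit mask (illustrative limit) */

/* Hypothetical CP context; bit i of enabled_mask set means PD[i] is refreshed. */
struct cp_ctx {
    int num_pd;
    uint32_t enabled_mask;
};

/* Hypothetical app-facing setter: choose which PDs the CP should service. */
void cp_set_enabled_pds(struct cp_ctx *cp, uint32_t mask)
{
    assert(cp != NULL && cp->num_pd > 0 && cp->num_pd <= CP_MAX_PDS);
    cp->enabled_mask = mask;
}

bool cp_pd_is_enabled(const struct cp_ctx *cp, int idx)
{
    return idx >= 0 && idx < cp->num_pd && ((cp->enabled_mask >> idx) & 1u);
}

/* Round-robin: next enabled PD after `current`, or -1 if none is enabled. */
int cp_next_pd(const struct cp_ctx *cp, int current)
{
    for (int step = 1; step <= cp->num_pd; step++) {
        int idx = (current + step) % cp->num_pd;
        if (cp_pd_is_enabled(cp, idx)) {
            return idx;
        }
    }
    return -1;
}
```

Because disabled PDs are never addressed, the CP never waits out their reply timeout, so the remaining PDs keep seeing polls at the normal rate.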

If you have any other ideas, I'm open for suggestions.

dman66 commented 3 years ago

Since we are currently only implementing the PD side, we made a local change to extend the timeout.

The CP support should probably have a provision to handle known offline PDs without waiting out their timeouts, depending on the number of PDs.

sidcha commented 3 years ago

@dman66, I doubled the SC timeout after giving some thought to this.