PD assumes incoming data is equal to packet size

goToMain / libosdp

Implementation of IEC 60839-11-5 OSDP (Open Supervised Device Protocol); provides a C library with support for C++, Rust and Python3

https://libosdp.sidcha.dev

Apache License 2.0

134 stars 71 forks source link

PD assumes incoming data is equal to packet size #35

Closed te-johan closed 3 years ago

te-johan commented 3 years ago

When running multiple PDs a PD have to be able to receive multiple packets without breaking. Today all incoming data is put into pd->rx_buf and osdp_phy_decode_packet assumes that only one command is put into this buffer.

When running multiple PDs a PD might receive multiple commands at a high rate. For example if there is PD A and PD B, PD A receives a command and replies to it instantly. if PD B, did not try and decode the first command before if got the reply it will get stuck waiting for more data(due to "if (pkt_len != len)" in osdp_phy.c). (With multi drop each PD will receive the replies from other PDs, which will be discarded.)

The phy should only wait for more data if it needs it (pkt_len > len) and equally important is that it should not discard any additional bytes. easiest implementation would be to separate rx_buf from the data delivery. so at the end of osdp_phy_decode_packet() the data is copied into a separate data buffer and any additional bytes in rx_buffer should be moved to the front of the buffer.

te-johan commented 3 years ago

With this commit multidrop seems more stable. https://github.com/te-johan/libosdp/commit/0d3f148db5ed32cedaa059a2e1514c8d7e95409c I only did a quick test of the PD part(the CP does probably not work). The code can be a bit confusing where rx_buf_len vs data_buf_len is zeroed at different places.

The timeout had to be changed so that it starts when the first byte is received after a command has been "handled". Also changed to the security timeout is used for PD.

sidcha commented 3 years ago

Thanks for reporting this, its a more pressing issue on the CP side. I looked at your tree, and yes, managing another buffer is a bit confusing.

I'm thinking we should have a method osdp_phy_fetch_command() that would look at the buffer and copy exactly one command (based on length in header) into rx_buf and fail on partial packets instead of copying all bytes at once. This way rest of the code can be as it is.

te-johan commented 3 years ago

That sounds like a good idea, I was thinking of something similar but was reluctant to add a ring buffer between RX and receive. But it would be the cleaner implementation. Is this an issue on the CP as well? Since the CP synchronizes with itself between each command(since it is the only master on the bus) I imagined it wasn’t an issue.

sidcha commented 3 years ago

As of now its not a problem. OSDP has the concept of broadcast messages (addressed to 0x7F). LibOSDP does not support this as it's not a very tried/tested feature (I've seen some commercial PDs that don't respond to this address). In the case of multi-drop connections, it may make sense to send one POLL multicast command instead of unicasting it individually.

te-johan commented 3 years ago

Broadcast in OSDP is quite useless. Since multiple PDs cannot reply at the same time there is a limitation to the standard.

Since each PD will respond to 0x7F, the use of the configuration address should be limited to controlled (single PD) configurations.

Not responding to 0x7f is not unreasonable. I assume broadcast is only there to be able to assign new addresses to units before they are installed.

sidcha commented 3 years ago

Broadcast in OSDP is quite useless. Since multiple PDs cannot reply at the same time there is a limitation to the standard.

Since each PD will respond to 0x7F, the use of the configuration address should be limited to controlled (single PD) configurations.

Not responding to 0x7f is not unreasonable. I assume broadcast is only there to be able to assign new addresses to units before they are installed.

Ahh, I see. Come think of it, multicast cannot work when secure channel is active too. It's a shame, multicast POLL would have been a real performance boost.

te-johan commented 3 years ago

Adding a function called osdp_phy_fetch_command and not modifying the rx_buf handling will probably break the timeout that is used on to determine that no valid command will be received. Moving channel.send and receive into the phy might and handling this issue there might be good. Also makes sense from a stack perspective. With a osdp_phy_fetch_command() function the phy would have to handle the start of frame timeout. If you have an alternative solution for this issues I can do some verification on it since I have a multi PD setup running.

sidcha commented 3 years ago

So.. I found a neat way to achieve what we discussed earlier (long back). I do not have a multi-drop setup to to test this thoroughly but I made sure I didn't break any of my test beds.

If you do find some issue in your setup with this commit, send PR or re-open this issue :-)

te-johan commented 3 years ago

Thank you for looking into this. I'm not working with the OSDP project and don't know when I'll upgrade the software next time. However it would be really good for you to have a test setup where you can test multiple PDs. I would recommend that you have a test application that runs two instances of a PD client in the same Linux application. Maybe even run them in two different threads so that the processing are not syncronized. Have one channel and split all incoming data into both instances.

I realized that once nice way to solve it is by actually limiting how much data is read using the installed receive function. So first only read the header and when the size is known only one command would be read from the phy.