SerialDispatcherP can lose packets, loses double buffering, and doesn't implement FIFO correctly

GoogleCodeExporter commented 8 years ago


SerialDispatcherP implements a double buffer receive queue for incoming serial 
packets.  This double buffer queue is intended to operate in a FIFO manner.

SerialDispatcherP is implemented as two major parts.  The first part operates 
at the async (interrupt) level and is responsible buffer allocation, byte 
reception, buffer completion.  The second part is responsible for packet 
delivery and operates at the task level.

Two buffer slots are provided and this implements the double buffering.  One 
buffer can be in delivery while another buffer is receiving an incoming packet. 
 Under heavy loads both buffers will be in delivery and new incoming data 
should be discarded.

Buffers handed off to the task layer for delivery should be handled in a 
strictly FIFO order.

The current implemenation of SerialDispatcherP (rev 5204) only provides a 
single set of state information and can only convey information about one 
buffer slot to the delivery task.

What steps will reproduce the problem?
1. Single step the code and set up a scenerio where the 2nd buffer is signalled 
complete prior to the receiveTask running (key variable is receiveTaskPending). 

This problem exists because there isn't sufficient state to properly  run the 2 
slot FIFO queue in strict FIFO order while maintaining proper mutual exclusion.

This problem is a race condition.  If the receiveTask executes (and clears 
receiveTaskPending) prior to a new packet being signalled as available then no 
problem occurs.

What is the expected output? What do you see instead?

1st packet arrives and completion is signalled to SerialDispatcherP.   Buffer 
is handed off to receiveTask and receiveTaskPending set TRUE.  This opens the 
window.    Assume that prior to delivery being complete by receiveTask 
(clearing of receiveTaskPending), another packet is received and signalled.

Following problems occur:

1) Since there is only one set of state vars for handing information to 
receiveTask, meta info about the new packet is lost (type, which buffer, etc).

2) The packet in the 2nd buffer is never processed (hence lost).  The 
receiveTask has no way of knowing that another buffer is available.

3) The buffer becomes unavailable for any future packets.  It is never freed 
again and the system drops to one available buffer.

What version of the product are you using? On what operating system?

TinyOS-main SVN rev 5204 (hinrg git: c143598), 10/15/2010

Please provide any additional information below.

Original issue reported on code.google.com by cire...@gmail.com on 18 Oct 2010 at 12:21

GoogleCodeExporter commented 8 years ago

Hi! Can you please check if this reimplementation of the serial stack works for 
your load?

http://szte-wsn.cvs.sourceforge.net/viewvc/szte-wsn/tinyos/tos/lib/FastSerial/

(You do not need the Atm128UartP.nc and Msp430UartP.nc files, those will just 
make stuff a little faster by eliminating multi byte sends).

Miklos

Original comment by mmar...@gmail.com on 18 Oct 2010 at 2:54

GoogleCodeExporter commented 8 years ago

I've looked at that code and am unclear of how to use it to replace 
SerialDispatcherP/SerialP/HdlcTranslate.  Not use instead of but to replace the 
functionality of to be backward compatible with existing code.

It is structurally very different than what SerialDispatcherP is trying to do.

Rather I'm fixing SerialP and SerialDispatcherP and have proposed code which I 
will be posting shortly.  In fact I'd be pleased if you'd review and test it.

Original comment by cire...@gmail.com on 18 Oct 2010 at 4:11

GoogleCodeExporter commented 8 years ago

You just have to copy all files there to a directory and include that in your 
makefile. It replaces the SerialDispatcher code and gives some completely 
different implementation below that.

Anyways, if you can fix the current stack with little changes, then it is fine.

Best,
Miklos

Original comment by mmar...@gmail.com on 18 Oct 2010 at 8:45

GoogleCodeExporter commented 8 years ago

Attached is proposed code to fix the SerialDispatcherP problems listed above.

This code fully implements a two slot FIFO protected buffering system.  It 
includes appropriate state variables to insure that packet are received and 
delivered appropriately.

This code has been extensively  unit tested using GDB/JTAG and a msp430F2618 
mote.  All identified corner cases have been examined by working through the 
code while single stepping.

This code should be used in conjuction with SerialP corrections contained with 
Issue 1.

Original comment by cire...@gmail.com on 20 Oct 2010 at 10:08

Attachments:

SerialDispatcherP.nc

GoogleCodeExporter commented 8 years ago

[deleted comment]

GoogleCodeExporter commented 8 years ago

TOS Serial code review
Michael Schippling
Oct 26, 2010

As per instruction from Eric Decker I have examined the Receive thread through
these two files which are candidates for checkin to the TOS2 project. The "new" 
versions were on the serial_rx_fix branch of the tinyos-2.x tree and were 
compared
to the "old" versions which are on the main branch: 
    SerialP_nc_new.txt
    SerialDispatcherP_nc_new.txt

1) Presuming that we _are_ doing Software Engineering this code should
reference the appropriate TEP 113 document trail which provides a
Functional Specification. The two files in question implement the
Protocol and Dispatcher components of the standard serial stack for
TOS2 as described in the TEP. The associated Requirements, Design
and Test Plan documents are well enough hidden that I can't find them.

2) In the absence of Design documentation I would like to see comments
in the source files describing the state machines and transitions
related to the state variables:
    rslot_t.rslot_state, rx_state, and sendState in SerialDispatcherP
  and
    rxState, txState, and tx_buf_t.state in SerialP
I believe this would make it easier for those who come after us to
understand the code's function.

3) Just a comment on the partitioning of the higher level components...
Since the message_t headers are protocol dependent and are accessed
at multiple levels, would it have been "better" to indicate the header
type in the overall message protocol byte -- labeled "Protocol byte"
in section 3.6 of the TEP -- and handle it all in SerialP, rather
than splitting the reformatting between SerialP and SerialDispatcher?
I think this would also make swapping around from radio to serial
message paths somewhat easier.

4) in SerialP, general RX buffer management:
a) I think you can eliminate the +1 from the char buffer in rx_buf_t
 at line 133, and change the tests in push and pop from > to ==
 at lines 311 and 323.
b) rx_buffer_is_full() and rx_buffer_is_empty() are not used.
c) One might be able to say the same thing about the TX buffers.

5) In SerialP rx_state_machine(...) starting at line 433:
a) I am not comfortable with having multiple return paths from the
 switch(rxState){}, especially because the normal "break" path is
 for errors and the "all done" path is handled with a goto. Adding
 an error-state local variable might appease me...
b) I got lost at line 487:
     if (rxByteCnt < 2 || rx_current_crc() != rxCRC)
 because I didn't initially understand that one needs to remain two
 bytes behind in order to calculate the CRC correctly. A comment to
 this effect and a define like CRC_BYTE_LENGTH in place of the 2 here
 and on line 487 would make the whole 2 byte buffering thing much
 clearer.
c) I believe the rxInit() at line 492 could be moved to the "done:"
 section of the function, making the call at line 518 redundant.
 This would make it clearer that the state machine is indeed reset
 and ready to roll again.
d) The whole "offPending" thing at line 522 looks fishy, but I guess
 it's needed to be able to shutdown cleanly. I had trouble following
 the logic, so maybe another state-transition comment is in order.
e) And possibly a comment at line 529 pertaining to how multiple SYNC
 delimiters between messages are optional. This is covered in the
 HDLC RFC1662 document but it's a long trail to follow...

6) In SerialDispatcherP in general:
a) This comment at line 132 seems dis-ingenuous in light of there
 being three levels of function calls and buffer manipulations per
 byte received:
   * ...to minimize the overhead of structure dereference...
c) I have no idea what this COUNT_NOK... syntax at lines
 136 and 137 does:
   uint8_t* COUNT_NOK(sizeof(message_t)) rx_buffer;
d) The ReceiveBytePacket event functions could be coalesced
 into a single state machine function with multiple calls as
 done in  SerialP circa line 411. This would keep all the
 transitions in one place.

7) in SerialDispatcherP rxTask() starting at line 270:
a) Is it guaranteed OK to do a return from an atomic{} block?
b) You may be able to put everything but the check for FULL at
 line 278 outside the atomic. Once it's established that the buffer
 is FULL no one else should be touching it, no?
c) Since only this function manipulates C_rx_slot, I think line 306
 which re-gets the rs slot pointer is redundant.
d) At line 304:
    msg   = signal Receive.receive[proto](msg, msg, size);
 I presume that the receive() is supposed to return the msg buffer
 that it got -- as with T1's radio.receive(). If it doesn't you end
 up over-writing your rs->msg buffer at line 307 with what may be
 garbage. Doing it this way allows the receive() user some control
 over buffering but there could be consequences.
e) Again I think the only thing that really needs to be in the atomic
 block is the check for FREE at line 313.

8) in SerialDispatcherP ReceiveBytePacket.startPacket() starting at
line 330:
a) All the dispatch_err_* variables are not ever used or reset...
b) Line 346:
    rx_index = 0;
 is redundant because rx_index is reset later in byteReceived()
 at line 360.
c) Some more careful analysis (than I'm capable of) of what needs to
 be in the atomic could be done.

9) in SerialDispatcherP ReceiveBytePacket.byteReceived(...) starting
at line 353:
a) Again having returns in the switch makes me nervous.
c) And again, some more careful analysis (than I'm capable of) of what
 needs to be in the atomic could be done.

10) in SerialDispatcherP ReceiveBytePacket.endPacket(...) starting
at line 400...see #9, E.g., all the P_rx_slot manipulation happens
in this function and I think overlapping references are prevented
by the lower level access pattern.

Original comment by schippli...@gmail.com on 27 Oct 2010 at 6:47

GoogleCodeExporter commented 8 years ago

This fix has one of the same concerns as those in issue 1: use of enums waste 
memory. 

This seems like a pretty subtle bug. The current implementation is such that if 
you receive a second packet before the first is signaled, you discard it. The 
assumption in the code is that you want to keep this buffer free in case a new 
packet comes in. Clearly a better way would be to discard only on handling 
startPacket for a new, third buffer. But this raises a more fundamental 
question: should the queue be FIFO, drop-tail, or should it prefer newer 
packets?

Original comment by philip.l...@gmail.com on 27 Oct 2010 at 9:56

GoogleCodeExporter commented 8 years ago

[deleted comment]

GoogleCodeExporter commented 8 years ago

[deleted comment]

GoogleCodeExporter commented 8 years ago

SerialDispatcherP implements a double buffer receive queue that implements a 
FIFO queue.  The premise of the bug report states that both buffers are full 
and that is what precipitates the problem.

However, the current implementation never queues up two buffers rendering the 
premise invalid.

The proposed code does implement a two element FIFO that can be easily extended 
to multiple slots.  With two elements this implements the discard on 
startPacket for the third to be received mentioned above.

Original comment by cire...@gmail.com on 28 Oct 2010 at 9:03

GoogleCodeExporter commented 8 years ago

Eric -- your most recent email suggests this is isn't a problem. Can you verify?

Original comment by philip.l...@gmail.com on 13 Nov 2010 at 1:13

Changed state: Started

GoogleCodeExporter commented 8 years ago

This issue can be closed.   The analysis of the problem involved a premise that 
isn't true:

"This problem will only cause a failure iff there are two buffers at the 
SerialDispatcherP layer waiting to be delivered."

SerialDispatcherP does not allow for two packets to be pending at the same time 
rendering the analysis moot.

Since two buffers can't ever be pending, there is no need for state information 
about both buffers.

Original comment by cire...@gmail.com on 13 Nov 2010 at 1:44

GoogleCodeExporter commented 8 years ago

Original comment by philip.l...@gmail.com on 13 Nov 2010 at 2:22

Changed state: Invalid

135u5 / tinyos-main

SerialDispatcherP can lose packets, loses double buffering, and doesn't implement FIFO correctly #2