Closed GoogleCodeExporter closed 8 years ago
Attached is the proposed fix.
This fix closes all holes that could cause erroneous endPacket signals from
being generated. startPacket is only issued on the transition to SEQNO and
endPacket is only signalled if the state machine is in either SEQNO or INFO.
This code has been tested in its corner cases using gdb/jtag on a msp430f2618
based mote.
SerialP needs to be used with the modified SerialDispatchP that is part of
Issue 2. Both files need to be replaced to fix bugs in both that can cause
packet lossage.
Original comment by cire...@gmail.com
on 20 Oct 2010 at 10:02
Attachments:
[deleted comment]
Generally, it's bad to use enums as a type (e.g. rxstate_t). The C compiler
defaults enums to ints; if the int is 16 bits, you waste a byte. These are the
kinds of little stylistic changes, which, when mixed in with functional
changes, lead code to bitrot.
Same as the reversal of the RXSTATE_NOSYNC case. Is there a functional problem
here, or a stylistic one? The difference seems to be return vs. break. Why
reverse the statement? Was there a problem?
The RXSTATE_PROTO has a similar reversal, although now it looks functionally
different. Generally, the entire receive switch has been rewritten. It's not
clear which of the changes are intended to fix bugs 1 and 2, and which are just
stylistic. I'd feel much more comfortable if we had a version which fixes the
bugs, then one which stylistically refactors the code. In the former, it's
possible to through thought experiments and testing to determine if the bug is
fixed. In the latter, it's possible through thought experiments and testing to
determine if the code is equivalent. Since the two are mixed, it's very hard to
do either.
Why is #2 a bug? Is the desired protocol that endPacket is only paired on a
successful startPacket? The text above seems to suggest that this should be the
case, as do the interface comments. But #2 suggests there shouldn't be. Which
is right?
#1 seems like a definite bug, which requires only a tiny change (the case in
RXSTATE_PROTO not going to nosync but rather resetting the state machine). Why
is this not sufficient?
Original comment by philip.l...@gmail.com
on 27 Oct 2010 at 9:50
With respect to the use of ENUMs:
It is not stylistic sugar but rather an issue of trade offs.
On the MSP430 the toolchain allocates the native size which is 16 bits. The
trade off is code vs. ram.
If an ENUM is used it uses an extra byte of RAM but generates better code.
Using an uint8_t on the other hand saves the byte of RAM but everywhere the
object is referenced extra code is generated that maps the 8 bit datum to the
underlying machine size. (in this case 16 bits).
So the question is where does one want to burn the memory, code or data? 1
extra byte of RAM vs. multiple code bytes.
An associated impact is how does the object interact with symbolic debuggers
and debugging this code. The majority of time getting code working properly is
debugging and symbolic debuggers greatly aid in that. Using enums works better
with the debugger and is symbolic while uint8_t is just a number.
Original comment by cire...@gmail.com
on 28 Oct 2010 at 5:03
Same as the reversal of the RXSTATE_NOSYNC case. Is there a functional problem
here, or a stylistic one? The difference seems to be return vs. break. Why
reverse the statement? Was there a problem?
----------------
The rx_state_machine was rewritten to make it simpler and easier to understand.
Being easy to understand and clean to look at makes it much more likely that
the state machine will do the right thing. If it is easy to look at and
understand it should be easy to see its correctness. That was the thinking
anyway.
States that are complete immediately return as opposed to breaking and then
flowing through the end of the code either via the break or a goto. States
that needed to abort would do so via a break and the abort code immediately
followed the switch.
This seems to be much cleaner, easier to understand. Okay subjective.
The reversal of the RXSTATE_NOSYNC case is stylistic. Minimize indent coupled
with early processing if possible.
Original comment by cire...@gmail.com
on 28 Oct 2010 at 5:47
1) Invalid serial protocol. The first byte following the HDLC start of packet
delimiter is the low level serial protocol. Anything other than
SERIAL_PROTO_PACKET_ACK will cause an endPACKET(FAIL) to be signalled.
---
#1 seems like a definite bug, which requires only a tiny change (the case in
RXSTATE_PROTO not going to nosync but rather resetting the state machine). Why
is this not sufficient?
+++++++++++++++
The intent of the label "nosync:" and the goto nosync is to reset the state
machine. The label "nosync" is a place in the code that forces the rx state
machine to the NOSYNC state while also resetting whatever is necessary to fully
reset the machine. Counters and other state variables. It also tells the
underlying layer to reset.
So the intent is indeed to reset the state machine in a well defined common
manner.
One could set the rx_state to RXSTATE_NOSYNC but this special cases things
without clearing out underlying state variables. Is that what you meant by
"resetting the state machine"?
Seemed like having a consistent mechanism for "processing complete" and for
"abort" was a good idea.
+++++++++++++++
Due to problems with SerialP's implementation, solitary endPackets can be
generated. This can occur in the following cases:
2) If startPacket() (the upper layer signal) returns FALSE, endPACKET(FAIL) is
signalled.
---
Why is #2 a bug? Is the desired protocol that endPacket is only paired on a
successful startPacket? The text above seems to suggest that this should be the
case, as do the interface comments. But #2 suggests there shouldn't be. Which
is right?
+++++++++++++++
Yes, endPacket should only be allowed to follow an startPacket that returns
SUCCESS. #2 violates this and is a bug. The endPacket(FAIL) is signalled
and is a bug.
Premise is startPacket returns FALSE, ==> upper layer buffer is locked. Then
SerialP signals endPacket(FAIL), this causes the current buffer to be unlocked
and made available for a future receive.
Because this buffer was originally locked, it was handed off to the
receiveTask. The unlock violates mutual exclusion of the packet data.
That is why #2 is a bug. Is there some interaction with SerialDispatcherP such
that the premise would be invalid?
Original comment by cire...@gmail.com
on 28 Oct 2010 at 6:31
[deleted comment]
Can we close on this? Eric, your most recent emails suggest that fixing the
start/end problem is sufficient.
Original comment by philip.l...@gmail.com
on 13 Nov 2010 at 1:12
This issue can be closed. The analysis of the problem involved a premise that
isn't true:
"This problem will only cause a failure iff there are two buffers at the
SerialDispatcherP layer waiting to be delivered."
SerialDispatcherP does not allow for two packets to be pending at the same time
rendering the analysis moot.
Original comment by cire...@gmail.com
on 13 Nov 2010 at 1:42
Original comment by philip.l...@gmail.com
on 13 Nov 2010 at 2:22
Original issue reported on code.google.com by
cire...@gmail.com
on 17 Oct 2010 at 11:43