Closed LupusTheCanine closed 3 months ago
If Pkt_len is set to 0x100 (truncated to 0x00) the function goes into an infinite loop
I'm wondering how this is supposed to happen, considering that we uint8_t
as the type. That being said, we should check for 0 length packets anyway, that's right.
Regarding "Dropped start of the frame", the current implementation assumes that we don't receive such frames which would require a continuation packet where the beginning of the frame was at the end of another packet. We could however add support for these if necessary.
What bugs me however is: do we really have such jumbo frames (length = 0x1F
)? As of now the maximum frame length supported is based on the maximum buffer size, which is currently TELEMETRY_RX_PACKET_SIZE = 128
. If you want to support frames up to the full 256 bytes (254 + 2), we need to change the types from uint8_t
to uint16_t
at some other places.
Also, which kind of receiver / module supports these jumbo frames? What did you use for your tests?
As of now, and to make things more stable, I'll focus on unit tests with different types of frames / potential errors.
If Pkt_len is set to 0x100 (truncated to 0x00) the function goes into an infinite loop
I'm wondering how this is supposed to happen, considering that we
uint8_t
as the type. That being said, we should check for 0 length packets anyway, that's right.
The issue is that even though 0xFE(254) is not a valid packet length it is treated as such leading to total packet length being set to 0 ((254+2)%256). 0xFE may appear in p_buf[1] aka [len] if begining of the packet was dropped, previous packet was malformed, current packet is malformed, or non CRSF data is received on the port.
Regarding "Dropped start of the frame", the current implementation assumes that we don't receive such frames which would require a continuation packet where the beginning of the frame was at the end of another packet. We could however add support for these if necessary.
IMHO there is no guarantee that is the case so it would be best to assume that serial reads do not align with CRSF frames.
What bugs me however is: do we really have such jumbo frames (length =
0x1F
)? As of now the maximum frame length supported is based on the maximum buffer size, which is currentlyTELEMETRY_RX_PACKET_SIZE = 128
. If you want to support frames up to the full 256 bytes (254 + 2), we need to change the types fromuint8_t
touint16_t
at some other places.
I use 63 byte frames for Mavlink passthrough. As far as I know CRSF is limited to 64 byte frames (source) so I adhered to that.
Also, which kind of receiver / module supports these jumbo frames? What did you use for your tests?
I used Radiomaster Ranger Micro (ESP32) with my expansion of ExpressLRS (which now supports Mavlink in master) that supports sending mavlink frames to and from the handset through CRSF passthrough [sync/address][len][type(0xFF)][p_len][mavlink][crc].
I wouldn't call 31 bytes a jumbo frame when mavlink packets can be 280 bytes 😅. I use modified ExpressLRS (derivative of the current work to support mavlink) that sends mavlink to the handset through CRSF passthrough. I have a working script that can send and receive mavlink (right now it is "Hello world" microcontroller world aka blinking diode). I plan to add further support for mavlink in lua when I have a working proof of concept written in lua.
0xFE may appear in
p_buf[1]
aka[len]
if the beginning of the packet was dropped
Thx for the clarification.
IMHO there is no guarantee that is the case so it would be best to assume that serial reads do not align with CRSF frames.
Can be dealt with, I have no issue with that.
I use 63 byte frames for Mavlink passthrough. As far as I know CRSF is limited to 64 byte frames.
Good, so let's keep the max buffer size at 128
bytes.
I use modified ExpressLRS (derivative of the current work to support mavlink) that sends mavlink to the handset through CRSF passthrough.
Ok, cool, then let's make it work.
I would check if the new frame is guaranteed to be no larger than half the buffer length, especially when the buffer is limited to 19 bytes.
In the worst case it is possible to have a read with 63 trailing bytes followed by 128 bytes read, we should either limit read to one packet or increase buffer, I would also apply the buffer/read size logic to small packet radio since they appear to be potentially affected though FrSky appears to use some different process as I don't see the parser for it when looking for "processFrame".
ExpressLRS appears to behave well and only send chunks small enough to be handled with 128 byte buffer I don't have other hardware to test that except XJT though given how S.Port works we are unlikely to run into saturation issues with it.
If I set too serial baud to 921k I get quick trip to Emergency Mode.
@LupusTheCanine I've created #5243 with unit tests so we can verify if things are the way we think they are. Let me know if I should add more tests.
In the worst case it is possible to have a read with 63 trailing bytes followed by 128 bytes read, we should either limit read to one packet or increase buffer,
I'm really not sure I understand properly what you mean. What do you mean with "read" in this context? The serial receiver is handling this either based on IRQ (for each received byte) or DMA based, where the received payload is read directly into a circular buffer. Most radios use IRQ-based. IRQ-based receiver will stop writing into the RX buffer when full.
I would also apply the buffer/read size logic to small packet radio since they appear to be potentially affected though FrSky appears to use some different process as I don't see the parser for it when looking for "processFrame".
processFrame
API is used only for CRSF for now. We might apply the same logic to other protocols later if we see fit. The main difference is that we make use of the IDLE IRQ to trigger frame processing. The IDLE IRQ is triggered when the RX line is idle for at least a byte length (idle time >= (1 / baudrate) * 10
).
When the IDLE IRQ triggers, we schedule the processing function asynchronously so that FreeRTOS will execute that function out of the IRQ context. The latency between reception of the frame and processing is much shorter than when period polling of the buffers is used.
I'm really not sure I understand properly what you mean. What do you mean with "read" in this context? The serial receiver is handling this either based on IRQ (for each received byte) or DMA based, where the received payload is read directly into a circular buffer. Most radios use IRQ-based. IRQ-based receiver will stop writing into the RX buffer when full.
I mean the situation where we end up with a rxBuffer containing 63 bytes (worst case incomplete frame) and inconsiderate module sends 128 byte burst that we can't handle because it won't fit in the rxBuffer along with the old frame.
I extended the tests to include following situations
I also wrote convenience macros to make writing frameParser tests a bit less tedious. https://github.com/LupusTheCanine/edgetx/tree/fix-5238
length error f the frame is 65 bytes long (len=63)
Should we really consider frames of 65 bytes an issue? IMHO, we should only care about our buffer size, which is 128 bytes right now. Testing for a maximum frame length of 64 bytes would be something new that might confuse other modules (it's not just ELRS and original Crossfire using CRSF as a proto).
Regarding one of the last tests you added (badFrames
): you cannot expect to add some random bytes between 2 frames sent together at the same time and expect we could/should sort these out. What happens here is that we just throw away the rest of the buffer, in case that happens.
IMHO, it makes no sense to consider the case where a module would send a random stream of bytes sent in batches that have nothing to do with the underlying framing. That module won't behave like a TCP stack, on top of which you're sending some payload chunks. This would be much better comparable to UDP datagrams, where you have to consider the MTU to avoid causing some IP fragmentation.
Note: your tests did not compile as-is, I had to fix a few things, so I guess you did not try them out, right?
Should we really consider frames of 65 bytes an issue?
IMHO yes because it is above what CRSF allows according to the working group so it is unlikely to be a valid frame. AFAIK there are no frame format with total length larger than 64.
Regarding one of the last tests you added (badFrames): you cannot expect to add some random bytes between 2 frames sent together at the same time and expect we could/should sort these out.
These could as well be a frame that got a single bit flipped as is the case with the second invalid frame 😁 (I accidentally nailed CRC).
IMHO, it makes no sense to consider the case where a module would send a random stream of bytes sent in batches that have nothing to do with the underlying framing.
I would consider module that received a burst of data and utilizes the entire slots without attempting to align packet starts with the slot though we would have to increase buffer size to 192 to fully handle that >(63+128-3).
Note: your tests did not compile as-is, I had to fix a few things, so I guess you did not try them out, right?
I did.
My bad I didn't push the last commit.
I ended up adding some simple mechanism to skip invalid start bytes at the beginning of a frame, so that all the tests you proposed are now running more less as-is.
Also, the new version is able to process frames directly out of the RX buffer, using the telemetry buffer only for re-assembly and only if necessary. That saves a few memcpy
.
IMHO yes because it is above what CRSF allows according to the working group so it is unlikely to be a valid frame.
I'll check with the others whether or not this is really set in stone.
I checked your fix with my setup and I can't see any signs of dropped or discarded frames in the debug.
Update, few dropped frames.
Did a longer 10+ hour test, unfortunately Debug output quit (before showing any bad frames).
Another test, this time debug cooperated. 14 hours, about 400 fails to parse but no lockups
Is there an existing issue for this problem?
What part of EdgeTX is the focus of this bug?
Transmitter firmware
Current Behavior
Packets that fail address or CRC checks are not correctly discarded. pkt_len is threated as valid even though it isn't. If Pkt_len is set to 0x100 (truncated to 0x00) the function goes into an infinite loop,
Expected Behavior
Steps To Reproduce
Version
2.10.1
Transmitter
RadioMaster TX16S / TX16SMK2
Operating System (OS)
Windows
OS Version
No response
Anything else?
Issue was first observed on EdgeTX2.10 then extensive testing of the issue was done on EdgeTX2.11 with expanded tracing. Snipped Debug log with marked issue bytes EdgeTXDebugCommnet.txt