Closed furusawata closed 1 year ago
That's certainly unexpected.
Can you provide some additional information: What can2040 revision are you using? What was the rp2040 system clock speed? In the picture above, it seems like the spurious transmit is 6 passive bits after the ack - when this error occurs does it seem to occur with that same timing? When the error occurs does it seem that there is always just one spurious dominant bit (not multiple dominant bits)?
Also, can you confirm that the application using can2040 is only calling the library functions from a single rp2040 core and that the functions are not being invoked "reentrant"?
-Kevin
Before the additional information, write the following information: I also confirmed the phenomenon with Klipper. The hardware is Raspberry Pi Pico and MAX3051.
I am running two commands.
cangen can0 -g 10 -e -I 00EF0102 -L 8 -D 0000000000000000
Device Response [EID: 00EF0201 DLC: 8 Data: 00 00 00 00 70 70 00 00]
cangen can0 -g 11 -e -I 00EA0102 -L 3 -D 2EFF00
Device Response [EID: 00FF2E01 DLC: 8 Data: 00 00 00 00 00 00 00 80]
Log
0002000278 EF0102 E [8] 00 00 00 00 00 00 00 00 ........
0002000279 EF0201 E [8] 00 00 00 00 70 70 00 00 ....pp..
0002000286 EA0102 E [3] 2E FF 00 ...
0002000287 FF2E01 E [8] 00 00 00 00 00 00 00 80 ........
0002000289 EF0102 E [8] 00 00 00 00 00 00 00 00 ........
0002000289 EF0201 E [8] 00 00 00 00 70 70 00 00 ....pp..
0002000297 EA0102 E [3] 2E FF 00 ...
0002000298 FF2E01 E [8] 00 00 00 00 00 00 00 80 ........
0002000299 EF0102 E [8] 00 00 00 00 00 00 00 00 ........
0002000299 EF0201 E [8] 00 00 00 00 70 70 00 00 ....pp..
0002000309 EA0102 E [3] 2E FF 00 ...
0002000309 EF0102 E [8] 00 00 00 00 00 00 00 00 ........
0002000310 FF2E01 E [8] 00 00 00 00 00 00 00 80 ........
0002000310 EF0201 E [8] 00 00 00 00 70 70 00 00 ....pp..
Trigger on error
Expand
Expand more
In the picture above, it seems like the spurious transmit is 6 passive bits after the ack - when this error occurs does it seem to occur with that same timing?
Yes, it's always the same timing.
When the error occurs does it seem that there is always just one spurious dominant bit (not multiple dominant bits)?
Yes, always once.
The change in "Issues #23" was irrelevant.
thank you.
Okay, thanks. I reviewed the code and did not find anything suspicious. I'll try to reproduce the issue locally.
-Kevin
I have been able to reproduce this issue locally.
I used your cangen
commands: cangen can1 -g 11 -e -I 00EA0102 -L 3 -D 2EFF00
(rp2040 on can1) and cangen can0 -g 10 -e -I 00EF0102 -L 8 -D 0000000000000000
(stm32f072 on can0). Errors also reported by stm32f072 (running candlelight) as seen with candump -t z -Ddex can0,#FFFFFFFF
:
(015.186853) can0 TX - - 00EF0102 [8] 00 00 00 00 00 00 00 00
(015.196975) can0 TX - - 00EF0102 [8] 00 00 00 00 00 00 00 00
(015.197102) can0 RX - - 2000000C [8] 00 00 10 00 00 00 00 00 ERRORFRAME
controller-problem{}
protocol-violation{{tx-recessive-bit-error}{}}
(015.198474) can0 RX - - 00EA0102 [3] 2E FF 00
In a ~15 minute capture I got 27 errors. I was able to capture about 10 of those on my logic analyzer. All occurrences of error seem to be when can2040 acks a packet and has a packet ready for transmit. Spurious transmission always seems to be one dominant bit in the 7th slot of EOF sub-frame.
Not sure root cause at this point. I will try to track it down.
-Kevin
This appears to be a logic error in the "report" state machine handling. The following change fixes the issue for me:
--- a/src/can2040.c
+++ b/src/can2040.c
@@ -817,6 +817,9 @@ report_note_parse_error(struct can2040 *cd)
static void
report_line_ackdone(struct can2040 *cd)
{
+ // An ackdone event indicates an ack is present
+ if (cd->report_state == RS_NEED_RX_ACK)
+ cd->report_state |= RS_NEED_EOF_FLAG;
// Setup "matched" irq for fast rx callbacks
uint32_t bits = (cd->parse_crc_bits << 8) | 0x7f;
pio_match_check(cd, pio_match_calc_key(bits, cd->parse_crc_pos + 8));
Without the above change, it is possible that a call to can2040_transmit()
that occurs during an rx message, after an ack is generated, but before the 6th eof bit is received, could lead to incorrect behavior in report_line_txpending()
. Depending on the timing of the can2040_transmit()
it could result in a race condition in the PIO. Advancing the state in report_line_ackdone()
should avoid the issue.
-Kevin
I fixed it and confirmed that the problem does not occur. Thanks for the great software and the quick fix.
Thanks for testing. After further thought, I believe the change at #31 is a better fix.
-Kevin
I've merged in the fix contained in PR #31. (And I also tagged that code as "v1.4.1".)
Thanks again for reporting and testing. -Kevin
Hello. I have a phenomenon that I don't understand. After can2040 is in receiving state and returns Ack to other device's transmission packet, Ack-like waveform appears again.
Waveform during normal and error. (250kbps / 2 nodes)
This phenomenon occurs once every 700000~1000000 packets. What could be the cause?