KevinOConnor / can2040

Software CAN bus implementation for rp2040 micro-controllers
GNU General Public License v3.0

Can transmission timing wrong at some speeds #23

Closed apple2ms closed 1 year ago

apple2ms commented 1 year ago

I initially discovered this while using the Arduino wrapper library, then confirmed the same behavior with this library directly. This is a transmission at 500k: [image] The transmission needs to be around 1-2% faster; as it is, the other node fails to read it. I tried to compensate by changing the clock speed parameter. That does fix transmission, but incoming messages are then no longer received. I understand that my Pico's clock speed could be off, but the transmissions definitely look slower than they should be. Is there a way to fine-tune the timing independently for transmit and receive?

KevinOConnor commented 1 year ago

The transmit rate in can2040 is determined by the receive rate. That is, it monitors the "self feedback" that the "CAN TX" line has on the CANH and CANL lines as received by the "CAN RX" line.

You may want to review your transceiver, transceiver wiring, and canbus wiring. I guess it's possible capacitance on those lines could be delaying signalling which could result in a timing difference.

For what it is worth, I run the canbus at 1Mbit and have not observed any timing anomalies.

-Kevin

apple2ms commented 1 year ago

Well, I tried it at 1 Mbit and, strangely enough, it looks good and stable. I then tried again at 500k and 250k; there it works only intermittently and looks like the picture above. 125k also seems to work well. I am using the same two rp2040 boards, running the same code, in every test.

furusawata commented 1 year ago

I got similar results.

250 kbps: [image], 500 kbps: [image], 1 Mbps: [image]

Compared to 1 Mbps, 250 kbps and 500 kbps show a large deviation. Is it because of the wiring?

apple2ms commented 1 year ago

No, definitely not because of the wiring. In my experiment the device still exchanges data with some other CAN nodes, but only marginally.

KevinOConnor commented 1 year ago

If you think there is something wrong with the timing, it would help if you could reproduce with known working code (Klipper in usb-to-canbus mode) and provide a trace of both can rx and the can tx lines (as close to the rp2040 as possible). Canbus transmit timing is dependent on received responses, so a trace without both sides doesn't provide much information. A picture of the test environment and description of the critical components (eg, transceiver chip, terminating resistor locations) would also be helpful.

For information on running Klipper see https://github.com/KevinOConnor/can2040/blob/master/docs/Tools.md . That document also has a description on using a low-cost logic analyzer to debug bus issues.

-Kevin

furusawata commented 1 year ago

Thank you for answering.

[image: RP2040] The transceiver on my board is a MAX3051, isolated with an ADUM5201.

I tried at 250 kbps. [image: Tx_to_Rx] The delay from Tx edge to Rx edge was 140ns.

[image: RP2040_time] It is about 519us from the first falling edge of Tx to the rising edge before Ack. Is this the accumulation of circuit delays?

I have also tried with the same board and a Jetson Xavier NX. [image: JetsonXavierNX] [image: Jetson_Time] Here it is about 512us from the first falling edge of Tx to the rising edge before Ack, ending 7us earlier. Presumably its Tx is not affected by Rx lag.

I will try removing the isolator, though that is not a fundamental solution. English is not my native language, so I apologize if I come across as rude.

furusawata commented 1 year ago

I repeated the test without isolation.

[image: Tx_to_Rx_nonIso] The delay from Tx edge to Rx edge was 44ns.

[image: RP2040_Time_NonIso] It's about 512.1us from the first falling edge of Tx to the rising edge before Ack, which is a small deviation. Without isolation, the baud rate deviation is small and there is no problem. However, it is still not clear why there is so little deviation at 1 Mbps even with isolation.

Let me know if there are any other experiments you would like me to run.

Thank you.

KevinOConnor commented 1 year ago

Interesting.

> It is about 519us from the first falling edge of Tx to the rising edge before Ack.

What was the canbus message sent (how many bits were in it)? Was this with Klipper in usb-to-can mode running on the rp2040? If not, what was the rp2040 system clock speed?

I think I see why it's running okay at 1mbit - the current code only delays the clock timing if it detects a falling canrx line that is a little delayed - if the canrx line is delayed by a lot, then it doesn't resync. So, at 1mbit in your setup it is likely not doing clock synchronization. The can2040 software could be improved here.

In any case, a 140ns delay should have been okay at a 250000 speed. So something odd seems to be occurring.

I don't have a test setup immediately available. There are a few things you could try to see if it improves things:

  1. Disable the rx pullup:

    --- a/src/can2040.c
    +++ b/src/can2040.c
    @@ -385,7 +385,7 @@ pio_setup(struct can2040 *cd, uint32_t sys_clock, uint32_t bitrate)
    
     // Map Rx/Tx gpios
     uint32_t pio_func = cd->pio_num ? 7 : 6;
    -    rp2040_gpio_peripheral(cd->gpio_rx, pio_func, 1);
    +    rp2040_gpio_peripheral(cd->gpio_rx, pio_func, 0);
     rp2040_gpio_peripheral(cd->gpio_tx, pio_func, 0);
    }
  2. Increase the cantx drive strength:
    --- a/src/can2040.c
    +++ b/src/can2040.c
    @@ -55,7 +55,7 @@ rp2040_gpio_peripheral(uint32_t gpio, int func, int pull_up)
    {
     padsbank0_hw->io[gpio] = (
         PADS_BANK0_GPIO0_IE_BITS
    -        | (PADS_BANK0_GPIO0_DRIVE_VALUE_4MA << PADS_BANK0_GPIO0_DRIVE_MSB)
    +        | (PADS_BANK0_GPIO0_DRIVE_VALUE_12MA << PADS_BANK0_GPIO0_DRIVE_MSB)
         | (pull_up > 0 ? PADS_BANK0_GPIO0_PUE_BITS : 0)
         | (pull_up < 0 ? PADS_BANK0_GPIO0_PDE_BITS : 0));
     iobank0_hw->io[gpio].ctrl = func << IO_BANK0_GPIO0_CTRL_FUNCSEL_LSB;
  3. Disable the PIO input synchronizer.

    --- a/src/can2040.c
    +++ b/src/can2040.c
    @@ -375,6 +375,7 @@ pio_setup(struct can2040 *cd, uint32_t sys_clock, uint32_t bitrate)
    
     // Setup and sync pio state machine clocks
     pio_hw_t *pio_hw = cd->pio_hw;
    +    pio_hw->input_sync_bypass = 1 << cd->gpio_rx;
     uint32_t div = (256 / PIO_CLOCK_PER_BIT) * sys_clock / bitrate;
     int i;
     for (i=0; i<4; i++)

The above is totally untested. Some, none, or all of the above may alter the behavior. If you run tests, let me know the results (success or failure).
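
For reference, the `div` expression in the context lines of patch 3 can be evaluated on its own to see whether the clock divider itself rounds at these bitrates. This is only a sketch: `PIO_CLOCK_PER_BIT` is assumed to be 32 and the system clock is assumed to be 125 MHz, neither of which is stated in this thread, and `pio_clock_div` is a hypothetical helper name. The result is treated as a fixed-point value in 1/256ths, which is how the RP2040 PIO clock divider (16 integer bits, 8 fractional bits) is laid out.

```c
#include <stdint.h>

// ASSUMPTION: not stated in this thread; check src/can2040.c for the real value.
#define PIO_CLOCK_PER_BIT 32

// Hypothetical helper mirroring the expression shown in the pio_setup() diff.
// The result is the PIO clock divider expressed in 1/256ths.
static uint32_t
pio_clock_div(uint32_t sys_clock, uint32_t bitrate)
{
    return (256 / PIO_CLOCK_PER_BIT) * sys_clock / bitrate;
}

// With an assumed 125000000 Hz system clock:
//   pio_clock_div(125000000, 250000)  == 4000  (4000/256 = 15.625)
//   pio_clock_div(125000000, 500000)  == 2000  (2000/256 = 7.8125)
//   pio_clock_div(125000000, 1000000) == 1000  (1000/256 = 3.90625)
```

Under these assumptions the integer division leaves no remainder at any of the three bitrates, which would be consistent with the deviation coming from runtime resynchronization rather than from the divider setup.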

-Kevin

furusawata commented 1 year ago

> What was the canbus message sent (how many bits were in it)? Was this with Klipper in usb-to-can mode running on the rp2040? If not, what was the rp2040 system clock speed?

I am using Klipper.

    cansend can0 00EF0102#C309C30900000000

128bit (118bit + Bit Stuffing 10bit) / 250kbps = 512us
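
The arithmetic above can be written out as a small C sketch that also expresses the earlier 519us measurement as a relative error. The helper names `frame_time_ns` and `deviation_ppm` are hypothetical and not part of can2040.

```c
#include <stdint.h>

// Nominal time on the wire, in nanoseconds, for `bits` bits at `bitrate` bit/s.
static uint64_t
frame_time_ns(uint32_t bits, uint32_t bitrate)
{
    return (uint64_t)bits * 1000000000ULL / bitrate;
}

// Deviation of a measured time from the nominal time, in parts per million.
// Positive means the measured frame took longer than nominal (too slow).
static int64_t
deviation_ppm(uint64_t measured_ns, uint64_t nominal_ns)
{
    int64_t diff = (int64_t)measured_ns - (int64_t)nominal_ns;
    return diff * 1000000 / (int64_t)nominal_ns;
}

// frame_time_ns(128, 250000) == 512000 (512us)
// deviation_ppm(519000, 512000) == 13671 (about 1.4% slow)
```

A deviation of roughly 1.4% is in line with the "1-2%" estimate in the original report.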

> The above is totally untested. Some, none, or all of the above may alter the behavior. If you run tests, let me know the results (success or failure).

Unfortunately it didn't work.

KevinOConnor commented 1 year ago

Thanks for the feedback.

> Unfortunately it didn't work.

It did not improve the results or some other error occurred?

I will not have access to a test setup for a few more days. Another possibility is to try disabling timing feedback on transmits:

--- a/src/can2040.c
+++ b/src/can2040.c
@@ -110,7 +110,7 @@ static const uint16_t can2040_program_instructions[] = {
     0xa242, // 25: nop                           [2]
     0x6021, // 26: out    x, 1
     0xa001, // 27: mov    pins, x
-    0x20c4, // 28: wait   1 irq, 4
+    0xb942, // 28: nop                           [25]
     0x00d9, // 29: jmp    pin, 25
     0x023a, // 30: jmp    !x, 26                 [2]
     0xc027, // 31: irq    wait 7
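
The two opcodes in this patch can be checked against the RP2040 PIO instruction layout, where bits 15..13 select the instruction and bits 12..8 hold the delay/side-set field. The sketch below assumes no side-set bits are configured for this program, so that the whole 5-bit field is a delay; that assumption is not confirmed in this thread. The helper names are hypothetical.

```c
#include <stdint.h>

// Major opcode in bits 15..13:
// 0=jmp 1=wait 2=in 3=out 4=push/pull 5=mov 6=irq 7=set
static unsigned
pio_opcode(uint16_t insn)
{
    return insn >> 13;
}

// Delay/side-set field in bits 12..8.
// ASSUMPTION: no side-set bits are in use, so all 5 bits are delay cycles.
static unsigned
pio_delay(uint16_t insn)
{
    return (insn >> 8) & 0x1f;
}

// 0x20c4: pio_opcode == 1 (wait), pio_delay == 0   -> "wait 1 irq, 4"
// 0xb942: pio_opcode == 5 (mov),  pio_delay == 25  -> "nop [25]" (mov y, y)
```

Under that assumption, the replacement instruction occupies a fixed 26 PIO cycles (1 for the instruction plus 25 delay cycles) instead of blocking until IRQ 4 is raised, which is what removes the receive-driven timing feedback from the transmit path.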

-Kevin

furusawata commented 1 year ago

> It did not improve the results or some other error occurred?

There was no change in the waveform and it was not improved. No.1 -> no change; No.1 & 2 -> no change; No.1 & 2 & 3 -> no change.

> Another possibility is to try disabling timing feedback on transmits

I reverted changes No.1, 2, and 3 and changed only the PIO program. Improved! 250 kbps -> 512us, 500 kbps -> 256us, 1 Mbps -> 128us.

I look forward to your formal reply. Thank you.

KevinOConnor commented 1 year ago

Thanks for testing and providing feedback.

I think it should be okay to disable "timing feedback on transmit". I'll have to give it some more thought, analysis, and testing. (The risk is that it might impact canbus message arbitration.)

-Kevin

KevinOConnor commented 1 year ago

I have created PR #32 with a potential fix for this issue.

-Kevin

KevinOConnor commented 1 year ago

I have updated PR #32 with a new solution that I think will be more robust. The updated solution implements bit time synchronization on tx, but only to faster transmitters (as is recommended in the canbus specification).
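
As a rough illustration of the rule described above, and not the code in PR #32, the decision "synchronize on tx, but only to faster transmitters" could be sketched as follows. The function name, the sign convention for `phase_error`, and the simplification of the CAN resynchronization rules are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

// phase_error: where a recessive-to-dominant edge was seen relative to where
// this node expected it. Negative = edge arrived early (the other node is
// faster); positive = edge arrived late. Units are arbitrary.
static bool
should_resync(bool is_transmitting, int32_t phase_error)
{
    if (phase_error == 0)
        return false;   // already aligned, nothing to adjust
    if (is_transmitting && phase_error > 0)
        return false;   // a late edge may just be our own delayed echo
    return true;        // receivers adjust both ways; transmitters only to early edges
}
```

The idea is that a transmitter which lengthened its bit time on every late edge would be slowed down by its own transceiver and isolator delay, which matches the symptom reported in this issue.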

-Kevin

furusawata commented 1 year ago

I tested PR #32.

hardware: without isolation (44ns delay) / with isolation (140ns delay)

software: master / txtiming (#32)

bitrate: 250k / 500k / 1M

packet sent: ID=EF0102 Data=C309C30900000000

bits sent: SOF=1, EID=29, SRR/IDE=2, DLC=4, RTR/r1/r0=3, Data=64, CRC=15, Bit Stuffing=10 ---> Total=128bit

Since the frame is 128 bits long, it should take 512us / 256us / 128us at the three bitrates, respectively.

250kbps (512us): [image: 250kbps]

500kbps (256us): [image: 500kbps]

1Mbps (128us): [image: 1Mbps]

master is slightly off; txtiming (#32) had the correct timing in every case. Excellent!

Strangely, in this test, with Isolation was faster at 1Mbps.

KevinOConnor commented 1 year ago

Thanks for testing!

> Strangely, in this test, with Isolation was faster at 1Mbps.

The master code won't do bit synchronization if the sender has a delay more than 1/16th of the bit time (62ns at 1mbit). That behavior should also be improved in the new txtiming branch.
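
The 1/16th figure can be turned into a quick check against the two delays measured earlier in this thread (44ns without isolation, 140ns with isolation). This is only a sketch of the threshold as described in the comment above; the helper names are hypothetical and the real comparison in can2040 may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

// Resync window as described above: 1/16th of one bit time, in nanoseconds
// (integer division, so 1 Mbit/s gives 62).
static uint32_t
resync_window_ns(uint32_t bitrate)
{
    uint32_t bit_time_ns = 1000000000u / bitrate;
    return bit_time_ns / 16;
}

// True if a Tx-to-Rx delay is small enough for the described resync to occur.
static bool
within_resync_window(uint32_t delay_ns, uint32_t bitrate)
{
    return delay_ns <= resync_window_ns(bitrate);
}

// At 1 Mbit/s: within_resync_window(44, 1000000)  is true  (no isolation)
//              within_resync_window(140, 1000000) is false (with isolation)
```

At 1 Mbit/s the isolated setup falls outside the window, so by this description master does not stretch its bit time there, which would explain why the isolated board looked faster in that test.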

-Kevin

KevinOConnor commented 1 year ago

I have merged PR #32. This issue should now be resolved.

-Kevin