KevinOConnor / can2040

Software CAN bus implementation for rp2040 micro-controllers
GNU General Public License v3.0

TX Bus Errors ~every frame sending to RP2040 #26

Closed JamesWGT3 closed 1 year ago

JamesWGT3 commented 1 year ago

Thank you for this amazing library - it's a game changer for me with this processor.

I am using the arduino wrapper version of this library for clarity.

I am getting error frames after virtually every Tx from BusMaster to the device. The RP2040 receives the data fine, but the error flags are still raised, which will cause problems if the device is added to a larger bus. This doesn't happen with exactly the same setup when the RP2040 is swapped for a Teensy 4.1 using FlexCAN_T4.

The set up/config I am using is as follows;

Sender: Busmaster -> Peak CAN FD, 1MBaud -> 120Ohm terminator

Receiver: 120Ohm terminator -> SN65HVD Transceiver -> Pi Pico pins 18RX, 19TX

The errors appear after a single send as well as with cyclic transmissions, and seem to cause timing issues with the cyclic send.

Single Send; image

Cyclic 10ms Send; image

Sending from the device seems to be ok, with perfect timing on the cyclic send;

Single Send; image

Cyclic 10ms Send; image

Please let me know if there is any more information I can collect for this issue to resolve it - thank you again!

Edit: I updated the ACAN2040 library can2040.h and can2040.c files to your latest, using the following wrapper in ACAN2040.h to remove compilation errors for the IRQ handling, but still have the same issues when sending to the RP2040 from another device.

```cpp
extern "C" {
#include "can2040.h"
}
```

KevinOConnor commented 1 year ago

Unfortunately there isn't sufficient information to provide much assistance.

Error reports from other devices don't necessarily indicate a problem. It's common for CANbus chips to transition to an "error passive" state when the bus first comes up - the chip may need to observe a few dozen normally transmitted packets before it will stop reporting errors (that is, it transitions back to an "error active" state).
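For reference, the "error active" and "error passive" states mentioned here come from the fault confinement rules of the CAN specification: a node is error active while both its transmit and receive error counters are at or below 127, error passive once either counter exceeds 127, and bus off once the transmit counter exceeds 255. A minimal C sketch of that classification follows. The enum and function names are hypothetical, chosen for illustration only, and are not part of can2040 or of any adapter's API.

```c
#include <stdint.h>

/* Fault-confinement states defined by the CAN specification. */
enum can_error_state {
    CAN_ERROR_ACTIVE,
    CAN_ERROR_PASSIVE,
    CAN_BUS_OFF
};

/* Classify a node from its transmit (tec) and receive (rec) error counters.
 * Hypothetical helper for illustration only. */
static inline enum can_error_state
can_classify_error_state(uint16_t tec, uint16_t rec)
{
    if (tec > 255)
        return CAN_BUS_OFF;        /* transmitter has given up the bus */
    if (tec > 127 || rec > 127)
        return CAN_ERROR_PASSIVE;  /* node still participates, with limits */
    return CAN_ERROR_ACTIVE;       /* normal operation */
}
```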

If you're still seeing errors after a prolonged period, you should try with known working software. You can use the Klipper software in "USB to canbus" mode to test with ( https://github.com/KevinOConnor/can2040/blob/master/docs/Tools.md ).

If you continue to see errors while using the Klipper software then it will be necessary to take captures of the low-level signalling to find out the actual source of the problem. A logic analyzer can be used for this purpose ( https://github.com/KevinOConnor/can2040/blob/master/docs/Tools.md ).

-Kevin

JamesWGT3 commented 1 year ago

Thank you for the feedback.

I thought this may have been the case, but wondered if the ID of the Error message had some meaning for decoding the issue.

I have pulled out my PicoScope to see what is going on. I'm sending the same message from both sides (separately!);

RP2040 transmitting cyclically - no issue;

image

RP2040 receiving cyclically - issue with every frame;

image

It looks like it has something to do with the ACK portion of the message, causing the PEAK controller to resend the message 1-2 times before it's accepted in every case. When it does ACK, the ACK seems weak somehow?

KevinOConnor commented 1 year ago

Is that with Klipper running on the rp2040? If not, it would be a good idea to test with known working software (high software irq latency could cause the symptoms you are seeing). Also, you may want to get a capture of the CAN Rx and CAN Tx lines next to the rp2040 (instead of CANH and CANL) as that helps verify the transceiver is working correctly.

-Kevin

JamesWGT3 commented 1 year ago

Hi Kevin - I understand your requirement for Klipper on the RP2040, but having spent 8 hours attempting it so far - I am struggling. I'm a Windows user, and it's definitely not straightforward with the information provided. I apologise for the walkthrough below, but I needed to try and document the process to eventually make a how-to for others that may also struggle.

I installed klipper on an RPi4 with git clone https://github.com/Klipper3d/klipper

I then guessed at ./klipper/scripts/install-debian.sh, as I didn't want octo-pi installed.

I then did the following to start working through the microcontroller settings;

cd ~/klipper/
make menuconfig

and selected the below for my setup (I couldn't find any config files to use/follow so did a lot of trial and error);

image

I then used make to build the configuration.

following on from this I found the ID of the pico using ls /dev/serial/by-id/* which returned the following;

/dev/serial/by-id/usb-Raspberry_Pi_PicoArduino_E6605838836A2D33-if00

I guess this is because I was using the Arduino IDE to flash the code previously.

I copied the device ID and tried the following, which didn't work;

make flash FLASH_DEVICE=/usb-Raspberry_Pi_PicoArduino_E6605838836A2D33-if00

it gave a suggestion to put the device into boot mode, and use the below command, which then flashed and rebooted;

make flash FLASH_DEVICE=2e8a:0003

it reports the following - I don't know if the 'Loaded UF2 image with 0 pages' is suspicious

Flashing out/klipper.bin to 2e8a:0003
sudo lib/rp2040_flash/rp2040_flash out/klipper.bin
Loaded UF2 image with 0 pages
Found rp2040 device on USB bus 1 address 9
Flashing...
Resetting interface
Locking
Exiting XIP mode
Erasing
Flashing
Rebooting device

around this flash I used the sudo service klipper stop/start commands

I then followed the next step to find the device UUID which is where it seems to all fall over;

~/klippy-env/bin/python ~/klipper/scripts/canbus_query.py can0 returns a failure;

Traceback (most recent call last):
  File "/home/pi/klipper/scripts/canbus_query.py", line 64, in <module>
    main()
  File "/home/pi/klipper/scripts/canbus_query.py", line 61, in main
    query_unassigned(canbus_iface)
  File "/home/pi/klipper/scripts/canbus_query.py", line 25, in query_unassigned
    bus.send(msg)
  File "/home/pi/klippy-env/lib/python2.7/site-packages/can/interfaces/socketcan/socketcan.py", line 658, in send
    sent = self._send_once(data, msg.channel)
  File "/home/pi/klippy-env/lib/python2.7/site-packages/can/interfaces/socketcan/socketcan.py", line 681, in _send_once
    raise can.CanError("Failed to transmit: %s" % exc)
can.CanError: Failed to transmit: [Errno 100] Network is down

can-utils is installed and I have previously used it successfully with my PEAK adapter. I have the network up and a listener set up in Busmaster via my PEAK adapter which shows nothing.

after this, ls /dev/serial/by-id/* returns nothing - so I am assuming that the device isn't actually booting after the code is flashed?

not knowing what to do next, I started to follow the CAN-BUS installation notes from Klipper;

I created the Can0 file here /etc/network/interfaces.d/can0 with sudo nano with the following information (changing 500->1000);

allow-hotplug can0
iface can0 can static
    bitrate 1000000
    up ifconfig $IFACE txqueuelen 128

I then rebooted the pi with sudo reboot

This seemed to be the missing step and afterwards ~/klippy-env/bin/python ~/klipper/scripts/canbus_query.py can0 returned;

Found canbus_uuid=9e9acedba0f1, Application: Klipper
Total 1 uuids found

I thought this was going to be it working, but this seems to be where I have got stuck.

I finally found where to update the MCU config - I needed to create /home/pi/printer.cfg - there doesn't seem to be any suitable example.cfg to follow - perhaps adding one would really help others going forwards.

I read 'Config_Reference.md' from the Klipper/Docs folder and tried the following;

[mcu]
serial: /dev/serial/by-id/usb-Raspberry_Pi_PicoArduino_E6605838836A2D33-if00
canbus_uuid: 9e9acedba0f1
canbus_interface: can0
restart_method: rpi-usb

It doesn't seem to work - checking /tmp/klippy.log shows it's not happy that there aren't any printer configs, so I am working through this now.

Also, since flashing, ls /dev/serial/by-id/* returns only ls: cannot access '/dev/serial/by-id/*': No such file or directory, so I can't be sure the USB id is now correct.

lsusb shows the device is recognised by the RPi;

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 005: ID 1d50:606f OpenMoko, Inc. Geschwister Schneider CAN adapter
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

To attempt to solve the above ls /dev/serial/by-id/* issue, I flashed 'Blink' on to the pi from the pico-sdk using the blink.uf2 to see if it was anything to do with the Arduino IDE that was causing the issue. I now can't get anything returned even when the pico is in boot mode, so I've taken a big step backwards!

I'm sorry to turn this into a klipper help post, but to get the information required to understand the original issue, I need help to get klipper working - I really tried to do it without asking for help - I must be doing something simple wrong somewhere!

For what it's worth, the callback function I am using simply copies the received message into rx_msg and sets a flag so it can be processed in the loop;

```c
void my_cb(struct can2040 *cd, uint32_t notify, struct can2040_msg *msg)
{
    (void)cd;
    cb_called = true;
    cb_notify = notify;

    // Copy the received frame so the main loop can process it outside the IRQ.
    if (notify == CAN2040_NOTIFY_RX) {
        rx_msg = *msg;
    }
}
```

I also scoped the CAN_RX and CAN_TX lines to the transceiver and they show exactly the same as the CAN lines in the above post showing 2-3 attempts to get an ACK.

image

KevinOConnor commented 1 year ago

Ah, okay. It's not as complex as you've done - as you only need to build the Klipper micro-controller code - you don't need to install all of Klipper. Steps are roughly:

  1. Download the Klipper code (git clone https://github.com/Klipper3d/klipper).
  2. Install compiler tools (eg, ./klipper/scripts/install-debian.sh).
  3. Build the micro-controller software (make menuconfig and then make).
  4. Flash the software to the micro-controller (make flash FLASH_DEVICE=2e8a:0003).
  5. At this point you should be able to install the can-utils package, bring up the can0 interface, and run cansend and candump (as described at https://github.com/KevinOConnor/can2040/blob/master/docs/Tools.md#klipper ). Specifically: sudo apt-get install can-utils, then sudo ip link set can0 up type can bitrate 1000000, and then something like candump -t z -Ddex can0,#FFFFFFFF.

It sounds like you got most of the way through the above. But I can't tell if you were actually able to send/receive packets on the interface or not. Your "make menuconfig" settings look okay, but you would not typically use a bootloader offset (unless you wanted to install and use CanBoot for some reason). If you got all the way to being able to bring the can0 interface up then it sounds like you got it working.

> I also scoped the CAN_RX and CAN_TX lines to the transceiver and they show exactly the same as the CAN lines in the above post showing 2-3 attempts to get an ACK.

Was that with Klipper running on the rp2040 or with the arduino code?

-Kevin

KevinOConnor commented 1 year ago

FYI, I updated the instructions at https://github.com/KevinOConnor/can2040/blob/master/docs/Tools.md#klipper . Hopefully that makes it a little more clear on how one can test can2040 using the Klipper micro-controller code with the Linux can-utils package.

-Kevin

JamesWGT3 commented 1 year ago

phew - that was making me feel stupider by the minute - thank you for your help, and for updating the documentation.

I removed the 16KiB bootloader offset, reflashed, and it all started working! I had tried all those steps before and it wouldn't come up, which is why I thought I had to go through all the Klipper setup. It's likely I tried this before rebooting.

With that, I have to report that I don't see any bus errors when using the Klipper code to receive the sent messages;

image

And on Transmit using cangen;

image

This obviously narrows the issue to the Arduino wrapper for the library, or my code - however I don't think there is anything arduous in the IRQ handler to cause timing issues.

I flashed my code back onto it, but allowed it to transmit as well as receive, and it seems to perform better than when receiving on its own. It's still getting sporadic errors, perhaps one every 30 seconds, but before it was on almost every receive. I checked this by changing it back to only receiving, and the errors came back at multiple per second, with failed ACKs on the scope.

In one of my applications, I was hoping to just have the RP2040 listening and gatewaying the signals. A workaround could be to Tx a redundant frame, but it doesn't seem right. I'll gladly email you my code, if you would be willing to skim through it to see if something jumps out as the cause of this issue? Once working, I'm happy to release the code as an example, but I would need to sanitise the .dbc / variables etc.

Another anomaly I'm seeing is on resumption of sending if the device has been reflashed in between - I get some random message IDs injected onto the bus, with what looks like the data from the messages I was wanting to send. For instance, I wanted to single send 0xC0, 0x70, 0x80, 0x576 and 0x24C, but along with them 0x1 and 0x501 were sent. 0x1 has 0x80's data in it, and 0x501 has 0x576's!? Do you have any idea what could cause this? Is there a way to clear the Tx buffer on boot?

image

KevinOConnor commented 1 year ago

> With that, I have to report that I don't see any bus errors when using the Klipper code to receive the sent messages

I won't be able to review your code. However, it sounds like your application (or something else in the Arduino code) is causing high irq latency. https://github.com/KevinOConnor/can2040/blob/master/docs/Features.md#software-utilization

The likely culprit is code that is disabling irqs for an extended period, some other regularly occurring irq handler that has greater or equal priority to the can2040 irq handler, or possibly rp2040 flash cache misses. Basically, the can2040 irq handler needs to run within a few microseconds of a PIO hardware irq signal - any delays can result in the code not being able to successfully acknowledge a CAN bus packet.

> I get some random message IDs injected onto the bus

I'm not aware of any cases where can2040 would transmit a message not requested. It doesn't sound like a can2040 issue.

-Kevin

JamesWGT3 commented 1 year ago

Thank you, Kevin, for the pointers - I completely understand not wanting to review anyone's code.

I am currently running just can2040 on one core, and was intending for my application to run on the other core. I am not using any other interrupts that I know of.

I stopped using the Arduino ACAN2040 library, and managed to get the C library working in the Arduino IDE as it is.

The code seems to work better, with several times fewer bus errors when running at 1Mbit with 1x 100ms Tx and 2x 10ms Rx messages (~850msg/s & 10% bus load). It's worth mentioning the code performs much better at lower baud rates, but my application requires 1Mbit in this instance.

The errors still seem to be failed ACKs in all cases;

image

From your above comments on the IRQ latency, I put some timers around these and found them to be ~0-1us as all it is doing is copying the msg into rx_msg for processing out of the interrupt. At 1Mbit, the bit time is 1us so it should be within the timing tolerances.

If I put anything at all on the second core (even a simple timed blink, not using delay/sleep) the bus errors go up significantly (6-10 errors/s at 10% bus load).

I have attached the simple working code as a basic example for anyone wanting a walk through of can2040 in the ArduinoIDE, but I will now be trying to implement this with the pico sdk to evaluate the performance there in comparison as perhaps the Arduino environment is having an effect given the klipper performance.

As to the random messages injected onto the bus - they occur when flashing the board with the CAN bus still connected. I haven't seen any rogue messages from the code otherwise, so it's a niche issue, but I expect lots of these boards will be flashed whilst still connected to the bus. It also occurred with the first single send, rather than a cyclic message, so there were definitely rogue IDs in the tx queue using the existing data. Is there a way to clear the tx fifo before the initial send on start in case there is anything rogue in there after flashing? I saw something under pio_tx_reset, but was unable to use it as it is a private function.

can2040_ArduinoIDE.zip

KevinOConnor commented 1 year ago

> From your above comments on the IRQ latency, I put some timers around these and found them to be ~0-1us as all it is doing is copying the msg into rx_msg for processing out of the interrupt.

Just to be clear, as far as I understand it, the issue is not that can2040_pio_irq_handler() is taking too long, but that something is blocking the PIO irq from starting the execution of can2040_pio_irq_handler(). (That is, something disabling irq handling, some other irqs running at higher priority, or possibly rp2040 flash cache misses.)

I'm not sure of a good way to measure irq latency. One thing you could check is to read the rx fifo level `((pio0_hw->flevel & PIO_FLEVEL_RX1_BITS) >> PIO_FLEVEL_RX1_LSB)` prior to every call to can2040_pio_irq_handler(). It should always be 0 or 1 - if it's ever greater than 1 it indicates there was at least 10 "bit times" of latency (10us at 1mbit).
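The check described above could be sketched roughly as follows. This is only an illustration, not code from can2040: the helper names and the histogram are hypothetical, and the fallback values for `PIO_FLEVEL_RX1_LSB` and `PIO_FLEVEL_RX1_BITS` are an assumption based on the RP2040 FLEVEL register layout, included only so the snippet compiles without the Pico SDK headers (which define both macros themselves).

```c
#include <stdint.h>

/* Field position of the RX FIFO level for PIO state machine 1 in the
 * FLEVEL register. The Pico SDK headers already define these; the fallback
 * values are an assumption so the sketch also compiles off-target. */
#ifndef PIO_FLEVEL_RX1_LSB
#define PIO_FLEVEL_RX1_LSB  12u
#endif
#ifndef PIO_FLEVEL_RX1_BITS
#define PIO_FLEVEL_RX1_BITS 0x0000f000u
#endif

#define FIFO_LEVEL_MAX 8  /* hypothetical histogram size */

/* Count of how often each fifo level was observed. */
static uint32_t fifo_level_hist[FIFO_LEVEL_MAX + 1];

/* Extract the RX1 FIFO level from a raw FLEVEL register value. */
static inline uint32_t rx1_fifo_level(uint32_t flevel)
{
    return (flevel & PIO_FLEVEL_RX1_BITS) >> PIO_FLEVEL_RX1_LSB;
}

/* Record one sample; returns 1 if the level is above 1, which per the
 * comment above suggests at least ~10 bit times of irq latency. */
static inline int record_fifo_level(uint32_t flevel)
{
    uint32_t level = rx1_fifo_level(flevel);
    if (level > FIFO_LEVEL_MAX)
        level = FIFO_LEVEL_MAX;
    fifo_level_hist[level]++;
    return level > 1;
}
```

On the target, one would presumably call `record_fifo_level(pio0_hw->flevel)` at the top of the PIO irq wrapper, just before `can2040_pio_irq_handler()`, and print the histogram from the main loop rather than from the interrupt. This assumes can2040 was set up on pio0, as in the expression quoted above.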

> As to the random messages injected onto the bus - they occur when flashing the board with the CAN bus still connected.

I don't understand what you mean. How could can2040 be running while you are flashing the board?!

As for clearing the fifo - the can2040_setup() code does a memset of the entire struct can2040 including the queue. There is definitely nothing in the queue after the setup call.

I'm not sure what the issue is, but it does not sound like something in the can2040 code.

Cheers, -Kevin

KevinOConnor commented 1 year ago

For what it is worth, you may be able to ignore this issue. I don't know your application, but it may be tolerant of an occasional retransmit.

This issue should not scale adversely with many other nodes on the bus. The canbus spec requires that every node on the bus ack every packet, even if those nodes aren't interested in that packet (yes, the canbus spec is quirky). So, a retransmit should only occur if no nodes are able to ack the packet. The can2040 code will eventually process the packet successfully as long as any node acks it (assuming one doesn't go over the ~81 bit times of irq latency that would result in a fifo overflow).
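To put the latency figures from this thread in absolute terms (roughly 10 bit times once the rx fifo level exceeds 1, and about 81 bit times before a fifo overflow), a small conversion helper may be useful. This is a sketch with hypothetical function names. It only does the arithmetic and does not touch any hardware.

```c
#include <stdint.h>

/* Duration of one CAN bit in nanoseconds for a given bitrate (bits/s). */
static inline uint32_t can_bit_time_ns(uint32_t bitrate)
{
    return 1000000000u / bitrate;
}

/* Convert a latency expressed in bit times to whole microseconds. */
static inline uint32_t bit_times_to_us(uint32_t bit_times, uint32_t bitrate)
{
    return (uint32_t)(((uint64_t)bit_times * 1000000u) / bitrate);
}
```

At 1Mbit, `bit_times_to_us(81, 1000000)` gives 81us, and at 500kbit the same 81 bit times correspond to 162us, which fits the earlier observation that the code behaves better at lower bitrates.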

Cheers, -Kevin

JamesWGT3 commented 1 year ago

Thanks again, Kevin!

I implemented the rx fifo level check, with a flag set in the IRQ handler to print the results in the loop every 16 IRQs. The results show that it is mostly 0's and 1's, but there was one 2 in the below output (I had to search a while to find an occurrence, as it was definitely an outlier). This was tested with 1 x 100ms Tx message from the RP2040 and 1 x 100ms Rx message transmitted from BusMaster at 1Mbit - so extremely low bus loading (0.2%). There was still an Rx error with no ACK on almost every send. I also tried PCAN-View just to see if this was a BusMaster quirk, and the same flags are reported. I have an ETAS581 and a Vector 1630, so I can try different interface hardware - but this all works properly on a Teensy 4.1 with the FlexCAN_T4 library, so I don't believe that to be the issue.

13:11:01.366 -> 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0 13:11:01.366 -> 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:01.366 -> 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 13:11:01.496 -> 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0 13:11:01.528 -> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0 13:11:01.593 -> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0 13:11:01.689 -> 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0 13:11:01.689 -> 2, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:01.689 -> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0 13:11:01.786 -> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0 13:11:01.851 -> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0 13:11:01.851 -> 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0 13:11:01.884 -> 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0 13:11:01.982 -> 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:02.015 -> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0 13:11:02.015 -> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0 13:11:02.083 -> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0 13:11:02.145 -> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0 13:11:02.145 -> 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:02.178 -> 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:02.275 -> 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 13:11:02.307 -> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0 13:11:02.307 -> 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0 13:11:02.372 -> 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0 13:11:02.468 -> 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:02.468 -> 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 13:11:02.468 -> 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0 13:11:02.598 -> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0 13:11:02.598 -> 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0 13:11:02.695 -> 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0 13:11:02.761 -> 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:02.761 -> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 0, 1, 0, 0 13:11:02.890 -> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0 13:11:02.923 -> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0 13:11:02.923 -> 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0 13:11:02.987 -> 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0 13:11:03.084 -> 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 13:11:03.084 -> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0 13:11:03.180 -> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0 13:11:03.244 -> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0

When only transmitting from the RP2040, the output looks much the same, but with no bus errors generated. As soon as I turn on the 100ms Rx, the errors appear instantly (without any sudden change or numbers higher than 1 appearing). This is ~15 msgs/s due to the retransmits, with 5-20 errors/s depending on how many retransmits are needed, at 0.2% bus load.

I'm not the strongest coder, so the whole CMake process is new to me, but I am trying to get this to work/build in VSCode rather than the ArduinoIDE, as I appreciate it must be working correctly elsewhere!

The initial application I have is just 2 nodes, with the intention of the RP2040 being a receiver and passing the data to another controller via UART (the other node is changeable, with different addresses possible, so I wanted to keep it away from the main bus), so I do need to get it working correctly. The main issue I have is that putting anything at all on the other core seems to make this completely fall over. I have had successful multi-core applications running in the ArduinoIDE on this board, so something odd is occurring. I am not using anything else with interrupts - I even changed the timing structure to a simple in-code counter in case it was using interrupts, and the issue was still the same.

JamesWGT3 commented 1 year ago

Hi Kevin, I finally worked through getting everything set up in a Linux environment and got a very basic version of can2040 to compile correctly in the Pico SDK. Frustratingly, whilst it receives - it is performing the same as the ArduinoIDE compilation and is missing ACK's with minimal busloading (<1%).

The code is as simple as I believe it can be - it flags when the IRQ has received a message, and then it toggles an LED in main in recognition of the flag as I didn't want to have the overhead of printing anything - I am not doing any post processing or doing anything else at all. I didn't even want to Tx a message in case any timing methods turned out to be using interrupts causing the issues seen prior.

I must be doing something wrong, but as far as I can see - I am only using your example code snippets. cansend/cangen don't show any faults so is it possible that I am capturing this issue as I am using different software/hardware which makes these issues very visible?

At this point I really don't know what to do. I've already commissioned $800 of PCBs to test this with, and spent a week debugging, so I need to keep struggling - I've tried using the bare library in both ArduinoIDE and now VSCode/command line and I seem to be having the same issue.

The Pico SDK code is attached - there must be something wrong somewhere?

can2040JW.md

KevinOConnor commented 1 year ago

That's really strange. I don't know what would cause that. I don't see issues like that here locally (but I test with the Klipper code).

If I understand your report correctly, you can't reliably ack messages at 1mbit with Arduino nor Pico SDK, but the same hardware has no issues acking when running Klipper?

Is there some way you could zip up the entire build of your picosdk code and attach it here? I'm looking for all your project rules (including cmake and other build files) along with the resulting binary files (both intermediate .o files and final .elf files).

Do you know what gcc optimization flags were used during the build?

-Kevin

JamesWGT3 commented 1 year ago

Hi Kevin - it was the optimisation flags! I feel very stupid - but in CMake it was set to the default of 'Debug', as at the time it made sense as I was trying to work out what was wrong - but this disabled optimisations! With this changed to 'Release', the code is working much better with an error rate of around 0.02% for a ~4% bus loading, measured over 10 minutes (1 x 100ms TX, 3 x 10ms Rx), which will work absolutely fine for my applications!

Bumping it up to around 20% bus loading shows an error rate of around 0.025%, so it doesn't scale adversely.
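As a rough cross-check of the bus load figures, the load can be estimated from the frame rate. The sketch below assumes classic CAN frames with 11-bit IDs and ignores stuff bits, so it gives a lower bound. The 8-byte payload used in the example is an assumption, since the thread does not state the payload size, and the function names are hypothetical.

```c
#include <stdint.h>

/* Nominal length in bits of a classic CAN data frame with an 11-bit ID:
 * 44 bits of framing + 3 bits of intermission + 8 bits per data byte.
 * Stuff bits are ignored, so this is a lower bound. */
static inline uint32_t can_frame_bits(uint32_t data_bytes)
{
    return 47u + 8u * data_bytes;
}

/* Estimated bus load in parts per million for a given frame rate. */
static inline uint32_t can_bus_load_ppm(uint32_t frames_per_sec,
                                        uint32_t data_bytes,
                                        uint32_t bitrate)
{
    uint64_t bits_per_sec =
        (uint64_t)frames_per_sec * can_frame_bits(data_bytes);
    return (uint32_t)(bits_per_sec * 1000000u / bitrate);
}
```

One 100ms message plus three 10ms messages is 10 + 300 = 310 frames/s. With 8-byte payloads at 1Mbit, `can_bus_load_ppm(310, 8, 1000000)` returns 34410, i.e. about 3.4%, which is in line with the ~4% reported once stuff bits are taken into account.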

Using the compile optimisations within the ArduinoIDE also produces better results. The Arduino default is -Os, for reference.

I can also confirm that the multicore functions now work correctly and I am able to run things on both cores concurrently - so this was my problem all along.

I've donated to your Ko-Fi, but once tested I'd like to send you one of my boards to show my appreciation for your continued help (luckily my hardware design is a lot better than my coding!!). It's just a simple two-CAN-interface board with switchable terminations and an RP2040, but I added some analogue inputs with test pots etc. to make it a useful development board for bench top testing if you ever have the need. I should have them next week sometime and will get them tested. If you are interested, please let me know the best way to get in contact to arrange this.

Thank you again!

image