candle-usb / candleLight_fw

gs_usb compatible firmware for candleLight, cantact and canable
Other
642 stars 283 forks source link

cansequence tool shows wrong telegram order #166

Open mawekman opened 1 year ago

mawekman commented 1 year ago

Hi all, First of all, this is a great project. We bought two candlelight usb-can sticks from linux automation as we might plan to use this interface on our products. In our tests we now have a phenomenon that we cannot explain. We use the cansequence tool to create load on the canbus. With a baudrate of 1MBit and a load between 60% and 95%, we notice again and again that telegrams are received in the wrong order. Another non-candlelight-based cancontroller receives the telegrams in the correct order at the same moment, so I assume that the CAN telegrams are in the correct order on the bus. It looks like this:

grafik Telegrams are not lost but in the wrong order.

We are using a raspberry cm4 based system (cm4 io board). grafik

Has anyone ever seen something similar or have an idea what's going wrong ?

Thanks in advance !

marckleinebudde commented 1 year ago

Hey @mawekman,

can you try my latest firmware: https://github.com/marckleinebudde/candleLight_fw/tree/multichannel

marckleinebudde commented 1 year ago

Can you connect the candlelight without an USB-hub?

mawekman commented 1 year ago

Hey @marckleinebudde,

if have compiled and flashed your firmware but the behaviour ist the same. grafik

Connecting the candlelight to an port not routed over an USB-hub is not possible on pi4. The pi4 boards use an pcie USB controller to provide USB Interfaces. We have tested a VIA VL805 Chip pcie card (this is the one on the standard pi4 board) on the cm4-io-board and a pcie card with a Renesas upd720201 chip. The behaviour of both cards is the same. The internal dwc2 USB Controller on pi4 also depends on a usb-hub and cannot be used in our design.

marckleinebudde commented 1 year ago

I think the RX side on the µC hardware is configured for FIFO, so there shouldn't be any re-orderings on the hardware side.

Is your kernel new enough to support HW timestamps? What does candump any,0:0,#FFFFFFFF -cexdtA -H output? If HW timestamps are supported please record them via candump any,0:0,#FFFFFFFF -cexdtA -H | tee /tmp/log. Let's see if the kernel stack re-orders them.

mawekman commented 1 year ago

Hi @marckleinebudde ,

I'm working on compiling a new kernel with timestamping support in the gs_usb driver.

I just noticed that I translated the wrong candlelight firmware branch. So I will repeat uploading a new firmware. But when I compile the correct multichannel branch I get following error:

grafik

marckleinebudde commented 1 year ago

It should compile now - I've force pushed the multichannel branch.

mawekman commented 1 year ago

I'm sorry, I've updated the branch but it's still the same issue.

marckleinebudde commented 1 year ago

Please show me the output of:

git status
git rev-parse @
rm -rf build
mkdir build
cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/gcc-arm-none-eabi-8-2019-q3-update.cmake
make
mawekman commented 1 year ago

`root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# git status On branch multichannel Your branch is up to date with 'origin/multichannel'.

nothing to commit, working tree clean root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# git rev-parse @ 78def661a756b22dc83ef4c051f2c406d4094fb8 root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# rm -rf build root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# mkdir build root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw# cd build/ root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/gcc-arm-none-eabi-8-2019-q3-update.cmake -- The C compiler identification is GNU 8.3.1 -- The ASM compiler identification is GNU -- Found assembler: /usr/bin/arm-none-eabi-gcc -- Detecting C compiler ABI info -- Detecting C compiler ABI info - failed -- Detecting C compile features -- Detecting C compile features - done -- The CXX compiler identification is GNU 8.3.1 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - failed -- Detecting CXX compile features -- Detecting CXX compile features - done


You may now: -compile all targets ('make') -compile a single target (e.g. 'make cantact_fw' -flash a device (e.g. 'make flash-cantact_fw' -- Configuring done -- Generating done -- Build files have been written to: /mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# make Scanning dependencies of target STM32_HAL_STM32F042x6 [ 0%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal.c.obj [ 1%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_can.c.obj [ 1%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_cortex.c.obj [ 1%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_flash_ex.c.obj [ 2%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_flash.c.obj [ 2%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_gpio.c.obj [ 2%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_pcd_ex.c.obj [ 3%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_pcd.c.obj [ 3%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_rcc.c.obj [ 3%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_rcc_ex.c.obj [ 4%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_tim_ex.c.obj [ 4%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_hal_tim.c.obj [ 4%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/stm32f0xx/stm32f0xx_ll_usb.c.obj [ 5%] Building C object libs/STM32_HAL/CMakeFiles/STM32_HAL_STM32F042x6.dir/src/cmsis/system_stm32f0xx.c.obj [ 5%] Linking C static library libSTM32_HAL_STM32F042x6.a [ 5%] Built target STM32_HAL_STM32F042x6 Scanning dependencies of target STM32_USB_Device_Library_STM32F042x6 [ 5%] Building C object libs/STM32_USB_Device_Library/CMakeFiles/STM32_USB_Device_Library_STM32F042x6.dir/Core/Src/usbd_ctlreq.c.obj [ 6%] Building C object libs/STM32_USB_Device_Library/CMakeFiles/STM32_USB_Device_Library_STM32F042x6.dir/Core/Src/usbd_ioreq.c.obj [ 6%] Building C object libs/STM32_USB_Device_Library/CMakeFiles/STM32_USB_Device_Library_STM32F042x6.dir/Core/Src/usbd_core.c.obj [ 6%] Linking C static library libSTM32_USB_Device_Library_STM32F042x6.a [ 6%] Built target STM32_USB_Device_Library_STM32F042x6 Scanning dependencies of target version_h -- Found Git: /usr/bin/git (found version "2.30.2") [ 6%] Built target version_h Scanning dependencies of target usb2can_fw [ 7%] Building C object CMakeFiles/usb2can_fw.dir/src/usbd_desc.c.obj [ 7%] Building C object CMakeFiles/usb2can_fw.dir/src/usbd_gs_can.c.obj [ 7%] Building C object CMakeFiles/usb2can_fw.dir/src/usbd_conf.c.obj [ 8%] Building C object CMakeFiles/usb2can_fw.dir/src/can_common.c.obj [ 8%] Building C object CMakeFiles/usb2can_fw.dir/src/dfu.c.obj [ 8%] Building C object CMakeFiles/usb2can_fw.dir/src/gpio.c.obj [ 9%] Building C object CMakeFiles/usb2can_fw.dir/src/led.c.obj [ 9%] Building C object CMakeFiles/usb2can_fw.dir/src/timer.c.obj [ 9%] Building C object CMakeFiles/usb2can_fw.dir/src/util.c.obj [ 10%] Building C object CMakeFiles/usb2can_fw.dir/src/startup.c.obj [ 10%] Building C object CMakeFiles/usb2can_fw.dir/src/main.c.obj [ 10%] Building C object CMakeFiles/usb2can_fw.dir/src/interrupts.c.obj [ 11%] Building C object CMakeFiles/usb2can_fw.dir/src/boards/legacy.c.obj /mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/src/boards/legacy.c:188:2: error: missing initializer for field 'leds' of 'struct BoardChannelConfig' [-Werror=missing-field-initializers] .channels[0].leds = { ^ In file included from /mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/src/boards/legacy.c:29: /mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/include/board.h:44:19: note: 'leds' declared here struct LEDConfig leds[LED_MAX]; ^~~~ cc1: all warnings being treated as errors make[2]: [CMakeFiles/usb2can_fw.dir/build.make:238: CMakeFiles/usb2can_fw.dir/src/boards/legacy.c.obj] Error 1 make[1]: [CMakeFiles/Makefile2:191: CMakeFiles/usb2can_fw.dir/all] Error 2 make: *** [Makefile:103: all] Error 2 root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# `

marckleinebudde commented 1 year ago

Ok, you're indeed building the correct branch. Maybe a different compiler version. Let's try:

make VERBOSE=1 usb2can_fw

Then we see the used compiler, here it's /usr/lib/ccache/arm-none-eabi-gcc. Let's see it's version:

/usr/lib/ccache/arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:12.2.rel1-1) 12.2.1 20221205
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It's from current Debian testing.

mawekman commented 1 year ago

It seems my compiler ist quit old.

root@debian11:/mnt/hgfs/share/marckleinebudde/repos/candleLight_fw/build# /usr/bin/arm-none-eabi-gcc --version arm-none-eabi-gcc (15:8-2019-q3-1+b1) 8.3.1 20190703 (release) [gcc-8-branch revision 273027] Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So I will first udate the gcc and try it again.

marckleinebudde commented 1 year ago

Is that current Debian stable aka Bullseye aka Debian 11?

I can reproduce the issue with the old compiler.

Update: I have a fix that works with the old compiler.

marckleinebudde commented 1 year ago

I've force pushed my branch. Now it works even with the older compiler.

mawekman commented 1 year ago

Hi @marckleinebudde,

thanks a lot for your help. I've made some additional tests and it seems to be a problem depending of the RT-kernel. Heavy load on the cpu's shows the problem more often and faster. Same tests with a non RT kernel do not show the problems at the moment. Do you know if anyone has ever done tests with an RT kernel ?

marckleinebudde commented 1 year ago

I don't think anyone has tested USB-CAN devices under RT configurations. As I speculated earlier https://github.com/candle-usb/candleLight_fw/issues/166#issuecomment-1542071952, I think the networking stack might reorder the CAN frames. I think this problem is best discussed on the linux-can@vger.kernel.org mailing list.

marckleinebudde commented 1 year ago

For the reference: A patch series has been posted on the linux-can@vger.kernel.org mailing list: https://lore.kernel.org/all/20230705-gs_usb-rx-offload-v1-0-8e7e46e40137@pengutronix.de

fenugrec commented 1 year ago

For the reference: A patch series has been posted on the linux-can@vger.kernel.org

oh wow good work tracking that down. How are other (non-CAN) usb network adapters handling this ?

marckleinebudde commented 1 year ago

I should add the solution to my mail, here's the problem description:

Traditionally USB drivers used to pass the received CAN frames/skbs to the network stack with netif_rx(). In netif_rx() the skbs are queued to the local CPU. If IRQs are handled in round robin, OoO packets may occur.

The solution to the problem is to use NAPI and then netif_receive_skb() to push the skb into the networking stack. netif_receive_skb() properly serializes the incoming skbs.

All networking drivers not using NAPI have this problem. Someone tested a non-NAPI USB Ethernet adapter and saw OoO TCP packages in wireshark. TCP tries to fix this, but IIRC this resulted in retransmissions and then duplicate TCP packets. At least this was not beneficial for TCP performance. Don't know what this does to a realtime UDP audio stream 😢.

Some newer USB-Ethernet drivers like the lan78xx use NAPI.

marckleinebudde commented 1 year ago

FYI: v2: https://lore.kernel.org/all/20230718-gs_usb-rx-offload-v2-0-716e542d14d5@pengutronix.de