PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License

Audit CAN interface on FMUv5 #9534

Closed LorenzMeier closed 6 years ago

LorenzMeier commented 6 years ago

We need to validate the CAN interface on FMUv5 before the manufacturers enter mass production.

LorenzMeier commented 6 years ago

@bkueng Could you please validate the interface early next week?

dagar commented 6 years ago

If at all relevant please consider syncing up with UAVCAN. https://github.com/PX4/Firmware/pull/9219

dagar commented 6 years ago

I don't know the state of this, but CAN init for FMUv5 is marked WIP and not included in the PX4 side board support - https://github.com/PX4/Firmware/blob/master/src/drivers/boards/px4fmu-v5/CMakeLists.txt#L35

@davids5 do you know of any issues with CAN support for FMUv5 or stm32f7 in general?

davids5 commented 6 years ago

@dagar IIRC the V5 board only uses CAN 1 & 2 (not CAN 3). The stm32_can file for NuttX in the board configs is a no-op on most of the platforms; UAVCAN uses the HW directly. This should be tested on both buses.

DanielePettenuzzo commented 6 years ago

@LorenzMeier @bkueng I just performed a simple test, connecting the CAN interfaces on the Pixhawk to my laptop using the Zubax CAN-USB adapter. I started uavcan on the Pixhawk and ran the uavcan_gui_tool on my laptop. On my laptop I set node-id=50; on the Pixhawk the default ID is 1. The GUI detects the Pixhawk node, and PX4 detects my laptop node. In the GUI I receive NodeStatus and GlobalTimeSync messages from PX4. I get the same behaviour with both CAN1 and CAN2. The following are screenshots of the GUI on Ubuntu and the uavcan status output on NuttX for CAN1. Is there a more specific test I can perform to validate the interface?

[Screenshot: uavcan_gui_tool_fmuv5-can1]

nsh> uavcan start
INFO  [uavcan] Node ID 1, bitrate 1000000
INFO  [uavcan] sensor bridge 'gnss' init ok
INFO  [uavcan] sensor bridge 'mag' init ok
INFO  [uavcan] sensor bridge 'baro' init ok
nsh> WARN  [uavcan] GNSS ORB fd 8

nsh> uavcan status
Pool allocator status:
    Capacity hard/soft: 500/250 blocks
    Reserved:  19 blocks
    Allocated: 3 blocks
UAVCAN node status:
    Internal failures: 0
    Transfer errors:   0
    RX transfers:      15
    TX transfers:      17
CAN1 status:
    HW errors: 36684
    IO errors: 36684
    RX frames: 15
    TX frames: 20
CAN2 status:
    HW errors: 169950
    IO errors: 169958
    RX frames: 0
    TX frames: 12
ESC actuators control groups: sub: 0 / req: 0 / fds: 2
ESC mixer: NONE
Sensor 'gnss':
RX errors: 0, using old Fix: 1, receiver node id: N/A

Sensor 'mag':
devname: /dev/mag
channel 0: empty
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Sensor 'baro':
devname: /dev/baro
channel 0: empty
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Online nodes (Node ID, Health, Mode):
     50 OK         OPERAT

mhkabir commented 6 years ago

> CAN1 status: HW errors: 36684 IO errors: 36684 RX frames: 15 TX frames: 20

These hardware errors are something that needs auditing. We have recently been seeing this on all Pixhawks, and it persists even after a complete hardware swap-out. This was not the case on older releases. These hardware errors on UAVCAN seem to be specific to Pixhawk. With the exact same setup of CAN peripherals (Zubax GNSS v2), if I use the Babel (Zubax USB-CAN) instead of the Pixhawk, I don't get any hardware errors on its CAN interface.

@pavel-kirienko @BryanMonti

bryanmonti commented 6 years ago

[Screenshot: tx_error_babel_gnss]

> These hardware errors on UAVCAN issue seems to be specific to Pixhawk.

I am seeing the same results here with a GNSSv2.2 on the latest fw (4.1) and the 0.9.0 UAVCAN GUI Tool release.

davids5 commented 6 years ago

Is there termination on the peripherals?

mhkabir commented 6 years ago

Yes, all my tests were with a terminated bus. I swapped out peripherals, wires and terminators to make sure it wasn't a hardware problem.

davids5 commented 6 years ago

Is this a start up issue (counts rise only at start up) or does it persist?

dagar commented 6 years ago

@mhkabir last known good release?

bryanmonti commented 6 years ago

@davids5 It persists

mhkabir commented 6 years ago

It persists. Hardware error count keeps going up, although things seem to be otherwise "normal".
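
The startup-versus-persistent question above can be checked mechanically by taking two `uavcan status` snapshots some time apart and comparing the counters. The following is a minimal sketch, not part of PX4: the function names are made up for illustration, the parser only assumes the text layout shown in the logs in this thread, and the "later" snapshot uses invented values.

```python
import re

HEADER = re.compile(r"(CAN\d+) status:")
FIELD = re.compile(r"(HW errors|IO errors|RX frames|TX frames):\s*(\d+)")


def parse_can_counters(status_text):
    """Extract per-interface counters from text printed by `uavcan status`."""
    counters, bus = {}, None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = HEADER.fullmatch(line)
        if header:
            bus = header.group(1)
            counters[bus] = {}
            continue
        field = FIELD.fullmatch(line)
        if bus and field:
            counters[bus][field.group(1)] = int(field.group(2))
        else:
            bus = None  # any other line ends the current CANx block
    return counters


def hw_error_growth(before, after):
    """HW error increase per bus between two snapshots (after minus before)."""
    return {bus: after[bus]["HW errors"] - before[bus]["HW errors"]
            for bus in before if bus in after}


# Counters copied from the first `uavcan status` output in this thread.
EARLY = """
CAN1 status:
    HW errors: 36684
    IO errors: 36684
    RX frames: 15
    TX frames: 20
CAN2 status:
    HW errors: 169950
    IO errors: 169958
    RX frames: 0
    TX frames: 12
"""

# Hypothetical later snapshot: these values are invented for illustration only.
LATER = """
CAN1 status:
    HW errors: 40000
    IO errors: 40000
    RX frames: 30
    TX frames: 40
CAN2 status:
    HW errors: 180000
    IO errors: 180008
    RX frames: 0
    TX frames: 24
"""

growth = hw_error_growth(parse_can_counters(EARLY), parse_can_counters(LATER))
for bus, delta in sorted(growth.items()):
    verdict = "still growing" if delta > 0 else "stable since first snapshot"
    print(f"{bus}: HW errors changed by {delta} ({verdict})")
```

A growth of zero between snapshots taken well after boot would point to a startup-only burst; a positive growth matches the persistent behaviour described here.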

mhkabir commented 6 years ago

@dagar Not sure. It doesn't work on 1.7, I confirmed. @bryanmonti Can you try one of the older releases and see when it worked?

dagar commented 6 years ago

That likely corresponds with the big NuttX update.

thomasgubler commented 6 years ago

It seems that the issue we see is probably not related to FMUv5.

@DanielePettenuzzo is going to verify the state on pixracer to check if we see the same as @mhkabir

LorenzMeier commented 6 years ago

@dpettenuzzo Please update issues with intermediate findings even if there is no conclusion yet. Thanks!

davids5 commented 6 years ago

@LorenzMeier I have done some limited testing (I need to build up and program some more UAVCAN hardware), but on my setup, whichever port I connect my ESC to (CAN1 or CAN2), the IO/HW errors stop counting on that port and the RX/TX frame counts increase.

nsh> uavcan status
Pool allocator status:
        Capacity hard/soft: 500/250 blocks
        Reserved:  77 blocks
        Allocated: 19 blocks
UAVCAN node status:
        Internal failures: 0
        Transfer errors:   0
        RX transfers:      57
        TX transfers:      552
CAN1 status:
        HW errors: 0
        IO errors: 0
        RX frames: 88
        TX frames: 3587
CAN2 status:
        HW errors: 21084
        IO errors: 24443
        RX frames: 0
        TX frames: 211
ESC actuators control groups: sub: 0 / req: 0 / fds: 2
ESC mixer: NONE
Sensor 'gnss':
RX errors: 0, using old Fix: 1, receiver node id: N/A

Sensor 'mag':
devname: /dev/mag
channel 0: empty
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Sensor 'baro':
devname: /dev/baro
channel 0: empty
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Online nodes (Node ID, Health, Mode):
        125 OK         INIT

DanielePettenuzzo commented 6 years ago

Same for me: I have 0 HW/IO errors on the port where I connect the ESCs.

nsh> uavcan status
Pool allocator status:
        Capacity hard/soft: 500/250 blocks
        Reserved:  90 blocks
        Allocated: 43 blocks
UAVCAN node status:
        Internal failures: 0
        Transfer errors:   0
        RX transfers:      485
        TX transfers:      15303
CAN1 status:
        HW errors: 94109
        IO errors: 95601
        RX frames: 0
        TX frames: 13653
CAN2 status:
        HW errors: 0
        IO errors: 0
        RX frames: 969
        TX frames: 15182
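
A rough way to triage the counters posted so far is to separate buses that log errors while receiving nothing from buses that log errors while traffic is flowing. In general, a CAN transmitter that is alone on a bus gets no acknowledgement for its frames; whether that is what the error counters here reflect is an assumption, not something confirmed in this thread. The sketch below is only a heuristic: the function name and the category labels are made up, and the input values are copied from the three `uavcan status` outputs above.

```python
def triage_bus(rx_frames, hw_errors):
    """Heuristic label for one CAN interface; the categories are assumptions."""
    if hw_errors == 0:
        return "clean"
    if rx_frames == 0:
        return "errors with nothing received: possibly no other node on this bus"
    return "errors while receiving traffic: worth a closer look"


# (bus, RX frames, HW errors) copied from the logs in this thread, in order.
REPORTS = {
    "first log": [("CAN1", 15, 36684), ("CAN2", 0, 169950)],
    "second log": [("CAN1", 88, 0), ("CAN2", 0, 21084)],
    "third log": [("CAN1", 0, 94109), ("CAN2", 969, 0)],
}

for name, buses in REPORTS.items():
    for bus, rx_frames, hw_errors in buses:
        print(f"{name} / {bus}: {triage_bus(rx_frames, hw_errors)}")
```

Only CAN1 in the first log falls into the last category, with errors accumulating even though frames are being received.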

My CAN2 works fine, but I'm having some problems on CAN1. If I connect to QGC after boot-up, I get a hardfault. If I don't open QGC, CAN1 works fine. This doesn't happen on CAN2.

DanielePettenuzzo commented 6 years ago

@davids5 do you also have this behaviour?

davids5 commented 6 years ago

@DanielePettenuzzo I get the hardfault when QGC connects with the UAVCAN firmware server running (uavcan start fw; I suspect that is what you get with the automatic setting).

It is coming from here: [image]

The cause is that _uavcan_open_request_list is null.

But up to connecting QGC I have no HW or IO errors! I have devices on both buses.

nsh> uavcan status
Pool allocator status:
        Capacity hard/soft: 500/250 blocks
        Reserved:  24 blocks
        Allocated: 6 blocks
UAVCAN node status:
        Internal failures: 0
        Transfer errors:   0
        RX transfers:      979
        TX transfers:      348
CAN1 status:
        HW errors: 0
        IO errors: 0
        RX frames: 5359
        TX frames: 2267
CAN2 status:
        HW errors: 0
        IO errors: 0
        RX frames: 0
        TX frames: 2267
ESC actuators control groups: sub: 0 / req: 0 / fds: 2
ESC mixer: NONE
Sensor 'gnss':
RX errors: 0, using old Fix: 1, receiver node id: N/A

Sensor 'mag':
devname: /dev/mag
channel 0: empty
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Sensor 'baro':
devname: /dev/baro
channel 0: empty
channel 1: empty
channel 2: empty
channel 3: empty
channel 4: empty

Online nodes (Node ID, Health, Mode):
        124 CRIT       OPERAT
        125 OK         INIT

Perhaps the other posters have bad cabling or have looped CAN1 to CAN2.

I will need to know the exact equipment and how it is wired in order to duplicate the setup and help debug it.

Also, in case it is not mentioned anywhere, be advised that FW upgrade and dynamic node ID assignment will ONLY occur on the CAN1 bus. This is by design.

LorenzMeier commented 6 years ago

@DanielePettenuzzo Please continue Monday - the hardfault David identified is what you saw.

thomasgubler commented 6 years ago

See https://github.com/PX4/Firmware/issues/9648 and https://github.com/PX4/Firmware/pull/9652