PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License

I2C bus workqueue or CLK Frequency issue on RDDRONE-FMUK66 #13300

Closed: igalloway closed this issue 2 years ago

igalloway commented 4 years ago

Describe the bug: The following was reported by a HoverGames participant. Q: I recall a sensor bus choking issue. Is that present in 1.9.2, and was it corrected in later versions?

[10/24 4:06 AM] Matthias Viertler: Drone Orientation Problems (Katrin Moritz, Jari Van Ewijk, Leo Mustafa, Iain Galloway, Audrey Eng)

We successfully integrated the Melexis MLX90614 IR sensor using your source code, and it worked very well "on the ground", i.e. when not flying. However, whenever we added the IR sensor on I2C bus 1 and started flying in autonomous mode, the drone went completely haywire and lost its orientation. In the attached screenshots you can see that QGC has a certain path it intends to follow and also correctly records the (obviously wrong) flight path, but apparently has no way to correct it. Whenever we detach the I2C sensor, or even when I switch to fake sensor data so that nothing is sent over I2C bus 1 anymore, everything goes back to normal.

That got me thinking. The first thing I noticed after analyzing the boot log is that the GPS magnetometer, or "external compass", IST8310 (which I assume is crucial for the drone's orientation) is also on I2C bus 1 (address 0x0E). Could it be that we are seeing bus collisions caused by the additional requests from the IR sensor, which then prevent critical magnetometer data from reaching the EKF2 position estimator module?

Furthermore, I2C bus 1 dropped to 100 kHz because the Melexis MLX90614's maximum speed is 100 kHz, whereas all the other components are capable of 400 kHz.
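The rule at work here, that the slowest device on a bus caps the clock for every device on it, can be sketched in a few lines of C++. This is an illustrative model, not PX4 code: the struct and function names are hypothetical, and the per-device limits are simply the ones claimed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of an I2C peripheral and the fastest SCL clock it
// tolerates. Not a PX4 type; for illustration only.
struct I2cDevice {
    std::string name;
    unsigned max_clock_hz;
};

// The bus can only run as fast as its slowest member: start from the
// controller's own limit and take the minimum over all attached devices.
unsigned effective_bus_clock(const std::vector<I2cDevice>& devices,
                             unsigned controller_max_hz) {
    assert(controller_max_hz > 0);
    unsigned clock = controller_max_hz;
    for (const auto& dev : devices) {
        clock = std::min(clock, dev.max_clock_hz);
    }
    return clock;
}
```

With only the IST8310 and AMG88 registered at 400 kHz the result is 400 kHz; adding the MLX90614 at 100 kHz drops the whole bus to 100 kHz.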

So my plan was to wait for the Panasonic AMG88 IR sensor, which supports fast-mode I2C up to 400 kHz, and see whether a higher bus speed could prevent the bus collisions even with another IR sensor connected via I2C. It arrived this week and integration was straightforward, immediately delivering IR sensor data for both pixel and ambient temperature. However, it still didn't make I2C bus 1 switch to 400 kHz.

It turns out that the rgbled1 component, which I'm sure I've seen connected on I2C bus 2 in the past (maybe with a different PX4 firmware version?), now also limits the bus to 100 kHz.

I've tried a lot: turning off the rgbled module in the ROMFS init.d startup script "rcS"; switching to bus number 2 (maybe there's another one on the board, since I'm convinced I previously saw the LEDs being started on bus 2 and the IST8310 shown with a bus speed of 400 kHz); and additionally switching off rgbled_ncp5623c, rgbled_pwm, etc. Nothing brought I2C bus 1 back to the 400 kHz fast-mode speed.

Could it be related to the I2C work queue? Another person already reported this, but it was never addressed because a new cable fixed the issue for him (see here). There might be an answer there, but I'll have to dig into it.

The question is: has anyone of you already come across this, and do you have an answer for us?

igalloway commented 4 years ago

Hi Iain Galloway, yes, I'm using the PX4 v1.9.2 stable tag as the basis for all my own git branches. How recently? I believe v1.9.2 has only been published since the end of June, and right now there are only some experimental branches for the v1.10.0 beta tag pushed on top.

First the good news: I managed to resolve it by forcing the I2C bus 1 clock frequency to 400 kHz in the AMG88 constructor, using the I2C::set_bus_clock(unsigned bus, unsigned clock_hz) method after the I2C superclass constructor was called. It only works if I also disable the RGB LED, which would otherwise limit the bus speed to 100 kHz, as Lorenz Maier also discussed here.

As I mentioned, this lets I2C bus 1 run at 400 kHz, but it's a dirty hack: the I2C class itself is supposed to decide the bus frequency and automatically set the highest speed supported by all components on the bus. This is where we might want to get in touch with the NuttX/PX4 developers, because I'm not sure whether it's a bug in the I2C implementation (it doesn't switch to 400 kHz even though all components support it) or whether one or more components run at 100 kHz without being shown in the boot log.
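One way to read this puzzle, offered purely as an assumption: if each bus starts at a board-defined initial clock (assumed here to default to 100 kHz) and driver initialization can only ever lower it, the bus never climbs to 400 kHz even when every device supports it. That would also fit the later report in this thread that raising BOARD_I2C_BUS_CLOCK_INIT fixed it. The class below is a hypothetical model of that bookkeeping, not the PX4 I2C class; only the set_bus_clock name echoes the method used for the workaround.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of per-bus clock bookkeeping; NOT the PX4 I2C class.
// Bus index 0 stands for the bus the boot log reports as I2C1.
class BusClockModel {
public:
    static constexpr std::size_t kNumBuses = 2;  // assumption: two buses

    explicit BusClockModel(const std::array<unsigned, kNumBuses>& initial_hz)
        : clocks_(initial_hz) {}

    // Called as each driver initializes: the bus clock may only go DOWN to
    // accommodate a slower device, never up to match a faster one.
    void register_device(std::size_t bus, unsigned device_max_hz) {
        assert(bus < kNumBuses);
        if (device_max_hz < clocks_[bus]) {
            clocks_[bus] = device_max_hz;
        }
    }

    // Explicit override, similar in spirit to the workaround described above:
    // force a bus to a given clock regardless of what was registered before.
    void set_bus_clock(std::size_t bus, unsigned clock_hz) {
        assert(bus < kNumBuses);
        clocks_[bus] = clock_hz;
    }

    unsigned clock(std::size_t bus) const { return clocks_.at(bus); }

private:
    std::array<unsigned, kNumBuses> clocks_;
};
```

In this model, registering only 400 kHz devices on a bus that starts at 100 kHz leaves it at 100 kHz; starting the bus at 400 kHz keeps it there until a 100 kHz device (such as the RGB LED driver mentioned above) registers.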

Anyway, it makes me think you might want to reconsider the design, since IMO it's not a good idea to:

- have a system-critical component such as the IST8310 magnetometer (the external and, typically, the main magnetometer of the system) on the same I2C bus as additional external components that might cause bus collisions;
- sacrifice I2C bus speed just for the RGB LEDs, which, frankly, are not even that helpful (I've never used or needed them, since the arming LED is mounted on the GPS itself anyway and the rest of the information is available via QGC and the audio tones).

Regarding your request for the flight logs: I changed the logging via a script on the SD card to log only my air quality/temperature/humidity/GPS data for the HoverGames challenge, keeping the logs as small as possible, because I wanted to evaluate whether I can fetch them in real time via the telemetry/MAVLink channel. I'm pretty sure I could reproduce the problem for you, though. For now, however, I think it's best to go for another test flight with I2C bus 1 at 400 kHz, which should no longer interfere with the IST8310 magnetometer and hence should again allow a stable flight. In the attachment you can also see that the AMG88 IR sensor reports bus collisions ("no data received" error messages) whenever I switch back to 100 kHz. Those stopped completely when I switched to 400 kHz.

igalloway commented 4 years ago

[10/25 2:26 AM] Matthias Viertler

Just back from the test flight, and we can confirm our theory: with the AMG88 sensor attached at 400 kHz, the IST8310 also operating at 400 kHz and the RGB LED switched off, the drone flies perfectly fine in both modes, manual and autonomous. No more wrong directions, nothing.

We tested autonomous flight mode twice with different missions, and manual flight also worked well. Mid-flight sensor data was received fine, too; I was using the listener via the MAVLink shell in QGC.

dagar commented 4 years ago

It's possible the new driver is blocking in the work queue thread, preventing further IST8310 magnetometer readings (work item cycles).
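A toy, single-threaded simulation of this hypothesis: items on one work queue run to completion one after another, so an item that blocks delays everything queued behind it. The periods and durations (a 100 Hz magnetometer item and an IR driver that blocks for 50 ms once per second) are invented for illustration, and the scheduler is a simplification, not PX4's work queue implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical periodic work item sharing a single work-queue thread.
struct WorkItem {
    std::string name;
    unsigned period_ms;    // how often the item wants to run
    unsigned duration_ms;  // how long one run occupies the thread
    unsigned next_due_ms = 0;
};

// Statistics for the one item we are watching.
struct Stats {
    unsigned runs = 0;
    unsigned max_gap_ms = 0;  // largest interval between consecutive starts
};

// Runs items one at a time, earliest-due first, on one simulated thread until
// end_ms. Nothing preempts a running item, so a long (blocking) item delays
// every other item on the same queue.
Stats simulate(std::vector<WorkItem> items, const std::string& watch, unsigned end_ms) {
    assert(!items.empty());
    for (const auto& w : items) {
        assert(w.period_ms > 0);
    }
    Stats stats;
    unsigned now = 0;
    unsigned last_start = 0;
    bool seen = false;
    while (true) {
        auto it = std::min_element(items.begin(), items.end(),
                                   [](const WorkItem& a, const WorkItem& b) {
                                       return a.next_due_ms < b.next_due_ms;
                                   });
        now = std::max(now, it->next_due_ms);
        if (now >= end_ms) {
            break;
        }
        if (it->name == watch) {
            if (seen) {
                stats.max_gap_ms = std::max(stats.max_gap_ms, now - last_start);
            }
            last_start = now;
            seen = true;
            ++stats.runs;
        }
        now += it->duration_ms;
        // Releases that passed while the thread was busy are skipped, not queued.
        it->next_due_ms += it->period_ms;
        while (it->next_due_ms < now) {
            it->next_due_ms += it->period_ms;
        }
    }
    return stats;
}
```

In this toy run the magnetometer item gets 96 runs instead of 100 in one second, and its worst gap grows from 10 ms to 51 ms. How much that matters in practice depends on how long the real driver actually blocks.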

dagar commented 4 years ago

Getting output from top, perf and work_queue status (newer PX4) can be helpful.

igalloway commented 4 years ago

@davids5 Could you look at the above? I'm not clear whether the sensor bus error that was exposed recently was present in 1.9.

igalloway commented 4 years ago

Thanks @dagar, I'll ask Matthias V to join here for further investigation.

davids5 commented 4 years ago

@igalloway -

As you indicated, the slowest device on the bus sets the maximum bus speed. This is set by system design here: https://github.com/PX4/Firmware/blob/master/src/drivers/boards/common/board_common.h#L114-L126

To override it, define BOARD_I2C_BUS_CLOCK_INIT in the board_config.h file with one entry per bus, i.e. N entries, where N is the number of I2C buses.
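A minimal sketch of such an override, assuming the macro takes a brace-enclosed list with one initial clock in Hz per bus (as the description of N entries suggests) and assuming two buses, matching the wq:I2C1 and wq:I2C2 entries in the boot log quoted in this thread. The compile-time count check and the names kAssumedI2cBuses and kInitialBusClocks are illustrative additions, not part of PX4.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: two I2C buses, matching wq:I2C1 / wq:I2C2 in the boot log quoted
// in this thread. A real board defines its own bus count elsewhere.
constexpr std::size_t kAssumedI2cBuses = 2;

// board_config.h-style override: one initial clock (Hz) per bus, in bus order.
#define BOARD_I2C_BUS_CLOCK_INIT {400000, 400000}

// How such a list can be consumed: size the array from the initializer, then
// check at compile time that there is exactly one entry per bus.
constexpr unsigned kInitialBusClocks[] = BOARD_I2C_BUS_CLOCK_INIT;
static_assert(sizeof(kInitialBusClocks) / sizeof(kInitialBusClocks[0]) == kAssumedI2cBuses,
              "BOARD_I2C_BUS_CLOCK_INIT needs exactly one entry per I2C bus");
```

As davids5 points out further down, an entry should only be raised to 400 kHz if every device physically connected to that bus supports it.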

What are all the I2C devices on each bus on the drone?

MatthiasViertler commented 4 years ago

@davids5 The I2C bus members I can see in the boot log are: I2C1: work queue wq:I2C1, IST8310 magnetometer, PX4FLOW, RGB LED (manually disabled to make 400 kHz mode work) and our IR sensor (AMG88 or MLX90614). I2C2: work queue wq:I2C2, BMM150, BMP280_I2C and MPL3115A2_I2C.

I haven't tried your instructions from earlier about board_config.h, but what did the trick for me was forcing the bus speed to 400 kHz using the method I2C::set_bus_clock(bus, clock_hz) during instantiation of my IR sensor object (the AMG88 in this case). What I don't understand is why I2C bus 1 runs at 100 kHz instead of 400 kHz even though all (visible) components indicate 400 kHz support. Is it the work queue that limits the speed here?

The most concerning part, however, is that whenever you add a component via the only available external I2C connection (the IR sensor in our case) and poll it for data at only about 1 Hz on the same I2C bus as the IST8310 magnetometer, that seems to be enough to disturb the position estimator/controller/navigator control system and crash the drone in autonomous flight.

Would it be possible to operate the system-critical external magnetometer via the GPS-module's interface UART2 /dev/tty3?

Also: would you agree to move the RGB LED to I2C2 in future designs, so that at least one I2C bus remains operable at 400 kHz?

PS: We followed the advice here (https://docs.px4.io/v1.9.0/en/getting_started/sensor_selection.html#gps_compass) and disabled the internal compass, since we always got the "sensor data inconsistent" warning/error, which prevented arming our drone.

davids5 commented 4 years ago

@MatthiasViertler My question about the hardware was not "what does the software think is there?" but "what is actually connected?". This matters because if you run the bus faster than a connected device can tolerate, that device may misinterpret the data on the bus and cause issues.

Does the list you posted match the actual devices? Does each of the devices (not the device the driver reports, but the actual IC on the board/peripherals) support 400 kHz?

> I haven't tried your instructions from earlier about board_config.h, but what did the trick for me was forcing the bus speed to 400 kHz using the method I2C::set_bus_clock(bus, clock_hz) during instantiation of my IR sensor object (the AMG88 in this case). What I don't understand is why I2C bus 1 runs at 100 kHz instead of 400 kHz even though all (visible) components indicate 400 kHz support. Is it the work queue that limits the speed here?

It may be an issue with the NuttX driver. Use the debugger or add a printf, and check what the requested frequency and the actual frequency are.
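To illustrate why requested and actual frequency can differ, here is a sketch assuming a controller that derives SCL from a fixed source clock through a table of discrete dividers and picks the fastest rate that does not exceed the request. The 60 MHz source clock and the divider table are made up, not K66 register values, and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Made-up example values, NOT the K66 clock tree or its I2C divider table.
constexpr unsigned kExampleSourceHz = 60000000;
const std::vector<unsigned> kExampleDividers = {64, 128, 160, 192, 240, 320, 384, 480, 640, 768};

// Hypothetical: the fastest achievable SCL rate that does not exceed the
// request, given a source clock and the available dividers (any order).
unsigned achievable_scl_hz(unsigned source_hz, const std::vector<unsigned>& dividers,
                           unsigned requested_hz) {
    assert(!dividers.empty());
    unsigned best = 0;
    for (unsigned div : dividers) {
        assert(div > 0);
        unsigned rate = source_hz / div;
        if (rate <= requested_hz && rate > best) {
            best = rate;
        }
    }
    return best;  // 0 means no divider is slow enough
}

// The kind of debug print suggested above: both values side by side.
void log_bus_clock(int bus, unsigned requested_hz, unsigned actual_hz) {
    std::printf("I2C%d: requested %u Hz, actual %u Hz\n", bus, requested_hz, actual_hz);
}
```

With the made-up table, a 400 kHz request lands at 375 kHz and a 100 kHz request at 93.75 kHz; printing both numbers would expose that kind of mismatch.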

> The most concerning part, however, is that whenever you add a component via the only available external I2C connection (the IR sensor in our case) and poll it for data at only about 1 Hz on the same I2C bus as the IST8310 magnetometer, that seems to be enough to disturb the position estimator/controller/navigator control system and crash the drone in autonomous flight.

The K66 has a limited number of I2C buses. Given the list of devices on I2C0 (reported as I2C1), I would guess you have overcommitted that bus or have contention on it. You can verify that with a scope.
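A back-of-envelope check for an overcommitted bus, before reaching for the scope. It assumes each byte of a register read costs 9 SCL periods (8 data bits plus ACK) and that a read sends the address, the register, the address again and then the payload; START/STOP conditions and clock stretching are ignored. The 128-byte AMG88 frame (64 pixels × 2 bytes), the 6-byte magnetometer sample and the poll rates are illustrative assumptions, and all names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical periodic I2C read: payload bytes per transaction and how many
// transactions per second the driver issues.
struct PeriodicRead {
    unsigned payload_bytes;
    double rate_hz;
};

// Approximate bus time of one register read: address(W) + register byte,
// then repeated-start address(R) + payload, each byte taking 9 SCL periods.
// START/STOP and clock stretching are ignored, so this slightly underestimates.
double read_time_s(unsigned payload_bytes, unsigned scl_hz) {
    assert(scl_hz > 0);
    const unsigned overhead_bytes = 3;  // addr(W), register, addr(R)
    return 9.0 * (overhead_bytes + payload_bytes) / scl_hz;
}

// Fraction of each second the bus is busy; values approaching 1.0 mean the
// bus is overcommitted.
double bus_utilization(const std::vector<PeriodicRead>& reads, unsigned scl_hz) {
    double busy = 0.0;
    for (const auto& r : reads) {
        busy += r.rate_hz * read_time_s(r.payload_bytes, scl_hz);
    }
    return busy;
}
```

Under these assumptions a full 128-byte frame occupies the bus for about 9 × 131 / 100000 ≈ 11.8 ms at 100 kHz but about 2.9 ms at 400 kHz, and a 6-byte magnetometer read at 100 Hz plus one frame per second keeps a 100 kHz bus busy for under 10 % of the time. If the real numbers look similar, raw bandwidth alone would be an unlikely culprit, and a scope trace or the work queue timing would be the next thing to look at.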

> Would it be possible to operate the system-critical external magnetometer via the GPS-module's interface UART2 /dev/tty3?

It is an I2C device on that interface; it cannot be a serial device. It is on I2C0.

Why not move the flow sensor and the AMG88 to Serial 2 and configure that port for I2C in the build (defconfig and board.h)?

> Also: would you agree to move the RGB LED to I2C2 in future designs, so that at least one I2C bus remains operable at 400 kHz?

This is using a Holybro GPS with magnetometer, button and LED, so it's not possible.

An external compass is always better than an internal one. I would suggest you put a scope on the bus and determine the root cause of the problem. If the bus is overcommitted, move the flow sensor and the AMG88 to Serial 2.

MatthiasViertler commented 4 years ago

Hi @davids5 and @dagar, thank you for all your answers, and sorry for my late reply. Last week we were working on integrating the Pixy2 via SPI3, where we are also facing an issue (see https://github.com/PX4/Firmware/issues/13364). Since the AMG88 IR sensor works fine for us on I2C0 at 400 kHz with the RGB LED disabled, we decided to stick with it for our HoverGames challenge submission for now. If there's time left, we will try to integrate the 100 kHz MLX90614 via our NXP Rapid IoT "companion computer", connected on TELEM2 (UART) via MAVLink.

@dagar:

> Getting output from top, perf and work_queue status (newer PX4) can be helpful.

Do you need me to provide it while flying, or just with the IR sensor attached on I2C0? If it's while flying, it could be a painful experience again :O FYI, we're not running our IR sensor driver as a work queue component but as its own dedicated task. Do you think that is a problem?

@davids5:

> To override it, define BOARD_I2C_BUS_CLOCK_INIT in the board_config.h file with one entry per bus, i.e. N entries, where N is the number of I2C buses.

I confirm that this works for us: after changing the initial speed to 400 kHz in board_config.h, the IST8310 is already at 400 kHz right from boot (no need to force the bus using I2C::set_bus_clock()).

> What are all the I2C devices on each bus on the drone?

I'm not an FMUK66 hardware expert, but from the block diagram I can see: I2C0: RGB LED, PX4FLOW, optionally the AMG88 or MLX90614 IR sensor; I2C1: MPL3115A2 pressure sensor, BMP280 barometer, BMM150 magnetometer; I2C2: ? @igalloway, could you answer this?

> The K66 has a limited number of I2C buses. Given the list of devices on I2C0 (reported as I2C1), I would guess you have overcommitted that bus or have contention on it. You can verify that with a scope.

Yes, that's what I think, too. I'll verify it with an oscilloscope/logic analyzer later this month, once we are done with some other topics, since this IR sensor is not connected in our current solution.

> An external compass is always better than an internal one. I would suggest you put a scope on the bus and determine the root cause of the problem. If the bus is overcommitted, move the flow sensor and the AMG88 to Serial 2.

Understood. That's why I wanted you to be aware that, IMO, it might be good to come up with further protection (if possible) for system-critical components such as the IST8310 when they are connected to externally exposed buses, such as I2C0 in this case. We'll have to see how easily we can integrate the AMG88 into our NXP Rapid IoT companion, which is currently connected to Serial2/TELEM2.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

robotastic commented 4 years ago

I am using the PX4Flow with the FMUK66, connected over I2C, and I'm trying to get it to hover indoors without GPS. It works for a while but then starts to drift. I have noticed the following message in the logs (5-6 times): [load_mon] wq:I2C1 low on stack! (220 bytes left) https://logs.px4.io/plot_app?log=b4769c8a-85e3-430c-9eff-b5344e1ea291 Is this a sign that the I2C bus is getting overwhelmed? Is it possible that some messages from the PX4Flow are not reaching the FMU? Would adjusting the bus rate help?

davids5 commented 4 years ago

@robotastic

> [load_mon] wq:I2C1 low on stack! (220 bytes left) Is this a sign that the I2C bus is getting overwhelmed?

It may be, but I doubt it. The default margin is 300 bytes, so this just indicates that the stack of wq:I2C1 needs to be increased by 80-90 bytes.
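The arithmetic behind that estimate: with a 300-byte warning margin and 220 bytes left, the shortfall is 80 bytes, so 80-90 bytes leaves a little headroom. A small helper makes this explicit; its name and the rounding to an alignment boundary are illustrative assumptions, and how the wq:I2C1 stack size is actually configured depends on the PX4 version and is not shown here.

```cpp
#include <cassert>

// Hypothetical helper: how much a thread's stack should grow so that the free
// space reported by the load monitor is back at the warning margin, rounded up
// to an alignment boundary.
unsigned stack_increase_bytes(unsigned margin_bytes, unsigned free_bytes, unsigned align_bytes) {
    assert(align_bytes > 0);
    if (free_bytes >= margin_bytes) {
        return 0;  // already at or above the warning threshold
    }
    unsigned shortfall = margin_bytes - free_bytes;
    // Round the shortfall up to the next multiple of align_bytes.
    return (shortfall + align_bytes - 1) / align_bytes * align_bytes;
}
```

For the reported numbers (300-byte margin, 220 bytes left) and 8-byte rounding, this returns 80 bytes.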

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

ali20480 commented 3 years ago

@robotastic how did you manage to get the PX4Flow working with the FMUK66, even for a short period of time? I have tried everything I could, but each time, in altitude or position mode, it says "no local positioning", even though the sensors are running (I checked them in the MAVLink Inspector). So every time the drone takes off, at some point it says "no local positioning" and comes down. Here is a log:

https://logs.px4.io/plot_app?log=12fae489-194f-463f-a71c-eea4371e4fa6

Please help me; I am quite a beginner and have been struggling with the PX4Flow for a long time now. Best regards