PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License

MPU9250 magnetometer flight testing #3953

Closed kd0aij closed 8 years ago

kd0aij commented 8 years ago

@Inspirati after calibrating, I'm seeing either 12 or 20 degrees of error in the compass heading displayed in QGC. Question: without GPS, does QGC display the magnetic heading, or the true heading (corrected for declination)?

LorenzMeier commented 8 years ago

@kd0aij True heading only with GPS, since that really only reliably works if you know where you are.
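To illustrate the distinction: true heading is the magnetic heading plus the local magnetic declination, and the declination can only be looked up once the position is known. A minimal Python sketch, assuming east declination is positive. The function name is hypothetical and is not taken from QGC or PX4.

```python
def true_heading_deg(magnetic_heading_deg, declination_deg):
    """Convert a magnetic heading to a true heading, both in degrees.

    Assumes east declination is positive. The declination has to come
    from a magnetic model lookup at the vehicle position, which is why
    a position fix is needed first.
    """
    return (magnetic_heading_deg + declination_deg) % 360.0


# Example: 350 degrees magnetic with 15 degrees east declination wraps past north.
print(true_heading_deg(350.0, 15.0))
```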

kd0aij commented 8 years ago

After GPS lock the true heading error is closer to 9 degrees.

kd0aij commented 8 years ago

latest master with MPU9250 support for pixracer: http://logs.uaventure.com/view/xoFHKE78iLJv4ZAPVZn2Ji

Performance not so good this time. GPS but no external mag. Nearly 90 degree yaw on takeoff, noticeable yaw reaction to throttle. Fairly rapid yaw drift, at times nearly 90 degrees. Entering POSCTL resulted in fairly rapid flight out of the test area, so I was unable to check for toilet-bowling.

The glitches in the magnetometer data are probably degrading the calibration, and drift could also be a result of that problem. This log shows a lot more spikes than my first one.

LorenzMeier commented 8 years ago

The spikes here look like a bus or driver level problem: http://logs.uaventure.com/view/xoFHKE78iLJv4ZAPVZn2Ji#Thrust_PLOT

kd0aij commented 8 years ago

test flight using proposed workaround: https://github.com/Inspirati/Firmware/commit/cadcae2e0fc4882133424a2295c3ab455c363c2c log: http://logs.uaventure.com/view/CyDChHPzryT5xo7ArnTxte

Looked OK on the bench; no drift and no spikes. But in stabilized flight it seemed as if the yaw drift was no better than before, perhaps worse. No drift at all in acro mode, so it's definitely the compass causing the drift.

kd0aij commented 8 years ago

@Inspirati I just added a link (above) to the log from the workaround test. It looks like there are spikes (and other interesting behavior) at 203 to 205 seconds elapsed.

glitches

kd0aij commented 8 years ago

@Inspirati Are you checking the overflow flag in 8963 register st2 ?

Inspirati commented 8 years ago

@kd0aij It is being read; however, given there is no real mechanism for reporting it or acting on it in any special way, I don't explicitly test for it. Even if an overflow were to occur, I would not expect it to lead to the LSB error we are seeing.

Inspirati commented 8 years ago

I've put a test for the overflow flag in my development build to count the number of times it is set and watching for it using the 'mag' systemcmd. Despite seeing the occasional glitch occur, the flag is never being set. Unfortunately I don't have any scary magnets around to see if I can force it - fridge magnet is doing nothing interesting.
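For readers following along, a check of this kind can be modelled offline in a few lines of Python. This is only a sketch, not the test in the development build: it assumes the overflow flag is HOFL, bit 3 of the AK8963 ST2 register as given in the AK8963 datasheet, and the class name is hypothetical.

```python
AK8963_ST2_HOFL = 0x08  # ST2 bit 3: magnetic sensor overflow (assumed from the AK8963 datasheet)


class OverflowCounter:
    """Counts how many ST2 values had the overflow flag set (hypothetical helper)."""

    def __init__(self):
        self.samples = 0
        self.overflows = 0

    def update(self, st2):
        """Record one ST2 byte; return True if the overflow flag was set."""
        self.samples += 1
        if st2 & AK8963_ST2_HOFL:
            self.overflows += 1
            return True
        return False


counter = OverflowCounter()
for st2 in (0x10, 0x18, 0x00, 0x08):
    counter.update(st2)
print(counter.overflows, "of", counter.samples, "samples flagged overflow")
```

Keeping the total sample count next to the overflow count makes it easy to confirm that the check actually ran on every read, which matters when the expected result is zero.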

Inspirati commented 8 years ago

@kd0aij "at 203 to 205 seconds elapsed" - I was rather anticipating something like this to become evident, as my quick fix was really just that for the moment, and thought to be useful only for assessing the mag's drift characteristics. Knowing that a proper workaround seems to become confounding very quickly, I've been focusing my efforts on actually getting rid of the 'bit-9' error altogether. I believe the quick fix has at least verified my hypothesis that the glitch is a single-bit error, which I suspect appears pseudo-randomly in the least significant bit of any byte being read. This would also explain all the trouble I had with stray data-ready flags during my early development of the driver. I'm now testing another idea: that the bit error may depend on the state of the second least significant bit, which would help explain why glitches are only evident in certain orientations.
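To make the size of such an error concrete, here is a minimal Python sketch of what a spuriously set low bit does to one axis value. It assumes the AK8963 axis registers are read low byte first as 16-bit two's complement values, and it reads 'bit-9' as the least significant bit of the high byte; the function names are hypothetical.

```python
import struct


def decode_axis(low_byte, high_byte):
    """Decode one axis from its two register bytes (little-endian, two's complement)."""
    return struct.unpack("<h", bytes([low_byte, high_byte]))[0]


def glitch_delta(low_byte, high_byte, glitch_in_high_byte):
    """Change in the decoded value when the low bit of one byte is forced to 1."""
    clean = decode_axis(low_byte, high_byte)
    if glitch_in_high_byte:
        glitched = decode_axis(low_byte, high_byte | 0x01)
    else:
        glitched = decode_axis(low_byte | 0x01, high_byte)
    return glitched - clean


print(glitch_delta(0x34, 0x00, True))   # low bit of the high byte forced on
print(glitch_delta(0x34, 0x00, False))  # low bit of the low byte forced on
```

In this model the same bit error is a 256-count step when it lands in the high byte and at most a 1-count step when it lands in the low byte, so only the high-byte case would stand out as a spike in a plot.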

kd0aij commented 8 years ago

@Inspirati @LorenzMeier I just realized that I don't understand how sensor data is selected and utilized for attitude control. Can you tell from the log data whether this flight testing was a valid test of behavior with the 8963 mag? I'm unable to follow the sensor priority and voting logic in the source, and can't make sense of the sensor IDs and "primary IDs"

kd0aij commented 8 years ago

@Inspirati This is with your latest commit to https://github.com/Inspirati/Firmware/tree/mpu9250_mag 9250-only flight test. No problems with yaw drift in stabilize or acro modes http://logs.uaventure.com/view/NhsomSnFjyNjFgV7okCLE8

kd0aij commented 8 years ago

just a few degrees of deviation in yaw measurement with fairly large thrust variation in stabilized mode: (yaw in degrees)

9250_thrust_yaw

kd0aij commented 8 years ago

for reference, no yaw deviation with very large throttle variation when in acro mode: (yaw in degrees) acro_thrust_yaw

kd0aij commented 8 years ago

@LorenzMeier @tumbili @Inspirati Please check whether I have the mag traces labeled correctly. The hmc5883 driver is started first, and the orb instance assigned to it is zero.

Comparison of HMC5983 and AK8963 data in flight, both onboard a PixRacer in a new 450 class quadcopter. Mag0 is the 5983, 1 is the 8963. The X traces are not offset, the Z traces are offset by +1.5 and the Y traces are offset by -1.0. (units are gauss and seconds)

Much higher noise levels on the 5983. Attitude controller was using the 5983 for yaw correction. Zero rudder input and zero observed yaw drift over the 2 minute period from 140 to 270 seconds elapsed. Except for the intermittent spikes in the 8963 X and Y values, the data looks good compared to the 5983.

dualmag http://logs.uaventure.com/view/jhQpZugzZ5DGZSfFzXF2TB

kd0aij commented 8 years ago

another log comparing 5983 (green) to AK8963 (red). The 8963 looks good except for the glitches.

glitches

pkocmoud commented 8 years ago

5983 is quite jittery. Would smoothing improve anything? Is there a consensus on the preferred frequency of the MAG data stream?

kd0aij commented 8 years ago

possibly, but this issue is about the AK8963

LorenzMeier commented 8 years ago

Have we tried reading the AK slower? Are we checking the error code of the I2C state machine? I'm relatively confident that the issue is with how the AK is read or configured, and not really about SPI or NuttX. Is it much slower or low passed?

pkocmoud commented 8 years ago

We are working with Invensense to get it sorted out. They maintain there is no hardware problem that would cause this issue.

LorenzMeier commented 8 years ago

I'm not saying there is. It might just be the way the internal I2C state machine is run. But because that's within the MPU this is where we should dig. The most likely reasons are timing and error checking. It would be good to know if the I2C transfer returns an OK error code even if the bit flip occurred.
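One way to gather that evidence would be to log the MPU's I2C master status alongside each raw sample and decode it offline. The Python sketch below assumes the bit layout of the I2C_MST_STATUS register (address 0x36) as listed in the MPU-9250 register map; the masks should be verified against the datasheet, and the helper name is hypothetical.

```python
# Error bits of I2C_MST_STATUS (assumed layout, please check the register map).
I2C_MST_STATUS_ERROR_BITS = (
    (0x01, "I2C_SLV0_NACK"),
    (0x02, "I2C_SLV1_NACK"),
    (0x04, "I2C_SLV2_NACK"),
    (0x08, "I2C_SLV3_NACK"),
    (0x10, "I2C_SLV4_NACK"),
    (0x20, "I2C_LOST_ARB"),
)


def i2c_master_errors(status):
    """Return the names of the error bits set in one I2C_MST_STATUS byte."""
    return [name for mask, name in I2C_MST_STATUS_ERROR_BITS if status & mask]


print(i2c_master_errors(0x21))
```

If glitched samples always come with an empty error list, the transfer itself reported OK and the problem lies elsewhere, for example in timing.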

LorenzMeier commented 8 years ago

I have a very specific request: could you hack the code to output the raw value and not apply the offsets? I'd like to see if the bit flips are associated to a specific value and are the result of incorrect two's complement format conversion. It's interesting to see that only the two axes closest to zero are ever affected.

And check if the axes are read individually or consecutively and try to switch to either one.
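With raw values in hand, a glitched sample can be compared against its neighbour to see exactly which bits differ. A minimal Python sketch, assuming 16-bit two's complement raw values; the helper name is hypothetical.

```python
def flipped_bits(expected, observed):
    """Return the bit positions (0 = LSB) where two 16-bit raw values differ."""
    diff = (expected ^ observed) & 0xFFFF
    return [bit for bit in range(16) if diff & (1 << bit)]


# A neighbouring good sample versus a suspected glitch, as raw signed counts.
print(flipped_bits(-272, -16))
```

A result with a single entry points to a single-bit error, while several entries would suggest a conversion or ordering problem instead.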

kd0aij commented 8 years ago

@Inspirati is working on this

Inspirati commented 8 years ago

I've been working with a new cut down driver 'hack' in an effort to answer these questions:

https://github.com/Inspirati/Firmware/tree/mpu9250_hack

I'd previously found that the issue is the low bit occasionally being erroneously set, which in two's complement explains the +ve/-ve glitches.

From my earlier observations with QGC charts, it's not just the two axes 'closest to zero' (however, I may be missing your point there); rather, it can affect any of the axes, at any value. This had me working on a hypothesis that the erroneously set bit may only occur when the second least significant bit is set; however, this proved fruitless.

In my new branch I have a program/driver, 'm', which by default runs a very tight loop continuously polling the mpu9250 data registers. When run in this manner, glitches do not occur. However when run any slower, or from a timer, they again become apparent.

There was a reason I'd not tried reading the axes data in other than a single block, as I believe the oversampling model relies on fetching all the results in a single read, in order to be able to detect and drop the duplicates. I'm going to revisit that now.
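As an illustration of the single-block approach, the Python sketch below treats each read as one 7-byte block (six axis bytes, low byte first, followed by ST2) and drops a block that is identical to the previous one. The block layout is an assumption based on the AK8963 register order, and the class and function names are hypothetical, not the driver's own.

```python
import struct


def parse_block(block):
    """Split a 7-byte block into (x, y, z) raw counts and the ST2 byte."""
    if len(block) != 7:
        raise ValueError("expected 7 bytes: HXL..HZH followed by ST2")
    x, y, z, st2 = struct.unpack("<hhhB", block)
    return (x, y, z), st2


class DuplicateFilter:
    """Drops a block that repeats the previous one byte for byte."""

    def __init__(self):
        self._last = None

    def accept(self, block):
        """Return (x, y, z) for a new block, or None for a duplicate."""
        block = bytes(block)
        if block == self._last:
            return None
        self._last = block
        return parse_block(block)[0]


mag = DuplicateFilter()
sample = bytes([0x34, 0x00, 0xF0, 0xFE, 0x10, 0x00, 0x10])
print(mag.accept(sample))  # first time: decoded axes
print(mag.accept(sample))  # same block again: dropped
```

The comparison only works if all axes come from the same read; a genuinely unchanged field would also be dropped, which is acceptable when the aim is just to discard repeated samples.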

Inspirati commented 8 years ago

I can avoid the glitching issue by polling the device constantly from a foreground process. Therefore I am pretty confident that the hardware is not at fault. Now I just need to determine why it all goes south when the device is only queried periodically, i.e., from the timer.

LorenzMeier commented 8 years ago

@Inspirati I'd like to help to nail this down. It would be fantastic to make the 9250 prime-time ready.

Inspirati commented 8 years ago

@kd0aij @pkocmoud @LorenzMeier I have published a reworked version of the mpu9250 driver at [https://github.com/Inspirati/Firmware/tree/mpu9250_mag_new]. This revision no longer polls the magnetometer data-ready flag before reading the axes data. It seems to work much better on the bench, with no glitching apparent as yet. Please give it a go and let me know of any issues.

There is also a development testing hack version of the driver, called 'm', in the branch [https://github.com/Inspirati/Firmware/tree/mpu9250_hack]. This tool gives a lot of visibility into how the data streams from the magnetometer to the mpu9250. It makes the existence of the glitch quite apparent, and it also shows how the previous version of the driver actually fetched a combination of new and old magnetometer axes values. Please read the code at the end of Firmware/src/drivers/m/main.cpp for options on how to invoke it, or just ask!

pkocmoud commented 8 years ago

Thanks Rob.

kd0aij commented 8 years ago

@Inspirati It sounds like non-atomic reads of multi-byte values may have been the problem. I'll do another test flight in the snow today with my new waterproof quad and post a flight log.
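To show why a non-atomic read produces spikes, here is a minimal Python model of a "torn" 16-bit read, where the low byte comes from the old sample and the high byte from the new one. It is an offline illustration of the general effect, not a reproduction of the driver; the function name is hypothetical.

```python
import struct


def torn_read(old_value, new_value):
    """Decode a value whose low byte is from old_value and high byte from new_value."""
    old_bytes = struct.pack("<h", old_value)
    new_bytes = struct.pack("<h", new_value)
    return struct.unpack("<h", bytes([old_bytes[0], new_bytes[1]]))[0]


# The field only moves by one count, but the torn read lands far away.
print(torn_read(255, 256))
```

The error is largest when the value crosses a multiple of 256 between the two byte reads, which fits glitches that show up only at certain values or orientations.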

kd0aij commented 8 years ago

Here's a bench test with motors running and both 5983 and 9250 mags logged. Still a few spikes on the 9250 z axis, but there's also one anomalous-looking spike on the 5983 x and y axes around 51 seconds. Those spikes have a magnitude similar to the spikes from the 9250, so one might argue that the 9250 looks at least as good as the 5983 in that respect.

http://logs.uaventure.com/view/URwmPTgzdj6eagtp3yRtkY

Red = 5983, blue = 9250 (PixRacer R7 / AquaQuad)

8983_9250_mag

kd0aij commented 8 years ago

(copied from #4075) (Using a different (R12) PixRacer, problems were with my older R7) Bench test of latest master with 3 magnetometers. I'm sure one of them is in the 9250 :) http://logs.uaventure.com/view/DxiMAoWQhuVzuamSfw4zXo

I only noticed 1 spike in FlightPlot.

kd0aij commented 8 years ago

@Inspirati Any comments on the logs above? What's your opinion of the AK8963 vs. the 8983 data quality there?

Inspirati commented 8 years ago

@kd0aij I think the logmuncher chart for Magnetic Field Strength looks very clean; however, I am not sure how this depicts the performance of one mag versus another. Is this data not from the highest priority published device, and therefore may not be portraying the performance of the mpu9250 mag at all?

As for the chart above, seeing as we now see glitches from both devices, perhaps the issue is at a higher layer than the device drivers themselves. Overall I'd say the blue (9250) trace looks significantly cleaner (quieter) than the red (8983). BTW: don't you mean (HMC)5983?

kd0aij commented 8 years ago

You have to use FlightPlot to plot the values for the 8963. Its data is logged in IMU1.MagX/Y/Z, as the legend in my plot shows. Logmuncher is only plotting values for IMU 0. @LorenzMeier We need a better definition of the requirements for the internal magnetometer: currently it is not clear when/if an internal magnetometer is necessary, and what fall-back functionality it is required to support.

LorenzMeier commented 8 years ago

@kd0aij I looked at the log and the readings look fine up to a scale difference. I'll write up in a sec how sensor selection works and how to work out which sensor is active where.

plot

LorenzMeier commented 8 years ago

Seems like we're fine with this sensor for now.