betaflight / betaflight

Open Source Flight Controller Firmware
GNU General Public License v3.0
8.06k stars 2.87k forks source link

Gyro overflow causing high-speed yaw spin after crash #3959

Closed etracer65 closed 6 years ago

etracer65 commented 6 years ago

Opening this issue to consolidate investigation into a gyro overflow that is the apparent cause of the "yaw spin to the moon". Investigation so far has shown that the ICM206xx gyros are experiencing what looks like an internal overflow when an axis exceeds +-2000 degrees/second. This manifests as polarity reversals from full scale in the gyro output. All reported cases (so far) have involved ICM206xx gyros. Extensive high-speed testing with MPU6000 gyros (even exceeding 8000 deg/sec) has been unable to replicate and the gyro behaves normally.

Blackbox log representing the issue: https://www.rcgroups.com/forums/showpost.php?p=38163469&postcount=51207

Videos showing the behavior: https://www.youtube.com/watch?v=l6WPUEqIYX8 https://youtu.be/MJ-E1SsQDeM

Additional related discussions: https://github.com/betaflight/betaflight/pull/3909 https://github.com/betaflight/betaflight/pull/3950 https://github.com/betaflight/betaflight/issues/3893

Ozzy33 commented 6 years ago

My 206 gyro data did not show reversal, but I mentioned I dis-armed and rearmed (could this of reset a possible reversal), and probably should be drill tested.

etracer65 commented 6 years ago

@martinbudden I'm working on getting a flight controller with the ICM206xx series gyro for testing on the high-speed yaw rig. That way we'll at least be able to test any code in a controlled manner. For capturing the highest speed logs which is better? On-board SPI flash or SD card? I'm thinking we might need to capture the gyro data at 8K or maybe even test at 32K. Should only need to capture a few seconds at a time so 16MB of on-board should be sufficient.

@borisbstyle Do you have any contacts at Invensense that we can ask about the overflow > 2000dps behavior?

etracer65 commented 6 years ago

@Ozzy33 Do you have a blackbox log for this - sorry if I missed it earlier. The one where we saw the reversal actually had 2 reversals. The first almost immediately on the hit and then several seconds later it reversed again.

Ozzy33 commented 6 years ago

Log is attached, 17/25, it's only ~0.3s long. You could stack the two fc on top of each other for a direct comparison. I am pretty sure onboard SPI flash is the best for speed, BUT, let the BB guru's chime in. Rap4 Revolt 32-16-32 2206-2400 V2 5050x3 4S 8-10-2017 YAW SPIN OF DEATH after rearm log.zip

etracer65 commented 6 years ago

That's a weird log! The gyro is already at -1998dps when the logging starts. Not sure what to make of that unless gyro calibration failed. Unfortunately not enough data to see what's going on.

Ozzy33 commented 6 years ago

I have said many many times about that log, I crashed without BB running, it started YSTTM, I dis-armed, since this was not out of the park yet, I rearmed and enabled BB logging, then it got another boost to the moon and I dis-armed. First time out of many many moon trips i caught one on BB, just not the whole event.

etracer65 commented 6 years ago

Gyro was still over 2000dps and likely inverted on the re-arm. So the PID controller added positive reinforcement to the spin again.

gke commented 6 years ago

I suggest smart saturation to avoid the wraparound. old and new are the previous and current/prospective ray gyro readings.

Perhaps:

old = new = (abs(old) > 1800) && (sign(old) != sign(new)) ? old : new;

1800 nominal to stop false detection at -1..0.

ctzsnooze commented 6 years ago

@gke Clever!...tracking the change of direction in sign as it bounces off the limits! :-)

But... if gyro exceeded 1800, but then actually only got as high as 1850 and then recovered by itself... what would the above code do?

And would it work if the gyro sample rate was slow so that if the gyro went up really fast, say the FC saw +1700 then the next value is folded back to -1700 and the next -2000? Could the above algorithm deal with this.

I can see it being challenging to deal reliably with all possibilities. And if we get the sign wrong, our motors just make it worse.

In the meantime, simply ignoring incoming gyro yaw data (set value seen by FC to 0) whenever gyro reported rate >1500 should make the quad safe. I know it definitely will end up with correct sign eventually. Doing nothing that makes the motors spin up, which could make it worse, and in the meantime waiting for it to slow down eventually to <1500, that should be sufficient to get normal control back. Brief motor blips during reversals would occur but shouldn't matter greatly.

I am hoping that @ledvinap noticing that there is an overflow bit on the ICM20602 may help; perhaps whenever that bit is set we must ignore the gyro data (set value seen by FC to zero) until that bit clears.... I personally can only find an FIFO overflow register in the datasheet I have.

martinbudden commented 6 years ago

@ctzsnooze , where does @ledvinap mention an overflow bit? There is nothing about this in the ICM20602 datasheet.

ctzsnooze commented 6 years ago

https://github.com/betaflight/betaflight/pull/3950#issuecomment-324899518 But I see this is not gyro but acc. Oh well. :-)

gke commented 6 years ago

@ctzsnooze.

It is reasonable to expect there to be an out of range check and hardware flag for the accs as you can get extremely high accelerations way out of range quite easily. Just tap the ICM lightly with a hammer :). The logic for detecting that a shipping package to which an ICM is attached has been dropped and phone home is one of the markets. I guess manufacturers make a note of when you drop a cell phone for warranty (avoidance) purposes using their on board accs..

Now back to the problem.

The 1800 check is just to make sure you don't apply the check when you rate is jumping around zero. Without the check if you had a value of say -1 and then got a zero which is positive you would reject that value. So 1800 is arbitrary but has to be large enough for it to be an impossible change sign in one sample interval.

The motors will only act on gyro data that has actually been read so if the ICM registers are jumping around and we have not read them then we will not act on the values we have not read if you get what I mean. The BB log values need to be taken with caution if the compression is active.

If the value is jumping around a large +/- value then we use the sign of the one that first got us there. In some senses it is a little like the being at +/- 180 degrees when have to decide which way to turn to a new waypoint on an exact reciprocal heading. With noise we can end up in a metastable state around 180 degrees. In our case however we absolutely know that we cannot have gone from max positive rate to max negative rate or vice versa in one sampling period (Newton).

So we are only trying to detect and reject "impossible" changes in the yaw rate.

borisbstyle commented 6 years ago

I think the inversion is easier to detect than you all think. When inversion happens it is always 2000deg/sec and not 1999 or 1998.

Basically when gyro signal inverts suddenly from direction from any number directly to the highest you know the problem is there. So when you ignore that for a while it might recover itself

ctzsnooze commented 6 years ago

OK that's good news, thanks! I will be home in a couple of days can test an ICM20602 then.

Aileron8 commented 6 years ago

Makes sense. Although in scenario's where the inversion didn't occur right away I think there still might be the high P & built-up Iterm to contend with. Folks may still experience an e tended loss of control while avoiding the hyperspin unless the Iterm in yaw was clamped at say 200 or so. I'm hoping 32k avoids the issue altogether.

gke commented 6 years ago

Is it always just an inversion or does it keep going.

My simplest check is:

old = new = abs(old - new) > impossible ? old : new

Code this evening.

gke commented 6 years ago

Will probably work. Not bothered with pull etc.

void gyroUpdateSensor(gyroSensor_t *gyroSensor) {
    static int32_t gyroADCp[3] = { 0, 0, 0 };
    int32_t newGyro;

    if (!gyroSensor->gyroDev.readFn(&gyroSensor->gyroDev)) {

        return;
    }
    gyroSensor->gyroDev.dataReady = false;

    if (isGyroSensorCalibrationComplete(gyroSensor)) {
        // move gyro data into 32-bit variables to avoid overflows in calculations

        for (uint32_t a = X; a <= Z; a++) { // assume loop unroll opt

            newRawGyro = (int32_t) gyroSensor->gyroDev.gyroADCRaw[a];
            if (abs(newRawGyro - gyroADCp[a]) > INT16_MAX)
                newRawGyro = gyroADCp[a];
            else
                gyroADCp[a] = newRawGyro;

            gyroSensor->gyroDev.gyroADC[a] = newRawGyro
                    - (int32_t) gyroSensor->gyroDev.gyroZero[a];

        }
etc.
martinbudden commented 6 years ago

@gke , we don't have enough data about the values the gyro outputs in a crash to be able to write code at that level of detail. In particular:

  1. When the gyro overflows it appears to change from a value of +1996 to -1996 (or vice versa): a the comparison with INT16_MAX won't detect that.
  2. We don't know how quickly the gyro overflows, so it may take several readings to go from +1996 to -1996, in which case a history of more than just the previous reading is required.

As an aside, the gyro history needs to be stored in the gyroSensor_t struct, not a static variable, otherwise it won't work when multiple gyro are used.

We need more blackbox logs from crashes before we can improve the code.

gke commented 6 years ago

Thanks Martin.

I think you will find that while the code does not check for the specific case it should find impossible changes to the gyro values and thus acts as a form of slew filter.

I did comment elsewhere that the BB code may be masking the change point with its compression. If the ICM DLPF is active yes it may be smeared over several values.

I do of course understand that the history variable should, for consistency, if for no other reason, should be in gyroSensor_t.

I offer this merely as a contribution with the encouragement of others who were concerned enough to PM me.

Cheers

borisbstyle commented 6 years ago

I will play with it this week to investigate the behaviour

gke commented 6 years ago

Cleaned up mods (not doing push/pulls):

https://github.com/gke/Betaflight-Patches

As a generalisation all sensors throw bad values from time to time.

ctzsnooze commented 6 years ago

@gke Am I correct in interpreting this suggestion like this: to hold the previous (presumably good) gyro value whenever the step change to the next value exceeds something which would be newtownianly impossible? And then hold the last 'known good' value indefinitely until a change occurs which is not impossible?

gke commented 6 years ago

@ctzsnooze

Yes exactly. The controller will eventually bring it back from the locked state - or should - once you centre the sticks. The problem may not be limited just to yaw.

+2000, -2000, -2000, -2000, .... +1950, +1600, +1200 deg/sec

will yield

+2000, +2000, -+2000, +2000, .... +1950, +1600, +1200

and so on.

Of course the suggested code is dealing with raw gyro values not deg/sec and it does not have to be at the gyro limits. It could be +1700, -1500, +1650 with the -1500 being rejected if the threshold was 3200. This corresponds to an enormous angular acceleration if it occurred between samples at 8KHz - 25M deg/s/s?

So you can set the "newton" constant to be anything you like. Better probably to set the constant high to capture the truly bizarre values as there are already parameters for max commanded rate from memory.

As you suggested elsewhere it should reject shocks like gate/branch hits possibly in conjunction with the acc. You don't want to set the threshold too low because of disturbances that the control D term needs to see. It just may however also be a way of doing gross filtering to clip prop wash noise energy getting into D.

Perhaps this is yet another GUI parameter ;).

A bit rambling but hopefully useful.

martinbudden commented 6 years ago

@gke , I think a slew filter is a useful addition. I've used your example as a basis for creating a slew filter, see PR https://github.com/betaflight/betaflight/pull/3983 .

I haven't yet incorporated the slew filter into the gyro code - that will be the next step.

gke commented 6 years ago

@martinbudden

Thanks

etracer65 commented 6 years ago

@martinbudden @borisbstyle I just received a flight controller with an ICM20608G gyro and updated my testing rig. The rig now has a double stack with both the original MPU6000 and the new ICM20608G gyros so I can collect simultaneous logs from both boards. Both flight controllers are Flip32 F4's and are identical except for the gyros. Both run the AirbotF4 target.

MPU6000 version: http://www.readytoflyquads.com/flip32-f4 ICM20608G version: http://www.readytoflyquads.com/flip32-f4-icm-edition-mpu-icm20608

Short story: I was able to replicate the gyro inversion on the ICM20608 on my first quick attempt. Both logs are attached. The MPU6000 behaves normally and maxes at 1998 deg/sec. The ICM20608 replicates what we saw from the earlier flight log and inverts initially on the acceleration past 2000 deg/sec, and then inverts again on the slow down as the rotation drops below 2000 deg/sec.

Seems easy to reproduce and hopefully this will be helpful to the Invensense contact.

mpu6000 test 1.BBL.zip icm20608g test 1.BBL.zip

etracer65 commented 6 years ago

I tested with the ICM20608 running at 32Khz and still saw the inversion.

icm20608g test 2 32khz.BBL.zip

Also, haven't fully characterized it yet but there seems to be a bug with 32Khz enabled and blackbox logging. When 32k is enabled it's impossible to select a logging rate greater than 1/4 of the PID loop rate. So trying to run at 32k/8k provides a maximum of 2Khz in the configurator drop-down. Not sure if it's a bug with the firmware or with the configurator.

screen shot 2017-08-31 at 7 13 30 pm

If 32k is disabled, then I can log up to the PID rate. This is with on-board flash.

Ozzy33 commented 6 years ago

You might try CLI commands to check if it's the BFC or BF. I would try it right now, but I don't have a spare FC laying around here.

martinbudden commented 6 years ago

@etracer65 , thanks for the new logs. I agree, this looks like a arithmetic overflow in the ICM20608 gyro, rather than any MEMS mechanical effect.

borisbstyle commented 6 years ago

@etracer65 Does it invert in both directions or only positive direction?

Can you please peform that test as well. That will help creating a guard against it.

And please log the RAW gyro data. Perhaps remove all filtering or enable debug gyro

borisbstyle commented 6 years ago

Also please enable 32khz and run bb log 1/1 rate to see every single sample.

You can eventually log with 16khz if bb log doesnt work at 32khz

ledvinap commented 6 years ago

@martinbudden: The noisy part close to extreme at high rotation speed is strange. I'm afraid that there is deeper problem in ICM decoding algorithm...

etracer65 commented 6 years ago

@borisbstyle Yes, the inversion occurs the same in both directions. The behavior seems the same regardless of the rotation dirction. I'm trying to test at 32khz but having some problems/confusion. I made a build of the current source because I know that @martinbudden had some fixes for the gyro logging. But whenever I enable 32khz mode the blackbox logging rate in the configurator can't be set above 2khz. I saw this earlier with RC3 so I don't think it's related to a recent commit. Possibly just a bug with the configurator? When the the gyro/PID are set to 32k/8k here's what the blackbox options show:

screen shot 2017-08-31 at 7 13 30 pm

I tried manually setting blackbox_p_denom = 1024 which I think is the value needed for 32k 1:1 (64 is 2k, 256 is 8k so 1024 should be 32k). This works and I collect logs but they seem to capture fewer samples than the logs from the MPU6000 logging at 8k (blackbox_p_denom=64). So I'm not sure it's capturing things correctly. Blackbox viewer says:

screen shot 2017-09-01 at 8 39 21 am

So what value should I set blackbox_p_denom to?

Also, what debug_mode do you want me to use? I tried the new GYRO_RAW and it completely breaks the blackbox viewer (even pulled down the current source). With GYRO_RAW you can't change any of the graphs because no values show up. Switching back to debug_mode = NOTCH gets it working again but I'm not sure if the debug values contain the raw gyro data.

martinbudden commented 6 years ago

@ledvinap , yes odd things are going on in the gyro, but the arithmetic overflow seems to be the worst problem.

@etracer65 , can you give PR https://github.com/betaflight/betaflight/pull/4038 a spin? (literally)

I've added an inversion filter on the yaw axis. Could you do a blackbox log with debug_mode = NOTCH? Thanks.

martinbudden commented 6 years ago

@etracer65 https://github.com/betaflight/blackbox-log-viewer/pull/83 should allow you to view log files with debug_mode = GYRO_RAW. Can you try it out?

etracer65 commented 6 years ago

@etracer65 betaflight/blackbox-log-viewer#83 should allow you to view log files with debug_mode = GYRO_RAW. Can you try it out?

Yes, I was able to open the log files with debug_mode = GYRO_RAW. Not sure the debug mode is working correctly though as the data doesn't make a lot of sense. The debug values seem to be randomly biased positive or negative even before the test starts. And it varies run to run and is even different between the two boards. Also the debug value don't display any numbers or calculated dps in the legend so not sure how useful the GYRO_RAW data is. Here are the logs from earlier this morning if you want to check. Note that these are not with PR 4038 - I'll be testing that soon.

I'm still not confident that the ICM20608 version is logging at 32k even though it looks like it. It seems to capture fewer samples and has a smaller file size than the MPU6000 version which is logging at 8k.

icm20608g test 3 32k GYRO_RAW.BBL.zip mpu6000 test 3 8k GYRO_RAW.BBL.zip

etracer65 commented 6 years ago

@martinbudden Here are the results from PR4038. I'm not seeing any real difference as the gyro still seems to track along with the raw gryo data and invert.

Log 1/5 - positive gyro spin Log 2/5 - negative gyro spin Log 3/5 - negative gyro spin (slower start) Log 4/5 - positive spin; stop; negative spin Log 5/5 - negative spin; stop; positive spin

icm20608 PR4038.BBL.zip mpu6000 PR4038.BBL.zip

martinbudden commented 6 years ago

@etracer65 , the gyro raw values are literally raw, that is without calibration, alignment and scaling. I wanted to see them just to confirm the overflow - the extreme values in the log are 32767 and -32768 which does indicate arithmetic overflow.

ctzsnooze commented 6 years ago

Hi Guys... did some testing where I would hang my problem quad upside down on a string, I'd then hit one arm really hard with a shoe and log it. Logs show it spinning for 7s in excess of 1982 deg/s (the stable overload value in my logs) without reversal. I estimate, based on rate of deceleration, was that peak roll rate was 2500 degrees/sec, it was above 1982 deg/s for at least 3 seconds. I see no inversion in these logs. It could be that inversion requires it to spin even faster than 2500 deg/s

ctzsnooze commented 6 years ago

I've put RC4 on and will see if I can replicate a YSTTM event today. I've also reversed props because I realise that in a sideways clip on the ground or gate as per my video, motor torque will enhance the spin rate of the arm touching the ground, and pull the quad closer and faster adding spin direction and pulling into the ground or gate. Reversed props should have the opposite effect of pushing away from the gate or ground and opposing the 'catch'.

martinbudden commented 6 years ago

@etracer65 , can you try PR https://github.com/betaflight/betaflight/pull/4040 ?

martinbudden commented 6 years ago

@ctzsnooze , if possible, can you try the latest Jenkins build rather than RC4? Can you also set debug_mode = NOTCH and also set crash_recovery = ON

ctzsnooze commented 6 years ago

OK will do. Today RC4 with crash recovery off did very well, had a couple of clips and no untoward outcomes. A mate on RC2 had a number of very active spins :-) Thanks for what you've done Martin. Will test the Jenkins build and crash recovery too.

etracer65 commented 6 years ago

@martinbudden Here are the tests with PR4040. I didn't see any specific CLI commands to enable extra functionality so I'm just running a nearly default setup (diff below). The gyro is tracking a little differently on the really noisy transitions, but overall it's still inverting. Also provided MPU6000 logs for comparison. They don't show any problems but they're captured simultaneously so no harm in providing them.

1/5: positive gyro spin 2/5: positive gyro spin with slower start 3/5: negative gyro spin 4/5: negative gyro spin with slower start 5/5: positive spin; stop; negative spin

icm20608 PR4040.BBL.zip mpu6000 PR4040.BBL.zip

diff: # diff # version # Betaflight / AIRBOTF4 (AIR4) 3.2.0 Sep 2 2017 / 07:51:08 (0898b3c) beeper -ON_USB serial 0 64 115200 57600 0 115200 aux 0 0 1 1775 1825 aux 1 26 3 1725 2100 aux 2 28 0 1775 2100 set gyro_use_32khz = ON set acc_hardware = NONE set baro_hardware = NONE set serialrx_provider = SBUS set blackbox_p_denom = 1024 set blackbox_device = SPIFLASH set motor_pwm_protocol = MULTISHOT

etracer65 commented 6 years ago

@martinbudden Were you ever able to get with the Invensense representative? If so, any feedback from him on the gyro behavior?

martinbudden commented 6 years ago

@etracer65 , thanks for the new logs. I'm surprised there is no change, I think there must be a mistake in my code somewhere, since there should be some change.

I was someone else who got in touch with Invensense - I asked them to put me in the look, but I haven't heard any further.

gke commented 6 years ago

I dropped out for a while - life interfered. I have no idea how you keep up with this Martin.

Checks are easy though - hang the aircraft on a strong string as Chris has done with another string to the ground with a large brick attached preferably using fisherman's swivels to stop the string shredding. Set yaw super rate to say 1.5 and give it full yaw and then release the stick. If the filter is working it will stop. Ignore BB for now as reality is the test. I like the shoe test BTW :).

When I did this stuff almost 20 years ago I reduced the roll/pitch authority and had the bottom strings from each arm to hold it level in roll/pitch.

mikeller commented 6 years ago

@ctzsnooze:

I would hang my problem quad upside down on a string, I'd then hit one arm really hard with a shoe

Gotta love your test set up, man. ;-)

martinbudden commented 6 years ago

@gke , feel free to submit a PR with your proposed changes and then @etracer65 can test it and we can have a look at the logs.

gke commented 6 years ago

@martinbudden OK I will try to get that done.

... done and on my gitHub. I need to set up a build environment (a pain) so the code has not yet been compiled. I will do a pull request when I get that done.