Airspeed selector issues

ThomasRigi commented 3 years ago

Describe the bug @sfuhrer @dagar @RomanBapst I found several issues (of different importance) with the airspeed selector / validator. I could reproduce parts in SITL, but not everything. As usual, I only know about quadplane VTOLs. Sorry in advance for the long report, here a TL;DR: switching from not using airspeed sensor to using airspeed sensor completely cuts FW throttle. Ground_minus_wind estimate is not working properly. In v1.11.2 you could easily trigger false triggers on airspeed validity, which has improved since.

on v1.11.2, you can consistently trigger the airspeed fallback when transitioning with side wind and just after the transition turn into tail wind. If I understood correctly, it's because the wind estimator hasn't converged to the actual wind and as you turn into tailwind the groundspeed (and therefore also ground-minus-wind) increases fast with constant airspeed. We've had it appear several times on a real drone, and I could also consistently reproduce it in SITL. On v1.12-beta and master (of 31Mar) I was no longer able to consistently reproduce the switching in SITL. I imagine that's thanks to https://github.com/PX4/PX4-Autopilot/pull/16564.
Still on v1.11.2, we almost lost a drone (quadchute recovered it) when the airspeed validator decided after 100s (as the parameter was set) that the airspeed sensor was actually good and switched back to it. At this moment the throttle was cut completely to 0 and only ramped up really slowly afterwards. After 7s it was back at 40%, at which point the drone stalled. (cruise is around 70%). This to me is the most critical part of this issue report. Unfortunately I wasn't able to reproduce a switch back to using the airspeed on SITL, so it's hard to understand why the cut in throttle happened. Maybe someone of you has an idea?
On v1.12-beta and master the ground-minus-wind-speed isn't working properly. The airspeed selector wind estimators don't converge to the good value. The EKF ones work much better. Maybe linked to https://github.com/PX4/PX4-Autopilot/pull/16857 ?

Logs, screenshots and more details

v1.11.2, real drone flying in relatively high winds (around 10m/s): https://logs.px4.io/plot_app?log=597040b8-9015-4432-b178-a5c008bb028a. This is the log where at 12:01 the throttle is cut to 0 until it stalls: The altitude was below the setpoint and the airspeed only marginally above (initially). There is no reason to cut thrust in this case. There's probably a bad initialisation somewhere, maybe in TECS?

From the same log, here the airspeed and wind plots: You can see how the ground_minus_wind diverges from the real airspeed around t=510s. You can also see that the airspeed sensor was fine throughout the flight.

Here a log where I tried to reproduce everything on v1.11.2 SITL with HEADLESS=1 make px4_sitl gazebo_standard_vtol__windy : https://logs.px4.io/plot_app?log=d1f835d0-396b-49e3-9f8a-cc9c887e19ba

Wind conditions are

      <windDirectionMean>0 1 0</windDirectionMean>
      <windVelocityMean>8.0</windVelocityMean>

The initial faulty airspeed switch is exactly the same, but I couldn't get it to switch back to using the airspeed sensor (only when back transitioning), which was my main goal :(. wind_estimate.00 is ok, but wind_estimate.01 is way off.

Moving on to point 3), here the logs of v1.12-beta and master where I tried to trigger the same faulty switch, but couldn't. I tried to have more reactive parameters for simpler switching, but didn't work (which in itself is good :)). But as I couldn't prove otherwise we have to assume that the no-throttle-bug is still present. Also, note that the ground_minus_wind is completely off because of a not properly working wind estimate. v1.12-beta: https://logs.px4.io/plot_app?log=a91e4480-b50e-4d0e-ba56-3b00fbc10024 The wind topic correctly converges to the simulated wind, but the airspeed_wind.00/01 topics do not. Consequently, also the ground_minus_wind estimate is completely off.

Linked to this last point, I've seen that on master the parameter ASPD_FALLBACK_GW has replaced ASPD_FALLBACK. I'm not sure if I understand correctly how it works. For me, the behaviour with ASPD_FALLBACK = 0 (to other airspeed if available, otherwise fixed-thrust) is exactly what we want. Do I have the same behaviour with ASPD_FALLBACK_GW = 0 ? The description is much less clear to me. If you disable this fallback, does it mean it keeps using the faulty airspeed measurements or go to fixed thrust or what? Thanks for clarifying :)

sfuhrer commented 3 years ago

@ThomasRigi thanks for the detailed report. I'm off this week, but will gladly have in in-depth look when I'm back. Let's go for the lowest hanging fruits together.

Kim-zhi-jiang commented 3 years ago

@ThomasRigi More than 10 minutes then stopped today v-master, not logs, only this 2021-04-15 09-51-56 的屏幕截图

ThomasRigi commented 3 years ago

@Kim-zhi-jiang Not sure I follow what's happening in your case and how it's linked to the airspeed validator module.

From what I see you had a loss of global position out of nowhere, which is a severe issue, as well as a baro failure. It's certainly worth investigating. Can you open a different issue for it please? Also, we need more info to be able to reproduce it and your log would help a great deal.

Kim-zhi-jiang commented 3 years ago

@ThomasRigi Very concerned your case. This case has always occurred in v-master and logs not saved successfully. will test again next week. if also then write new issue.

sfuhrer commented 3 years ago

@ThomasRigi I now had a closer look: Concerning your point 1) (failure false-positive when transitioning with tailwind) That indeed is a valid concern, and I don't see a straight forward solution right away beside making the checks less sensitive over all. The issue is that the wind estimator used for the estimate of ground-windspeed is rather slow to converge, much slower than if also airspeed sensor data is fused. After some time, the data from all 3 estimators usually is close together. As an example, that's a recent flight in higher wind with a firmware state of roughly after https://github.com/PX4/PX4-Autopilot/pull/16564). Instance 0 is beta fusion only, 1 is beta and airspeed and 2 is the wind estimate from EKF.

We could

factor in the current variance of the wind estimate, and increase the allowed difference from airspeed sensor reading to estimated airspeed in case of high uncertainty (need to fix the variance init first though, it now begins at 0 what doesn't make sense ofc)
make the check less sensitive. To some extend that was already done with https://github.com/PX4/PX4-Autopilot/pull/16564/commits/b102abb0c3f0f4a1accd7eb56f567ca2245f6cbd

on 2) sounds concerning, but without log hard to troubleshoot, and never experienced myself. It further sounds more like a TECS issue, just to not mix things too much here. If you can't share a log file then I could try to reproduce myself.

3) https://github.com/PX4/PX4-Autopilot/pull/16857 should only have been a renaming. As I don't 100% trust gazebo for wind related things: have you also got to the same result on real vehicles, that the wind estimates were not converging to a similar value? And did it get worse recently (either on the simulator or the real vehicle).

4) ASPD_FALLBACK_GW was introduced to make clear it's only concerning the fallback to the (very experimental) estimate of groundspeed-windspeed. Having it at 0 gives you fallback to alternative airspeed sensors if present, or then to airspeedless mode.

What wasn't brought up yet but is the biggest concern right now is that with the current structure, you can't have online airspeed scale estimate and checks at the same time. For me this has the biggest priority (beside fixing obvious bugs if detected). A wrong scale introduces all kind of issues, from stalling aircrafts to false positives in the airspeed checks bc the wind estimate is off.

ThomasRigi commented 3 years ago

@sfuhrer thanks for looking into it.

On 1) it sounds like a good idea to factor in the variance of the estimate. If what you use to trigger a failure detection has a larger uncertainty that the quantity you measure it doesn't make too much sense. Also, as I seem to understand from your plots (I'm not really much of an estimation expert), it can currently take close to 200s for the beta fusion only estimator to converge ? That's a really slow convergence compared to the other two instances. I'd be almost tempted to use the EKF instance to help with initial convergence (probably still a very bad idea). Is there some other way to speed up convergence?

Additionally, if we could initialise the airspeed wind estimator already during transition and not only when it's finished we could gain some additional seconds for convergence.

I also tested with the new default value of ASPD_FS_INTEG, but still had the same false positives on v1.11 (can't find the log anymore...). I think the main robustification here was part of #16564 which somehow fixed the false positives I could previously reproduce in SITL

On 2), I actually shared the log of the real flight where the throttle was cut:

v1.11.2, real drone flying in relatively high winds (around 10m/s): https://logs.px4.io/plot_app?log=597040b8-9015-4432-b178-a5c008bb028a. This is the log where at 12:01 the throttle is cut to 0 until it stalls:

I agree it looks like something with TECS goes wrong, but it must be linked to the airspeed measurements being somehow used again, or I was just extremely unlucky to have to unlinked problems at exactly the same time. It's almost as if the cruise throttle parameter was suddenly ignored when it switched from fixed-thrust to using the airspeed sensor again. :bulb: I might have just had an idea on how we can try to reproduce it independently of the airspeed validator: maybe it's triggerable by switching ASPD_PRIMARY from -1 to 1 and if not we know that it has to be strongly linked with the airspeed validator. I'll see if I get somewhere with this approach.

And on 3) maybe it's also linked to the slow convergence of 1). Though it doesn't seem to be converging at all tbh. I don't really know if it got worse recently as with the renaming and multi-EKF instances it got quite hard to compare previous to later versions. I'll see if I can fly soon-ish with a real drone on v1.12-beta3 in windy conditions.

I wasn't aware of your last concern as we typically set ASPD_SCALE by hand upon initial validation on our drones and then it's good enough for a while. But how can the scale affect the wind estimate used in the validator? I thought that was the whole point of only using sideslip

ThomasRigi commented 3 years ago

On 2), I SITL tested by manually changing ASPD_PRIMARY (no effect) and FW_ARSP_MODE (effect on v1.11, fixed on v1.12).

v1.11.3: https://logs.px4.io/plot_app?log=be0c07e0-b3a0-406e-a5ed-f2e92ce53322 Towards the end of the log, when I set FW_ARSP_MODE from 1 back to 0 (from fixed thrust to normal airspeed-based mode), the throttle cuts the same as it did in our flight where the switch was induced by the airspeed validator.

v1.12-beta3: https://logs.px4.io/plot_app?log=3405daf6-e277-4d32-980b-336ab96de96f No cut in thrust observed when setting FW_ARSP_MODE back to 0. I assume (at least I hope) that this means it's also fixed for when the switch is triggered by the airspeed validator.

sfuhrer commented 3 years ago

it can currently take close to 200s for the beta fusion only estimator to converge

Yeah correct, in higher winds.

I'd be almost tempted to use the EKF instance to help with initial convergence (probably still a very bad idea). Is there some other way to speed up convergence?

Well we can't really use the airspeed estimate to fuse any wind estimate, also not the ekf one, if an sensor issue was detected. We could check how well the EKF wind estimate performs (how fast it converges) if it's also only fused by beta (EKF2_ARSP_THR to 0). I'll also play a bit with the over all wind estimator tuning when we next time have nice windy conditions.

On 2), I actually shared the log of the real flight where the throttle was cut:

My bad sorry. Had a look at it, it's due to the TECS throttle integrator filling up during the airspeed less mode

And I think that in turn is related to something else I fixed since, that in airspeed less mode, the TECS has a "current airspeed" that's offset from the airspeed setpoint, creating a constant error that somehow ends up in the integrator, but not directly in the throttle output..or somewhere around that. Could use verification, but I think it should look better with a more recent firmware.

. But how can the scale affect the wind estimate used in the validator? I thought that was the whole point of only using sideslip

It will affect the wind estimate that responsible for the airspeed checks (making it more likely to trigger an invalid airspeed sensor), and the ekf wind estimate (that's used e.g. for dead reckoning. Just the wind estimate that can be used as as fallback airspeed estimation (airspeed = wind-ground) is not fused by the airspeed sensor. It's quite confusing I know, and adding documentation is overdue on my part...

ThomasRigi commented 3 years ago

Well we can't really use the airspeed estimate to fuse any wind estimate, also not the ekf one, if an sensor issue was detected.

Yes it's clear you shouldn't use it throughout the flight. I only thought to use it for initialisation, for the first 10s or so (Idk what a reasonable value would be) - and hope that there is no initial issue with the sensor. Like this in most cases you should have a reasonable "initial" value for the airspeedless fusion to start working with. And you could also use the variances of the EKF to initialise the airspeedless fusion, maybe increase them further as it's probably less precise without airspeed fusion. To me this could still be a gain to having 200s of convergence problems during which the airspeed validator doesn't really work well. But yeah, if you start with an airspeed problem this could still screw you up... But not doing it when you don't have a problem apparently can also screw you up...

Thanks for the insights in TECS, and I indeed think it's fixed by now. Here again the log where I manually switched to fixed thrust:

v1.12-beta3: https://logs.px4.io/plot_app?log=3405daf6-e277-4d32-980b-336ab96de96f and didn't have the thrust issue anymore. I didn't look into the details of TECS, but from a global perspective it certainly looks fixed :)

On a side note, I now have two drones ready with v1.12-beta3. If we go fly in windy weather I'll share my logs here.

ThomasRigi commented 3 years ago

Yesterday in high wind conditions (baseline around 10-12m/s, higher gusts, close to the platform's limit) we managed to provoke a false airspeed failure trigger on the third flight of the day: (v1.12.0) https://logs.px4.io/plot_app?log=f5e332fe-4e10-4f17-9446-9e8793d22238 It took longer than on v1.11, but it still happened.

For reference, the previous flights where it didn't happen (but still interesting to check the wind estimates' convergence) https://logs.px4.io/plot_app?log=1138b10f-325d-48da-bad5-23d23db21eb5 https://logs.px4.io/plot_app?log=3304228d-2a03-43ed-8bcc-ffa304911661

Apparently there was a logging issue, the airspeed_wind topic is only sampled at very low rate in two of the logs. :/

ThomasRigi commented 3 years ago

Some things I was thinking about: 1) on the previous logs the airspeed_wind topic was always reporting slower wind speeds than the wind topic, and I have reasons to believe that the latter was more correct. (it was really windy on the terrain, weather forecast was above 10m/s and higher gusts and also the max-min groundspeed indicate more of a 12m/s wind strength). 2) If we know that the airspeed_wind estimate is slow to correctly initialise, wouldn't it make sense to not run this check for the first say 60s? 3) following 2), I was going through the code and I think https://github.com/PX4/PX4-Autopilot/blob/master/src/modules/airspeed_selector/AirspeedValidator.cpp#L149 should be set to true. If the estimator is not initialised for long enough the checks shouldn't be run, i.e. they shouldn't fail. 4) if flying with TECS, it should be possible to use some of that information to check if the airspeed is reasonable. For example, if flying straight at cruise airspeed the cruise throttle should be around FW_THR_CRUISE. If it is consistently and significantly higher or lower, then you very likely have some problem with the airspeed sensor (or with the FW power train...). Not sure about the implementation though, just an idea that popped up in my head. I mean that's basically what you rely on when flying in fixed thrust mode.

sfuhrer commented 3 years ago

Thanks for sharing the logs and ideas, and sorry for the late answer, was on holidays.

About your logs: it's pity that only one of them has airspeed_wind properly logged. On that one (it's the second link I think) I see two issues with the wind estimate of the airspeed validator: initialization seems off as windspeed_east stays close to 0, and then after some time the two components are more or less exactly the same until the estimator resets itself. Can you give me the hash of the px4 you're based on?

About your points: 1) That's likely related to the issues I noted above. On our vehicle (with slightly updated estimator) we don't see that discrepancy. A good way to check the wind estimate is by comparing groundspeed (absolute, so sqrt(vx²+vy²)) to TAS.

2) "slow to correctly initialise" --> we need to be careful with talking about the same thing: the initialization happens immediately, so it can't be slow, just correct or wrong. But then the convergence of the filter can be fast/slow. If we deem it necessary (with a revisited wind estimator), then yes, we can either increase the waiting time (it's now 10s) do it based on variance.

3) "If the estimator is not initialised for long enough the checks shouldn't be run, i.e. they shouldn't fail." Isn't that exactly what this line does? :thinking:

4) That's an idea for an additional airspeed validity check, though I don't really like the cross dependency on a FW tuning variable (cruise throttle) and, like you say, it's hard to distinguish from power train issues..

ThomasRigi commented 3 years ago

The PX4 version is v1.12.0 with some custom mixers and a quick change to keep the pusher active after quadchutes, hence the fork. All the relevant parts of the code are exactly the same as v1.12.0.

That's what I meant. Slow to converge, thanks for correcting my wording. It's true that the initialisation of the estimator is immediate. Working more with the variance makes sense I think, but it would need to be initialised with a higher value. Here the logged variance is lower in the beginning than during "steady state", which doesn't really look correct to me.
Oops my bad, I got confused with the negation of a failed check. Yes, that's exactly what this line does.

sfuhrer commented 3 years ago

Here the logged variance is lower in the beginning than during "steady state", which doesn't really look correct to me.

Yes, I'm looking into that. Don't know how big the effect in the end will be though.

PX4 / PX4-Autopilot

Airspeed selector issues #17365