ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0

EKF: suppress lane switch warning #24181

Open rmackay9 opened 1 year ago

rmackay9 commented 1 year ago

Users often receive "EKF Lane Switch" messages but there isn't much they can actually do about it. We should consider suppressing this message or at least providing a way for OEMs to suppress the message.

timtuxworth commented 1 year ago

Absolutely! It should default to off for most users and only be logged, or perhaps be available in some kind of DEBUG mode that people who know what this means can turn on if they want to know about it.

rishabsingh3003 commented 1 year ago

I think the issue is more pronounced because the current lane-switching algorithm produces a lot of false positives and makes lane changes that are completely unnecessary. Several logs in the past have shown this. IMO, the better way forward would be to reduce false lane switching instead of suppressing the message entirely.

A user should probably know when a lane switch has happened, because it may be a real fault in some sensor that should be fixed, instead of never knowing that something like this happened. We need more REPLAY logs to understand what is truly happening.

timtuxworth commented 1 year ago

> A user should probably know when a lane switch has happened, because it may be a real fault in some sensor that should be fixed, instead of never knowing that something like this happened. We need more REPLAY logs to understand what is truly happening.

Why would you have to do something if a lane switch is the correct solution to a real-time issue and actually fixed the problem? Isn't that lane switching performing "as intended"? If so, why does the user get a CRITICAL ERROR message announced by the GCS (on QGC, using its "urgent voice")?

There is nothing the user can do during flight. At the very least, reduce it to an informational message that is not announced.

rishabsingh3003 commented 1 year ago

@timtuxworth sure. But a lane switch can just as easily be caused by a real sensor malfunction, which a user should know about. In one case I had sensor affinity set up for multiple compasses in a copter, and there was a legitimate failure. I wouldn't have bothered to look in my logs without that GCS voice.

I do not deny that the current excessive lane switching makes users worry unnecessarily (and that's why the algorithm needs further work). We should do something to make it sound less threatening. I think we can lower the message's severity right away (i.e. use a less urgent MAVLink severity level), and that should make things better.

But there is a point to be made against removing them completely.
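A minimal sketch of the reporting options discussed so far: always log the switch, but let an OEM suppress the GCS text or demote it from CRITICAL. The `MavSeverity` values mirror the standard MAVLink `MAV_SEVERITY` enum. The option bits and `lane_switch_report` are hypothetical and are not an existing ArduPilot parameter or function.

```python
from dataclasses import dataclass
from enum import IntEnum, IntFlag
from typing import Optional


class MavSeverity(IntEnum):
    """MAVLink MAV_SEVERITY values (a lower number is more urgent)."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


class LaneSwitchOptions(IntFlag):
    """Hypothetical OEM option bits; not a real ArduPilot parameter."""
    NONE = 0
    SUPPRESS_GCS_TEXT = 1  # never send the message to the GCS
    DEMOTE_TO_INFO = 2     # send it, but as an informational message


@dataclass
class LaneSwitchReport:
    log_event: bool                      # write the switch to the onboard log
    gcs_severity: Optional[MavSeverity]  # None means no GCS text at all


def lane_switch_report(options: LaneSwitchOptions) -> LaneSwitchReport:
    # The switch is always logged so it can be reviewed after the flight.
    if options & LaneSwitchOptions.SUPPRESS_GCS_TEXT:
        return LaneSwitchReport(log_event=True, gcs_severity=None)
    if options & LaneSwitchOptions.DEMOTE_TO_INFO:
        return LaneSwitchReport(log_event=True, gcs_severity=MavSeverity.INFO)
    # Default mirrors the behaviour described in this thread (CRITICAL).
    return LaneSwitchReport(log_event=True, gcs_severity=MavSeverity.CRITICAL)
```

In MAVLink a smaller severity number is more urgent, so making the message less threatening means moving from `CRITICAL` (2) toward `INFO` (6), not lowering the number.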

timtuxworth commented 1 year ago

@rishabsingh3003 You have no idea how frustrating it is to read your clear and simple explanation of what "EKF Lane switch" means to me as a pilot and what I should do about it after more than 18 months of flying ArduPilot.

Up until now, all I knew was that "something happened", it may or may not be a problem, most of the time I can ignore it, and most of the time fixing other messages (such as Compass alignment, or AHRS messages) make it go away. I'm not clear that it meaningfully adds anything useful since other messages will have given more useful information about which sensor(s) is wrong.

What this means to me is that the message "EKF Lane Switch" is an extremely poor message in its wording, frequency and classification (Critical). All of these things need to be fixed.

pompecukor commented 1 year ago

> @timtuxworth sure. But a lane switch can just as easily be caused by a real sensor malfunction, which a user should know about. In one case I had sensor affinity set up for multiple compasses in a copter, and there was a legitimate failure. I wouldn't have bothered to look in my logs without that GCS voice.
>
> I do not deny that the current excessive lane switching makes users worry unnecessarily (and that's why the algorithm needs further work). We should do something to make it sound less threatening. I think we can lower the message's severity right away (i.e. use a less urgent MAVLink severity level), and that should make things better.
>
> But there is a point to be made against removing them completely.

You are an expert user. Most end users are not, and giving them such messages is useless. I am OK with the lane switch happening, but I am not OK with the report. A lane switch due to a failed compass is really an exceptional circumstance; it is really rare for a compass to fail, and I am sure we can agree on that. I respect that you like the message, and you can (try to) work on making it switch less. However, in the short term I do not see it reaching a point where it works as it should without switching.

We have hundreds of aircraft with users in the field, flying every day. I have yet to see a case where a lane switch warning made us look at a log and realize we needed to fix something, or a case where it helped us identify failed or failing hardware. Hence I do not see a world where the report would be useful. I think your example is the exception (expert user and compass failure). As @timtuxworth suggested, we could just have an option to disable the report. Then the very few users like you who find it handy can turn it on for themselves.

andyp1per commented 1 year ago

Lane switch can be caused by vibration - in that case it is telling you something important that you can do something about. I'm not sure I agree that it is useless.

pompecukor commented 1 year ago

> Lane switch can be caused by vibration - in that case it is telling you something important that you can do something about. I'm not sure I agree that it is useless.

I did not say it is completely useless. I am saying it is MOSTLY useless. That is a fact, from real life. We build lots of drones, and this report has never led us to anything useful, or to determining that something was about to fail.

Hence the request to have at least the option to disable it for the wider crowd.

rishabsingh3003 commented 1 year ago

@pompecukor, a REPLAY log would really be helpful in this case. It would show us how the switching is unnecessary in your use case and, with some code changes, what would actually have happened in your flight.

pompecukor commented 1 year ago

> @pompecukor, a REPLAY log would really be helpful in this case. It would show us how the switching is unnecessary in your use case and, with some code changes, what would actually have happened in your flight.

As I described to you, in some of our cases the switch was not unnecessary; what has always been unnecessary is reporting it:

  1. There was nothing the pilot could do about it except panic.
  2. There was no HW failure that needed to be sorted out after the flight; the issue was momentary.

I don't want you to eliminate lane switching due to dual airspeed sensors being on affinity. I just want users not to have to hear about it during the flight. The switch is good and normal, even though it is a false positive. The other regular cause is GPS.

In all cases I can see the reason behind the switch, and it makes sense.

I think part of the issue is that you two are developers and locked into that mindset. That is:

  1. you know what you are doing and what is going on in the code.
  2. you tend to build patchwork drones with cheap hardware (:p)

Neither of those is typically true of the standard end user of commercial/industrial ArduPilot-based UAVs, whose aircraft tend to be more standardized.

Edit: I will write the example that I gave in the closed forum here, so others can read it: On one of our drones, the two airspeed sensors are mounted far out on the two wings, so there is a large separation. In a sharp turn, especially in strong wind, this causes a large (but understandable) deviation, so we are always guaranteed a lane switch. So even though it is sort of a false positive, we still want it to happen: if one sensor really fails, it will be just as quick to switch to the better one.

What I don't understand is why you have a problem with giving us the option. You can still enable the report for yourself if you want to, and you can improve the code slowly but surely to trigger fewer lane switches, if it is in fact found that there are many instances where it should not have switched.

rishabsingh3003 commented 1 year ago

@pompecukor I completely agree with all your points. Most of my HW failures are just poor soldering because I am in a hurry to get my cheap testing rig up :)

The REPLAY log would also tell us how and when to report a switch like the one that happened in your flight. All I am saying is that we need to build a collection of logs to understand how various people are using affinity and lane switching: when it is useful, when it is unnecessary, when we need to report it and when we can ignore it. When we do have some changes, it'll be great to validate them against your logs so we know that your use case also benefits from those changes and we are not introducing more hassle. This is the only way we as developers can know how a commercial UAV like yours would like lane switching to behave.

pompecukor commented 1 year ago

> @pompecukor I completely agree with all your points. Most of my HW failures are just poor soldering because I am in a hurry to get my cheap testing rig up :)
>
> The REPLAY log would also tell us how and when to report a switch like the one that happened in your flight. All I am saying is that we need to build a collection of logs to understand how various people are using affinity and lane switching: when it is useful, when it is unnecessary, when we need to report it and when we can ignore it. When we do have some changes, it'll be great to validate them against your logs so we know that your use case also benefits from those changes and we are not introducing more hassle. This is the only way we as developers can know how a commercial UAV like yours would like lane switching to behave.

👍

The issue with your suggestion/request is that we cannot have our users fly around with log replay and log-while-disarmed turned on just to get logs, so using logs from the past is not possible, as those parameters are obviously not on. The only thing we can do is enable those parameters and make (artificial) logs for you. That is what I was trying to explain. What you will see is mostly what I already described: the airspeed sensors, and some GPS. We can of course make such logs; it is just extra work. Going out to fly for 30 minutes to an hour with all of our aircraft is not easy when we are busy with orders. I have tons of logs with the lane switch; the problem, again, is that I don't have any with replay and disarmed logging enabled.

rishabsingh3003 commented 1 year ago

@pompecukor no worries. I understand. If you get a chance in the future, it'll be great to have a replay log with your issues. Thanks

timtuxworth commented 1 year ago

> We have hundreds of aircraft with users in the field, flying every day. I have yet to see a case where a lane switch warning made us look at a log and realize we needed to fix something.

This was discussed in detail in the Dev call. Is it possible for you to share some example logs where the EKF Lane Switch message didn't add anything?

timtuxworth commented 1 year ago

Here is a Discord thread showing how this message continues to cause confusion, especially for new users. In this case the user had done a perfectly good setup but was inside a building, getting these messages, which did nothing to help resolve the issue.

Hielke — Yesterday at 2:12 PM I'm trying to arm my fixed wing (Pixhawk 6c + M8N GPS). I can force-arm it, but to properly arm it QGC says it fails multiple pre-arm checks, of which one is "AHRS: EKF3 Roll/Pitch inconsistent by 13 degrees".

[2:12 PM] I calibrated the sensors multiple times, but that doesn't solve it. Since the electronics are stuffed together in a small space, I thought that maybe the wires over the Pixhawk could cause interference. Does anybody know whether that could be true? Or does anyone have a different idea? All tips are welcome!

Tim Tuxworth — Yesterday at 2:20 PM I have similar issues. I have no idea what this means, or what to do about it. I have this on Zealot H743 and also on my old PixHawk 1-1M.

Hielke — Yesterday at 2:20 PM Ah, I now just found out that those degrees are the difference between the heading and the north. So if I change the direction of the plane, the "degrees inconsistency" change.

Tim Tuxworth — Yesterday at 2:21 PM So if you point it North it is ok? [2:22 PM] Oh but usually I am getting "Yaw inconsistent" not Roll/Pitch

Hielke — Yesterday at 2:25 PM Wait... I first had the warning about the Roll/Pitch, and while I was moving the plane around, the error changed to a yaw inconsistency.

[2:29 PM] And now I'm getting "Prearm: AHRS: not using configured AHRS type"

Tim Tuxworth — Yesterday at 2:29 PM That usually means your compass is off. Is your compass very near your PDB or battery wires?

Hielke — Yesterday at 2:32 PM Well yeah. Things are stuffed into it quite closely, but the GPS is on top.

Hielke — Yesterday at 2:37 PM Is that too close together? If so I will need to paste stuff to the outside of the fuselage, since there is no room inside anymore.

Rob F — Yesterday at 2:47 PM You might look at turning your GPS 180 degrees and moving it back on the fuselage: more distance and less drag. You just need to change the orientation.

Hielke — Yesterday at 2:55 PM I could move it further back, but that would also move it closer to the motor.

[2:57 PM] It's just a simple EasyStar3, my first plane ever. [2:58 PM] I'm extremely excited about it and I want to make the maiden flight tomorrow. [2:58 PM] (to the others: sorry for all the spam)

Tim Tuxworth — Yesterday at 4:12 PM There is also likely a compass in the flight controller. You might want to run the MAGFit tool to see if you can improve the calibration, or perhaps disable one of them.

Henry Wurzburg — Yesterday at 6:12 PM The "EKF3 X inconsistency" messages mean that the attitude solutions determined by EKF and DCM disagree. There are many possible sources: compass issues, GPS issues... The "not using configured AHRS type" message means what it says: usually you want EKF3 to be the AHRS, but it's unhealthy and you have DCM active, hence it is not using the proper source. When EKF3 becomes healthy and it switches back to it, you can arm. [6:16 PM] You should almost always disable the autopilot's onboard compass; it's usually too close to power leads. That external puck compass should be 6 inches away, if possible, from anything carrying motor power, and even then it will benefit from a MAGFit using the log data after the first flight. That said, this being a plane and not a QuadPlane or copter, you can just totally disable ALL compasses; plane does not need them. Just set COMPASS_ENABLE=0. You will have fewer problems, and you can keep it mounted where it is.
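To make the "inconsistent by N degrees" wording concrete: one way to measure a roll/pitch disagreement between two attitude solutions (such as EKF3 and DCM) is the angle between the gravity directions each one implies, and a yaw disagreement as the wrapped heading difference. This is a minimal sketch under those assumptions; the function names and the 10-degree limit are hypothetical and are not ArduPilot's actual pre-arm check.

```python
import math


def down_vector(roll_deg: float, pitch_deg: float) -> tuple:
    """Gravity direction in the body frame (x fwd, y right, z down) for a roll/pitch in degrees."""
    r, p = math.radians(roll_deg), math.radians(pitch_deg)
    return (-math.sin(p), math.sin(r) * math.cos(p), math.cos(r) * math.cos(p))


def roll_pitch_disagreement_deg(att_a, att_b) -> float:
    """Angle in degrees between the down vectors of two (roll, pitch) estimates."""
    a, b = down_vector(*att_a), down_vector(*att_b)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def yaw_disagreement_deg(yaw_a: float, yaw_b: float) -> float:
    """Smallest absolute heading difference, handling the 359/1 degree wrap."""
    return abs((yaw_a - yaw_b + 180.0) % 360.0 - 180.0)


HYPOTHETICAL_LIMIT_DEG = 10.0  # assumed threshold, for illustration only

ekf3 = (13.0, 0.0)  # (roll, pitch) from one estimator
dcm = (0.0, 0.0)    # (roll, pitch) from the other
err = roll_pitch_disagreement_deg(ekf3, dcm)
print(f"roll/pitch inconsistent by {err:.0f} degrees, fails: {err > HYPOTHETICAL_LIMIT_DEG}")
# roll/pitch inconsistent by 13 degrees, fails: True
```

Without a GPS lock, as in the thread, either estimate can drift, so a disagreement like this points at the inputs rather than at the comparison itself.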

Henry Wurzburg — Yesterday at 6:20 PM Not seeing the whole GCS screen: be sure you are actually detecting the GPS and it's getting a 3D lock. You may have to be outdoors for the lock to happen. If the GCS says NO GPS, then that is why EKF3 is not healthy, for sure.

July 7, 2023

Hielke — Today at 6:39 AM Today my plane flew its maiden flight. I tested FBWA, LOITER, and even the custom flight mode I wrote using SITL testing (which basically shuts the motor off and glides back home). I'm a 40-year-old dude, but I'm as happy as a child today, haha. Thanks for the suggestions @Tim Tuxworth and @Henry Wurzburg. In the end all problems were solved by just going outside, as is the case with many problems in life.

rishabsingh3003 commented 1 year ago

@timtuxworth sorry, I don't see any mention of lane-switching errors. Henry accurately describes the actual issues the user faced.

timtuxworth commented 1 year ago

In my mind this is basically the same problem being discussed in #24243 - the user experience for non-developers is being spammed with information that is very interesting for developers, but less than helpful for the vast majority of pilots.

MichelleRos commented 1 year ago

@timtuxworth It's not something that's only interesting to developers. It's something that if you ignore it and don't check why it happened, you can end up losing your vehicle.

We can currently get false lane switches due to the scheduler, etc. This is a code issue that people are working on. Once that's fixed, lane switches should happen only very rarely, and when they do, it should almost always be because of a sensor error. In that case you should land as soon as you can in order to troubleshoot and fix it, or else risk the vehicle randomly falling from the sky.

And the fact that general users often don't know what "lane switch" means doesn't mean we should just stop warning users that something's wrong. It means users either need to do their own research to find out what the cause is, or we could change the message to be clearer or add a documentation page on how to find out what caused it, for example.

timtuxworth commented 1 year ago

> And the fact that general users often don't know what "lane switch" means doesn't mean we should just stop warning users that something's wrong. It means users either need to do their own research to find out what the cause is, or we could change the message to be clearer or add a documentation page on how to find out what caused it, for example.

Thanks Michelle, this - in combination with a reduction in spurious messages, should make a big difference. But I have also heard what @pompecukor had to say - sometimes lane switches are expected/normal and the pilot does not need to know.

> As I described to you, in some of our cases the switch was not unnecessary; what has always been unnecessary is reporting it:
>
>   1. There was nothing the pilot could do about it except panic.
>   2. There was no HW failure that needed to be sorted out after the flight; the issue was momentary.
>
> I don't want you to eliminate lane switching due to dual airspeed sensors being on affinity. I just want users not to have to hear about it during the flight. The switch is good and normal, even though it is a false positive. The other regular cause is GPS.

MichelleRos commented 1 year ago

> Thanks Michelle, this - in combination with a reduction in spurious messages, should make a big difference. But I have also heard what @pompecukor had to say - sometimes lane switches are expected/normal and the pilot does not need to know.

What I'm saying is that I don't think anything other than actual sensor errors should trigger a lane switch, so lane switches due to momentary issues should always be false switches. For example, FWIU, this scheduling issue causes the current lane to have a spike in innovations before the other lanes get the same spike. The lane switch seems necessary because the current lane had high innovations while the other lanes were fine, but it actually isn't, because the innovation spikes should all have happened at the same time: no lane is actually better than the current one.
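A toy model of the false switch described above, assuming a simplified selector that switches when another lane's error score beats the current lane's by more than a fixed margin. This is not ArduPilot's EKF3 lane-selection code; `select_lanes`, the scores, the margin and the one-step delay are all hypothetical, chosen only to show how a spike that reaches the current lane one step before the others can trigger a switch even though no lane is better.

```python
from typing import List


def select_lanes(scores_per_step: List[List[float]], margin: float, start_lane: int = 0) -> List[int]:
    """Return the selected lane after each step.

    Hypothetical rule: switch to the lane with the lowest error score if it beats
    the current lane's score by more than `margin`.
    """
    lane = start_lane
    history = []
    for scores in scores_per_step:
        best = min(range(len(scores)), key=lambda i: scores[i])
        if scores[lane] - scores[best] > margin:
            lane = best
        history.append(lane)
    return history


# Both lanes see the same disturbance (score 1.0), but lane 0 is updated
# one step earlier, so for one step it looks worse than lane 1.
spike_skewed = [
    [0.1, 0.1],
    [1.0, 0.1],  # lane 0 already sees the spike, lane 1 not yet
    [1.0, 1.0],  # now both do
    [0.1, 0.1],
]
print(select_lanes(spike_skewed, margin=0.5))   # [0, 1, 1, 1]

# The same disturbance reaching both lanes at the same time: no switch.
spike_aligned = [[0.1, 0.1], [1.0, 1.0], [1.0, 1.0], [0.1, 0.1]]
print(select_lanes(spike_aligned, margin=0.5))  # [0, 0, 0, 0]
```

A selector that filtered scores over time would damp this one-step case; the sketch only shows why a timing skew can make equivalent lanes look different for a moment.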

Think about the point of a lane switch: It's so that if one of your pitot tubes gets blocked then it switches to the other airspeed sensor. Or if an IMU dies then it can use a different one. It is not so that if you happen to get a momentary spike in innovations then it uses the other lane which is supposed to be equivalent.

I'd like to hear an example of when a lane switch is necessary yet still something the pilot doesn't ever need to know about.

timtuxworth commented 1 year ago

> I'd like to hear an example of when a lane switch is necessary yet still something the pilot doesn't ever need to know about.

All you have to do is scroll up ...

> Edit: I will write the example that I gave in the closed forum here, so others can read it: On one of our drones, the two airspeed sensors are mounted far out on the two wings, so there is a large separation. In a sharp turn, especially in strong wind, this causes a large (but understandable) deviation, so we are always guaranteed a lane switch. So even though it is sort of a false positive, we still want it to happen: if one sensor really fails, it will be just as quick to switch to the better one.

timtuxworth commented 1 year ago

Also from my own experience with the T1 Ranger VTOL. It's almost impossible to position a compass on this tiny plane so you don't get interference, however it is helpful to have one when taking off in a Q mode, so the AP has some idea which way is which. Invariably as soon as you transition to a forward flight mode, the GPS will produce a better idea of the heading and a lane switch will be triggered. As far as I can see, it's exactly what you want, no need to panic. And no need for a scary message (yes it scared the bejesus out of me the first couple of times).

MichelleRos commented 1 year ago

I had already scrolled up and found no such examples.

> On one of our drones, the two airspeed sensors are mounted far out on the two wings, so there is a large separation.

That's another example of a switch that shouldn't be happening... Maybe the real fix for that is actually airspeed sensor offsets, so the EKF can expect that difference in airspeed readings and not get high innovations...
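The offset idea can be illustrated with rigid-body kinematics. In body axes (x forward, y right, z down), a sensor mounted a lateral distance y from the centre of gravity moves at v_cg + ω × p, so its forward airspeed is v_cg − r·y when the aircraft yaws at rate r. A minimal sketch, assuming still air, no sideslip, and a known offset and yaw rate. The function names and numbers are hypothetical, and this is not how the EKF actually fuses airspeed.

```python
def predicted_airspeed(v_cg: float, yaw_rate: float, lateral_offset: float) -> float:
    """Forward airspeed at a sensor offset sideways from the CG.

    Body axes: x forward, y right, z down. yaw_rate in rad/s (positive = nose right),
    lateral_offset in metres (positive = right wing). From v = v_cg + omega x p,
    the forward component is v_cg - yaw_rate * lateral_offset.
    """
    return v_cg - yaw_rate * lateral_offset


def compensate(measured: float, yaw_rate: float, lateral_offset: float) -> float:
    """Remove the rotation-induced part so the reading refers to the CG."""
    return measured + yaw_rate * lateral_offset


# Hypothetical numbers: sensors 1.5 m either side of the CG, 20 m/s, yaw rate 0.5 rad/s.
v_cg, r = 20.0, 0.5
left = predicted_airspeed(v_cg, r, -1.5)   # 20.75 m/s, outer wing in a right turn
right = predicted_airspeed(v_cg, r, 1.5)   # 19.25 m/s, inner wing
print(f"raw disagreement: {left - right:.2f} m/s")  # 1.50 m/s
print(f"compensated disagreement: {compensate(left, r, -1.5) - compensate(right, r, 1.5):.2f} m/s")  # 0.00 m/s
```

Gusts and the local flow around each wing would still produce differences that this correction does not model.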

> Invariably as soon as you transition to a forward flight mode, the GPS will produce a better idea of the heading and a lane switch will be triggered.

Not sure why switching from compass to GPS orientation actually causes a lane switch... AFAIK that's usually not two different lanes.

So again, I'm not convinced that that lane switch is supposed to be happening.

timtuxworth commented 1 year ago

> I had already scrolled up and found no such examples.

This post https://github.com/ArduPilot/ardupilot/issues/24181#issuecomment-1616483600 from @pompecukor

MichelleRos commented 1 year ago

> This post #24181 (comment) from @pompecukor

That's another example of a switch that shouldn't be happening... Maybe the real fix for that is actually airspeed sensor offsets, so the EKF can expect that difference in airspeed readings and not get high innovations...