ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0

Adding User Defined Sensor Health Check Mechanism to ArduPilot #12655

Closed YumaTheCompanion closed 3 years ago

YumaTheCompanion commented 5 years ago

Feature request

Is your feature request related to a problem? Please describe.

The problem is in the AHRS and EKF parts of the ArduPilot codebase, specifically the lack of fault protection for optical flow and rangefinder sensors. If I connect a faulty optical flow sensor or rangefinder, the I2C bus keeps communicating with ArduPilot, but the values received are always constant. When the sensors work correctly, the values almost always carry a bit of noise. I think we should add a warning or some form of protection that notifies the user or recovers the vehicle when this happens.

For example, in my case, when the values are constant, EK2 does not notice and the Copter starts drifting in AID_RELATIVE mode. This is pretty bad if there is a MAVLink blackout with the vehicle: I lose the copter.

Describe the solution you'd like

My current solution checks optical flow and rangefinder values over a given time limit, such as 100 seconds. That limit is a moving window: new values are pushed at the front and old ones are popped from the back. A function checks the values in the window for equality and, if a fault is found, sets the EK2 state variables PV_AidingMode and PV_AidingModePrev to AID_NONE. The EKF then checks GPS to decide whether it is okay to go to AID_ABSOLUTE.

This approach has some problems. First, the NavEKF2_core constructor is called twice because two IMUs are active, so I need to make sure both cores switch to AID_NONE. That means major changes to the codebase, which over time will diverge from the original and make a merge difficult. Second, GPS may not be available, in which case the vehicle will continue to drift because we are in AID_NONE mode. Suppose I wanted to land the vehicle whenever something went wrong with the optical flow or rangefinder sensors: I can't do that without introducing major changes to perhaps every file in the "libraries/AP_NavEKF2" folder. This is a big problem. Third, putting this only in the code makes it hard-coded; it needs to be parametric (configurable over MAVLink), like most other ArduPilot behaviour. As you can see, writing, maintaining and documenting this is a lot of work.
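For illustration, the moving-window equality check described in this comment could be sketched roughly as follows. This is a minimal standalone sketch, not the author's code and not ArduPilot API: the class name `StuckValueWindow`, its methods, and the millisecond timestamps are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical helper (not part of ArduPilot): reports a sensor as "stuck"
// when every sample inside a sliding time window has exactly the same value.
class StuckValueWindow {
public:
    explicit StuckValueWindow(uint32_t window_ms) : _window_ms(window_ms) {
        assert(window_ms > 0);
    }

    // Add the newest sample and discard samples that are no longer needed
    // to cover the last window_ms milliseconds.
    void push(uint32_t now_ms, float value) {
        _samples.push_back({now_ms, value});
        // Keep exactly one sample that is at least window_ms old, so the
        // retained samples always span the full window once enough time passed.
        while (_samples.size() > 1 &&
               (now_ms - _samples[1].time_ms) >= _window_ms) {
            _samples.pop_front();
        }
    }

    // True only if the window is fully covered and all values are identical.
    // Exact float comparison is intentional: the fault described is a sensor
    // that repeats a bit-identical reading.
    bool is_stuck() const {
        if (_samples.empty()) {
            return false;
        }
        if ((_samples.back().time_ms - _samples.front().time_ms) < _window_ms) {
            return false;  // not enough history yet
        }
        for (const Sample &s : _samples) {
            if (s.value != _samples.front().value) {
                return false;
            }
        }
        return true;
    }

private:
    struct Sample {
        uint32_t time_ms;
        float value;
    };
    uint32_t _window_ms;
    std::deque<Sample> _samples;
};
```

A caller would push each new reading with its timestamp and act when `is_stuck()` returns true. A `std::deque` keeps the sketch short; on a flight controller a fixed-size ring buffer would likely be preferable, since a 100-second window at typical sensor rates holds many samples.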

I think we need a method, perhaps a user-defined one, that checks sensor values and lets a sensor be switched in or out of the system. The user-defined methods would always be active, but by default they would always return true (sensor is OK) unless a user modified them to behave differently. I don't think this adds much computational overhead, yet it provides a lot of flexibility in return.
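One way to picture such a hook is sketched below, under stated assumptions: `SensorId`, `SensorSample`, `UserHealthHooks` and `user_rangefinder_check` are hypothetical names invented for this example and do not exist in ArduPilot. Each sensor has a check function that returns true by default, and a user may replace it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; not ArduPilot API.
enum class SensorId : uint8_t { OpticalFlow = 0, Rangefinder = 1 };

struct SensorSample {
    uint32_t time_ms;
    float value;
};

using HealthCheckFn = bool (*)(SensorId, const SensorSample &);

// Default check: always report the sensor as healthy.
inline bool default_health_check(SensorId, const SensorSample &) {
    return true;
}

// Example user-supplied check (an assumption for this sketch):
// treat a non-positive rangefinder reading as unhealthy.
inline bool user_rangefinder_check(SensorId, const SensorSample &s) {
    return s.value > 0.0f;
}

class UserHealthHooks {
public:
    // Install a user check; passing nullptr restores the default.
    void set(SensorId id, HealthCheckFn fn) {
        _fn[index(id)] = (fn != nullptr) ? fn : default_health_check;
    }

    // Run the check currently installed for this sensor.
    bool healthy(SensorId id, const SensorSample &s) const {
        return _fn[index(id)](id, s);
    }

private:
    static constexpr uint8_t kCount = 2;

    static uint8_t index(SensorId id) {
        const uint8_t i = static_cast<uint8_t>(id);
        assert(i < kCount);
        return i;
    }

    HealthCheckFn _fn[kCount] = {default_health_check, default_health_check};
};
```

With no user check installed, the cost per sample is one indirect call that returns true, which matches the low-overhead expectation stated above.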

Describe alternatives you've considered

Other than providing user-defined methods to check sensor health, another option is to hard-code the checks into the codebase, but that wouldn't be as flexible. However, it might do its job just fine if the window's time limit etc. can be configured over MAVLink.

Platform [ ] All [ ] AntennaTracker [x] Copter [ ] Plane [ ] Rover [ ] Submarine

Additional context

My concern and testbed cover only the Copter platform; however, all ArduPilot platforms could make use of such a feature. Examples of faulty rangefinder and optical flow devices are:

I have a working and a faulty unit of each. The faulty ones were subjected to extremely cold and arid environments. They broke after that and now won't work even at room temperature.

peterbarker commented 4 years ago

So you have devices that, when they go faulty, are fully-functional electronically but return a constant value?

Why would you need a history of more than one value to detect that?

This sort of detection would probably want to be in the sensor driver itself - the sensor could mark itself unhealthy if it detects the driven device in this state.
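A driver-side check that needs only the previous value could look roughly like the sketch below. It is an illustration under assumptions, not existing ArduPilot driver code: the class name `UnchangedValueMonitor` and its timeout parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical driver-side monitor (not ArduPilot code): remembers only the
// last reading and the time it last changed, so memory use is constant.
class UnchangedValueMonitor {
public:
    explicit UnchangedValueMonitor(uint32_t timeout_ms)
        : _timeout_ms(timeout_ms) {
        assert(timeout_ms > 0);
    }

    // Call with every new reading. Returns true while the sensor looks
    // healthy, false once the reading has not changed for timeout_ms.
    bool update(uint32_t now_ms, float value) {
        if (!_have_value || value != _last_value) {
            _have_value = true;
            _last_value = value;
            _last_change_ms = now_ms;
        }
        // Unsigned subtraction stays correct across timer wrap-around.
        return (now_ms - _last_change_ms) < _timeout_ms;
    }

private:
    uint32_t _timeout_ms;
    uint32_t _last_change_ms = 0;
    float _last_value = 0.0f;
    bool _have_value = false;
};
```

A driver could call `update()` on each reading and mark itself unhealthy when it returns false. The timeout would have to be chosen per sensor, since some sensors can legitimately repeat a value for a while.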

YumaTheCompanion commented 4 years ago

My intention was just to give an example of how a "user-defined sensor error detector coding area" could prevent copter accidents whilst minimizing the impact of custom code.

History is useful in the sense that there is more to extract if needed. Anyhow, I still believe that this would be a useful feature, just like the "constant HZ user code area" permitted in the main codebase.

It's better if it is in the sensor driver, but there are a ton of sensors with complicated parts, and not all of their faults count as bugs according to their manufacturers, even though they count as bugs according to us.

IamPete1 commented 3 years ago

I'm going to close this; I'm struggling to see the benefit of a generic solution. If there are known issues with sensors, we should deal with them on a case-by-case basis.

I think one could implement a check of the type wanted in scripting and then turn off the relevant sensor, but again, if there is a reliable way to spot something going bad, it should be in the main driver.