ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0
11.08k stars 17.65k forks source link

AP_IOMCU: status read error counter never resets #28534

Open RickReeser opened 3 weeks ago

RickReeser commented 3 weeks ago

Bug report

Copter 4.5.6 CubeOrange; CubeOrangePlus

If I leave my vehicle turned on for a long time, I inevitably encounter this prearm error:

AP: PreArm: Internal errors 0x1000 1:469 iomcu_reset error

Which points to this line: https://github.com/ArduPilot/ardupilot/blob/Copter-4.5/libraries/AP_IOMCU/AP_IOMCU.cpp#L469

An overnight disarmed log of IOMCU errors looks like this: image

So I guess the status read errors are accruing until they hit the limit of 20, triggering the prearm error. The comments in the code indicate that they are supposed to reset upon a successful read:

    if (read_status_ok == 0) {
        // reset error count on first good read
        read_status_errors = 0;
    }

This never happens because read_status_ok is never 0. I guess this line is supposed to be read_status_ok > 0? This would cause the errors to reset (maybe the conditional is not needed at all since we return early if the read is not successful).

However, this would cause another issue. read_status_errors is used elsewhere:

bool AP_IOMCU::healthy(void)
{
    return crc_is_ok && protocol_fail_count == 0 && !detected_io_reset && read_status_errors < read_status_ok/128U;
}

Resetting the read errors will make the last check useless, because read_status_ok is always incrementing and will be some arbitrarily large number. It should be compared to the total read errors ever seen, so I guess we should track the total errors and errors-since-last-success separately.

This check was added by this commit.

Version Copter 4.5.6

Platform [ x ] All

peterbarker commented 1 week ago

@RickReeser did you want to put together a PR with your proposal?

RickReeser commented 6 days ago

@RickReeser did you want to put together a PR with your proposal?

https://github.com/ArduPilot/ardupilot/pull/28725

I have been testing this on 3.5.6 for a few weeks now, everything seems good.

There is one difference between this PR and my tests, though, and that is the logging: this PR changes the logging to match the comments (the counter resets upon successful read). But in my test code, I left it to accumulate so that I could verify that the prearm error did not appear once it reached 20+ counts.