PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License
7.86k stars 13.24k forks source link

Complete parameter loss on reboot #16147

Open ryanjAA opened 3 years ago

ryanjAA commented 3 years ago

When doing a compass calibration and rebooting, all parameters were lost. Even vehicle setup was required.

This doesn’t happen very often but was seen in several occasions and on two different systems/vehicles. One was just about to be flown which grounded it until all could be reset and resolved.

Not sure how this can be recreated since it happens when it happens but the two vehicles were being operated several weeks apart so it’s not something that was continuous. Reloading the parameters that were saved and doing a compass calibration (and other sensor cal) again did not lose the parameters immediately after that subsequent reboot which makes recreating this difficult.

v1.11.1 on fmu-v3 (Pixhawk 1) and QGC stable 4.10. No custom code on these at all.

lukegluke commented 3 years ago

I had total parameter losses several times when debugging without bootloader and there was parameters storing on boot (#15618). As I saw in PX4 code on parameters update there is a 300 ms delay before writing to mtd and also no more frequently than every 2s. I don't know how compass calibration procedure works, but probably it is rebooting to fast after changing parameters that can lead to invalid (cleared but not yet written back) mtd block. In this case param load returns -1 on boot.

As for me, this problem is very vital, I would like to see for instance an option to use additional mtd param partition for backup that is used if first one is invalid. By the way, @dagar, as I understand if param_default_file was null, a internal flash mcu would be used for storing params, right? If it used anywhere in PX4?

ryanjAA commented 3 years ago

Thanks @lukegluke ! That actually makes a lot of sense since when doing sensor calibration, often times it will finish the specific cal but the green bar showing progress on the top of qgc isn’t completely finished yet which I presume is also when writing the params is done. Since with mag cal it tells you to reboot right there on the screen that is a perfect recipe for rebooting and doing so at just the wrong time could trigger what you’ve explained.

I think then also maybe it makes sense to look at not populating this reboot until the parameters are written or something similar with the calibration logic at the end in general. This could maybe be something simple like a delay in QGC but I’ve occasionally seen long times to have that green status bar finish up depending on the link type I’m on.

Definitely something that can be progressed on. How is tbd.

dagar commented 3 years ago

By the way, @dagar, as I understand if param_default_file was null, a internal flash mcu would be used for storing params, right? If it used anywhere in PX4?

There are boards that have flash based parameters, but that isn't used as a fallback. Most boards have an FRAM for parameter storage, but if it fails to start the param system will use the SD card.

I'd like to first address whatever the bug is here, but secondly I'm also interested in saving parameters to the SD card as a backup.

Can you reproduce the parameter loss and immediately do some debugging in that session?

nsh> param status
nsh> param show CAL*
nsh> mtd status
nsh> mtd readtest
nsh> nsh> cat /proc/partitions
nsh> dmesg # only available on newer boards

I'm trying to see if the problem is the calibration parameters not being committed, or if your non-volatile parameter storage (FRAM) is failing to start after the reboot.

lukegluke commented 3 years ago

@dagar by the way, IIRC, on reboot there is a code that waits logger module to stops correctly. I suppose it is good thing to have similar waiting for parameters backend to not interrupt it incorrectly.

lukegluke commented 3 years ago

Can you reproduce the parameter loss and immediately do some debugging in that session?

Just in case, @ryanjAA this Daniel's question was for you.

lukegluke commented 3 years ago

I suppose it is good thing to have similar waiting for parameters backend to not interrupt it incorrectly.

I've reviewed code - I missed that parameters backend uses px4_shutdown_lock() quite a long already, so there should not be such problems I supposed in first comment.

As I see there theoretically could be a problem losing latest value of parameters in case if parameters lib is waiting 2s after previous saving, but board gets reboot request. But it surely doesn't lead to total parameter losses issue. (if this case is important, probably it could be fixed with registering shutdown hook)

By the way @dagar could you please explain why you revert 9963bb6c40925e5facf183045e1fc359b9e7d959 with a152fb65cb2f9aba1679847952f28071fca51776? Additional shutdown lock in param_save_default as you wrote then "ensures the parameter file is closed properly before shutdown", but the lock only in param_export can't guarantee properly file closing.

ryanjAA commented 3 years ago

This just started happening again after several flights. Disarmed only then reboot then no parameters or anything, no airframe setup either. After talking with @dagar I'm going to try and recreate it without any parameter changes as well while watching from console.

ryanjAA commented 3 years ago

Ive tried rebooting over and over while connected to console. I can't reproduce it. Im on master.

I will try on 1.11.3 release.

Here are the screenshots from above:

nsh> param status nsh> param show CAL* nsh> mtd status nsh> mtd readtest nsh> cat /proc/partitions nsh> dmesg

Screen Shot 2021-01-28 at 10 02 35 PM Screen Shot 2021-01-28 at 10 02 20 PM Screen Shot 2021-01-28 at 10 02 07 PM Screen Shot 2021-01-28 at 10 01 54 PM Screen Shot 2021-01-28 at 10 01 41 PM