OpenOrienteering / mapper

OpenOrienteering Mapper is software for creating maps for the sport of orienteering.
https://www.openorienteering.org/apps/mapper/
GNU General Public License v3.0

Crash combined with ceased autosave during field work #1995

Open mlerjen opened 3 years ago

mlerjen commented 3 years ago

(Image attachment: 2021-10-22-15-36-15)

Steps to reproduce

Hi guys, last Friday I had a bad incident. Mapper 0.9.5 on Android 4.4 crashed three times. Usually this is not that big of a deal, as I have autosave set to 5 minutes and can thus easily reconstruct the decisions just taken. But not Friday... At the second crash, around 15:20, when Mapper restarted it turned out that autosave had not been working for about two hours. :-( No idea why.

Actual behaviour

It was mostly vegetation mapping, which is hard to rebuild from memory. I had to revisit the area. The time loss is OK, but the mental energy wasted and the loss of confidence in Mapper were notable.

Expected behaviour

It would be nice if we could prevent these incidents from happening, or reduce their damage to approximately zero. E.g. changes since the last save could be logged in real time and kept in a second file until they are merged into the main file, or something like that.
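For illustration only, here is a minimal sketch of such a change journal in C++ with Qt. This is not an existing Mapper feature; the ChangeJournal class, the ".journal" sidecar name and the record format are all invented:

#include <QFile>
#include <QIODevice>
#include <QString>
#include <QTextStream>

// Appends one record per edit to a sidecar file next to the map file,
// and discards the sidecar once the main file has been saved successfully.
class ChangeJournal
{
public:
    explicit ChangeJournal(const QString& map_path)
        : journal(map_path + QStringLiteral(".journal"))  // hypothetical sidecar name
    {}

    // Append a human-readable record for one edit and flush immediately,
    // so that the data survives a crash of the app.
    bool recordEdit(const QString& description)
    {
        if (!journal.isOpen()
            && !journal.open(QIODevice::Append | QIODevice::Text))
            return false;
        QTextStream out(&journal);
        out << description << '\n';
        out.flush();
        return journal.flush();
    }

    // After a successful save of the main file, the journal is obsolete.
    void discardAfterSave()
    {
        if (journal.isOpen())
            journal.close();
        journal.remove();
    }

private:
    QFile journal;
};

The essential point of the idea is that each record is flushed to disk immediately, so whatever was done since the last full save can still be read back after a crash.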

Configuration

Mapper Version: 0.9.5
Operating System: Android 4.4.2

lpechacek commented 3 years ago

Crashes are bad. Crashes with data loss are even worse. Would you please capture a bug report from your device (https://developer.android.com/studio/debug/bug-report#bugreportdevice) and send it over to my e-mail? Don't attach the file here; it contains bits of information that you might not want to share publicly. I'll search the records for traces of the bug that caused the immediate crash.

I have no idea what made the autosave stop working, though. Are you using a large number of scribble layers, or large layers in general, in this project?

dg0yt commented 3 years ago

I'm sorry for the inconvenience. Android is troublesome in this regard, as it makes different decisions about application lifecycle than PCs do. What appears as a crash might have triggers outside the app. You are using a fairly old device, and you are very likely to run out of memory (RAM) sooner or later, in particular with many templates. We cannot completely control when and how this will hit the app, but it is well isolated from the stored files.

In any case, it is not obvious why autosave should have stopped working. Did you notice a changed setting (possibly as a side effect of a crash)? If autosave could stop working, even saving changes to a separate file could stop as well. You are saving to a Nextcloud folder. Can you exclude a background effect from Nextcloud synchronization?

What we could do is show a message if the file hasn't been saved in a while. With or without autosave, users would then notice that they should take action.
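As a rough illustration of that idea (not Mapper's actual code), a small Qt helper could track the time of the last successful save and warn when it gets too old, independently of whether autosave is still running; the class, signal and threshold below are invented:

#include <QDateTime>
#include <QObject>
#include <QTimer>

class StaleSaveWatcher : public QObject
{
    Q_OBJECT
public:
    explicit StaleSaveWatcher(QObject* parent = nullptr)
        : QObject(parent)
        , last_save(QDateTime::currentDateTime())  // session start counts as "saved"
    {
        check_timer.setInterval(60 * 1000);  // check once per minute
        connect(&check_timer, &QTimer::timeout, this, &StaleSaveWatcher::check);
        check_timer.start();
    }

    // To be called from the (auto)save code path after a successful save.
    void noteSuccessfulSave()
    {
        last_save = QDateTime::currentDateTime();
        dirty = false;
    }

    // To be called whenever the document is modified.
    void noteModification() { dirty = true; }

signals:
    // The UI would connect this to a banner or dialog, e.g.
    // "The map has not been saved for 20 minutes."
    void saveOverdue(int minutes_since_last_save);

private slots:
    void check()
    {
        if (!dirty)
            return;
        const auto minutes = int(last_save.secsTo(QDateTime::currentDateTime()) / 60);
        if (minutes >= warn_after_minutes)
            emit saveOverdue(minutes);
    }

private:
    QTimer check_timer;
    QDateTime last_save;
    bool dirty = false;
    const int warn_after_minutes = 15;  // purely illustrative threshold
};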

Thanks to @lpechacek for offering debugging.

lpechacek commented 3 years ago

FTR, I've received a file containing diagnostic data from @mlerjen. Unlike the newer bug report format, this one does not contain the application tombstones. Moreover, the records from Friday are mostly overwritten. However, there are two interesting clues. First, the log contains mentions of native app terminations:

DUMP OF SERVICE dropbox:
Unknown argument: -a
Drop box contents: 375 entries
...
2021-10-25 14:55:20 data_app_native_crash (text, 3335 bytes)
    Process: org.openorienteering.mapper/Flags: 0x88be44/Package: org.open ...
2021-10-25 14:55:20 SYSTEM_TOMBSTONE (compressed text, 10505 bytes)
    Build: samsung/t0ltexx/t0lte:4.4.2/KOT49H/N7105XXSFQA5:user/release-ke ...
...
2021-10-25 19:53:07 data_app_native_crash (text, 3644 bytes)
    Process: org.openorienteering.mapper/Flags: 0x88be44/Package: org.open ...
2021-10-25 19:53:07 SYSTEM_TOMBSTONE (compressed text, 11044 bytes)
    Build: samsung/t0ltexx/t0lte:4.4.2/KOT49H/N7105XXSFQA5:user/release-ke ...

and the file system mounted under /tombstones holds over 18M of data. I have no direct experience with these old Android releases (I've donated the old phone to the local Trail-O group), but I would say that something is going on with the system. And I agree with Kai that it's difficult to tell an application crash from an OS decision to kill the application. The hints of memory exhaustion might be:

DUMP OF SERVICE procstats:
COMMITTED STATS FROM 2021-10-25-09-17-47:
...
  * org.openorienteering.mapper / u0a190:
           TOTAL: 98% (255MB-733MB-828MB/254MB-731MB-826MB over 74)
             Top: 98% (255MB-733MB-828MB/254MB-731MB-826MB over 74)
          Imp Fg: 0.14%
         Service: 0.00%
      (Last Act): 0.16%
        (Cached): 0.01%
...
COMMITTED STATS FROM 2021-10-25-12-18-00:
...
  * org.openorienteering.mapper / u0a190:
           TOTAL: 100% (639MB-826MB-0.89GB/637MB-825MB-0.89GB over 94)
             Top: 99% (639MB-826MB-0.89GB/637MB-824MB-0.89GB over 93)
          Imp Fg: 0.75% (843MB-843MB-843MB/841MB-841MB-841MB over 1)
         Service: 0.00%
...
COMMITTED STATS FROM 2021-10-25-15-20-00:
...
  * org.openorienteering.mapper / u0a190:
           TOTAL: 100% (47MB-736MB-828MB/45MB-735MB-826MB over 90)
             Top: 99% (47MB-735MB-827MB/45MB-734MB-825MB over 89)
          Imp Fg: 0.80% (828MB-828MB-828MB/826MB-826MB-826MB over 1)
         Service: 0.00%
...
COMMITTED STATS FROM 2021-10-25-18-20-08:
...
  * org.openorienteering.mapper / u0a190:
           TOTAL: 76% (50MB-388MB-761MB/48MB-387MB-760MB over 8)
             Top: 76% (50MB-388MB-761MB/48MB-387MB-760MB over 8)
        (Cached): 0.97%

I'm not sure about the exact meaning of the numbers in the parentheses. They might be Cached-RSS-Total, but that's a pure guess. I'll see if I can find docs for this dump format.

pkturner commented 3 years ago

TOTAL: 98% (255MB-733MB-828MB/254MB-731MB-826MB over 74)

I'm not sure about the exact meaning of the numbers in the parentheses.

Per a blog post, these are memory statistics from process sampling: minPss-avgPss-maxPss / minUss-avgUss-maxUss, where Pss is Proportional Set Size and Uss is Unique Set Size.
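If that reading is correct, the numbers in the first line quoted above would break down as:

    255MB-733MB-828MB  -> min-avg-max PSS across the samples
    254MB-731MB-826MB  -> min-avg-max USS across the samples
    over 74            -> number of samples the statistics were computed from

so Mapper's proportional memory footprint would have averaged around 733MB and peaked near 828MB during that period.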

mlerjen commented 3 years ago

@dg0yt: What I noticed before was that the phone had reduced reactivity. I had to push certain buttons several times until the app did what I wanted... It might thus be that the project itself, with about 10 templates, 1 scribble and the daily .gpx, was a bit much for my OLD phone. What I now suspect to be the killer is that, due to such a reactivity lag, I happened to define another template as the active scribble layer, only to realize that mistake after trying to erase an older note on my original scribble layer. I then switched directly back to the original scribble layer and continued working. Could that have jammed the autosave? Did that unintentionally edited template add another 5MB to the autosave queue? I only remember that it happened that day, but not exactly when.

The blocking of saving was complete: neither the scribble, nor the .gpx, nor the .omap had any saved data from that period. The phone is offline during field work; Nextcloud is not supposed to be running or interfering.

For the rest of the project, I migrated to my newer phone. No such issues there.

dg0yt commented 3 years ago

I see...

Autosave is blocked during certain regular interactions (as is normal saving), and that's why there is a mechanism to reschedule autosave when it hits this block.

Maybe the stress situation triggered some rare corner case where it didn't properly clear the blocker or schedule the retry. The mechanism is unit-tested, but only synthetically, and not on Android.
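For readers unfamiliar with the mechanism being described, here is a rough sketch (not Mapper's implementation) of such block-and-retry logic in C++/Qt; all names are invented. It shows where a corner case could leave autosave disabled for good if the block is never released or the retry never scheduled:

#include <QObject>
#include <QTimer>

class Autosaver : public QObject
{
    Q_OBJECT
public:
    explicit Autosaver(QObject* parent = nullptr)
        : QObject(parent)
    {
        timer.setSingleShot(true);
        connect(&timer, &QTimer::timeout, this, &Autosaver::tryAutosave);
        timer.start(autosave_interval_ms);
    }

    // Interactions such as editing a path raise the block...
    void blockAutosave() { ++block_count; }

    // ...and must always lower it again, or autosave never resumes.
    void unblockAutosave()
    {
        if (block_count > 0 && --block_count == 0 && retry_pending)
        {
            retry_pending = false;
            timer.start(retry_delay_ms);
        }
    }

private slots:
    void tryAutosave()
    {
        if (block_count > 0)
        {
            // Blocked: remember to retry once the block is released.
            retry_pending = true;
            return;
        }
        save();                             // placeholder for the actual save
        timer.start(autosave_interval_ms);  // schedule the next regular cycle
    }

private:
    void save() { /* write the map file to disk */ }

    QTimer timer;
    int block_count = 0;
    bool retry_pending = false;
    const int autosave_interval_ms = 5 * 60 * 1000;  // the user's 5-minute setting
    const int retry_delay_ms = 10 * 1000;            // assumed retry delay
};

In a structure like this, autosave silently stops if blockAutosave() runs without a matching unblockAutosave(), or if retry_pending is lost, which is the kind of failure mode hinted at above.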