anongit / mytracks

Automatically exported from code.google.com/p/mytracks
0 stars 0 forks source link

Random crashes when saving a new statistics marker #172

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. While recording a track, add a statistics marker
2. Click Save

What is the expected output? What do you see instead?
It does save the marker, but it also crashes the application.

I've been unable to get anything useful from a log, but the attached log 
includes two such crashes.

Original issue reported on code.google.com by rdama...@google.com on 30 Sep 2010 at 6:18

Attachments:

GoogleCodeExporter commented 9 years ago
I'd like to take on this bug.

Original comment by ba...@google.com on 1 Oct 2010 at 4:52

GoogleCodeExporter commented 9 years ago
All yours :)

Original comment by rdama...@google.com on 1 Oct 2010 at 5:30

GoogleCodeExporter commented 9 years ago
I've been trying to reproduce this bug on both G1 and N1 running 1.0.17 and my 
own build (including some refactoring to the service and other minor bug 
fixes), but I can't reproduce this bug.  Since it looks like VM dies, it's 
probably related to some system specifc condition, e.g. running low on memory.

Are you able to reproduce this bug? What hardware do you have? If yes, do you 
have a full bug report so I could take a look?

BTW, while trying various unexpected combinations of user actions, I came 
across a bug where you can delete all tracks while you are recording a track.  
It doesn't crash the app, but has some unpleasant consequences (do we have a 
bug on that?).

Original comment by ba...@google.com on 2 Oct 2010 at 8:48

GoogleCodeExporter commented 9 years ago
I'm able to reproduce it on a Nexus One with Froyo FRF91. I have the logs 
above, but not much more. It doesn't always happen, but happens quite 
frequently.

We don't have a bug for delete all tracks, feel free to open one.

Original comment by rdama...@google.com on 5 Oct 2010 at 7:58

GoogleCodeExporter commented 9 years ago
Can you attach a full bug report? (press trackball and volume up altogether).  
The full bug report has all information, including system activity and it's 
much easier to track down what's going on.  We should see a kernel message 
related to the crash.

I will open a new bug on the other issue.

Original comment by ba...@google.com on 5 Oct 2010 at 10:22

GoogleCodeExporter commented 9 years ago
Rodrigo, do you happen to have more logs?  I still can't reproduce this bug.

Original comment by ba...@google.com on 12 Oct 2010 at 10:16

GoogleCodeExporter commented 9 years ago
Yes, I have more logs, but the only relevant part seems to be this:

I/ActivityManager(   83): Displayed activity 
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails: 425 ms (total 425 ms)
D/MyTracksProvider( 5651): MyTracksProviderUtilsImpl.insertWaypoint
D/MyTracksProvider( 5651): MyTracksProvider.insert
W/MyTracks( 5651): Tried to account for location while track is paused

I/ActivityManager(   83): Displayed activity 
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails: 475 ms (total 475 ms)

E/Database( 5688): Error updating name=Statistics using UPDATE waypoints SET 
name=? WHERE _id = 10
D/AndroidRuntime( 5688): Shutting down VM
W/dalvikvm( 5688): threadid=1: thread exiting with uncaught exception 
(group=0x4001d7f0)
I/Process ( 5688): Sending signal. PID: 5688 SIG: 9
I/ActivityThread( 5778): Publishing provider com.google.android.maps.mytracks: 
com.google.android.apps.mytracks.content.MyTracksProvider

A database update fails (no idea why, but it seems to fail often), and that 
crashes the provider process, which is then restarted.

Original comment by rdama...@google.com on 20 Oct 2010 at 4:04

GoogleCodeExporter commented 9 years ago
You are sending logcat logs, while I'm interested in kernel logs.  If you 
happen to have the full lug, please send it me offline.  A log I'm talking 
about is typically around ~2MB.

My guess is that that SQL lite must be triggering some bug in Android (maybe 
corrupted DB?) thus restarting the app.  Most likely an isolated issue, 
specific to your particular device/data.

Original comment by ba...@google.com on 20 Oct 2010 at 4:32

GoogleCodeExporter commented 9 years ago
Btw, this last log was right after clearing all data in mytracks, so it's not 
specific to my data :)

I'll try to produce an adb bugreport and send it to you.

Original comment by rdama...@google.com on 20 Oct 2010 at 4:36

GoogleCodeExporter commented 9 years ago
That would be great, thanks!

The kernel log should reveal why Android is shutting down the VM.

Original comment by ba...@google.com on 20 Oct 2010 at 4:40

GoogleCodeExporter commented 9 years ago
One interesting note is that this seems to be related to stopping recording 
stats.  It has the same odd message:
"Tried to account for location while track is paused"

It is kind of a hack but I think we might want to check if the track is paused 
and resume it before we insert a location.  It is only a few lines of code and 
would paper over the real issue until we can sort out what is really going on.

Original comment by sandordo...@google.com on 20 Oct 2010 at 4:44

GoogleCodeExporter commented 9 years ago
Ah, I missed that line:
W/dalvikvm( 5688): threadid=1: thread exiting with uncaught exception 
(group=0x4001d7f0)

I even more want to see the kernel log, which would show us the real exception.

Sandor@ - I think this is a separate issue worth to fix as well, but I'd rather 
fix the cause (prevent from a call to insert a waypoint), not the symptom (try 
to resume the service after the call has been made if paused).

Original comment by ba...@google.com on 20 Oct 2010 at 4:51

GoogleCodeExporter commented 9 years ago
More logs based on Rodrigo's bug report:

10-21 13:21:59.688 D/MyTracksProvider(14666): 
MyTracksProviderUtilsImpl.insertTrackPoint
10-21 13:21:59.708 D/MyTracksProvider(14666): MyTracksProvider.insert
10-21 13:21:59.958 I/ActivityManager(12584): Displayed activity 
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails: 919 ms (total 919 ms)
10-21 13:21:59.978 D/dalvikvm(12664): GC_EXPLICIT freed 3290 objects / 136072 
bytes in 209ms
10-21 13:22:00.128 D/dalvikvm(12664): GC_EXPLICIT freed 228 objects / 9776 
bytes in 39ms
10-21 13:22:01.548 E/Database(14651): Error updating name=Statistics using 
UPDATE waypoints SET name=? WHERE _id = 39
10-21 13:22:01.558 D/AndroidRuntime(14651): Shutting down VM
10-21 13:22:01.558 W/dalvikvm(14651): threadid=1: thread exiting with uncaught 
exception (group=0x4001d7f0)
10-21 13:22:01.578 E/AndroidRuntime(14651): FATAL EXCEPTION: main
10-21 13:22:01.578 E/AndroidRuntime(14651): 
android.database.sqlite.SQLiteException: error code 5: database is locked
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.database.sqlite.SQLiteStatement.native_execute(Native Method)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.database.sqlite.SQLiteStatement.execute(SQLiteStatement.java:55)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:
1692)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1622)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
com.google.android.apps.mytracks.content.MyTracksProvider.update(MyTracksProvide
r.java:379)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.content.ContentProvider$Transport.update(ContentProvider.java:204)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.content.ContentResolver.update(ContentResolver.java:707)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
com.google.android.apps.mytracks.MyTracksWaypointDetails.saveDialog(MyTracksWayp
ointDetails.java:154)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
com.google.android.apps.mytracks.MyTracksWaypointDetails.onClick(MyTracksWaypoin
tDetails.java:168)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.view.View.performClick(View.java:2408)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.view.View$PerformClick.run(View.java:8816)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.os.Handler.handleCallback(Handler.java:587)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.os.Handler.dispatchMessage(Handler.java:92)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.os.Looper.loop(Looper.java:123)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
android.app.ActivityThread.main(ActivityThread.java:4627)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
java.lang.reflect.Method.invokeNative(Native Method)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
java.lang.reflect.Method.invoke(Method.java:521)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)
10-21 13:22:01.578 E/AndroidRuntime(14651):     at 
dalvik.system.NativeStart.main(Native Method)
10-21 13:22:01.598 W/ActivityManager(12584):   Force finishing activity 
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails
10-21 13:22:02.098 W/ActivityManager(12584): Activity pause timeout for 
HistoryRecord{452c7118 
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails}

Original comment by ba...@google.com on 21 Oct 2010 at 3:44

GoogleCodeExporter commented 9 years ago
I haven't checked the code, but it seems to me that we are trying to access DB 
from at least two threads.  First thread: 14666 and second: 14651.

A few earlier lines from the second thread:
0-21 13:21:59.338 D/dalvikvm(14651): GC_FOR_MALLOC freed 7504 objects / 456048 
bytes in 60ms
10-21 13:21:59.378 W/MyTracks(14651): Setting view 2131361865 to true speed: 
2131230951 pace: 2131230952
10-21 13:21:59.378 W/MyTracks(14651): Setting view 2131361871 to true speed: 
2131230949 pace: 2131230950
10-21 13:21:59.378 W/MyTracks(14651): Setting view 2131361858 to true speed: 
2131230963 pace: 2131230960

A few methods to verify:

boolean  isDbLockedByCurrentThread()
Checks if the database lock is held by this thread.
boolean  isDbLockedByOtherThreads()
Checks if the database is locked by another thread.

I think the fix is to do a static code check and make sure that we don't access 
DB from >1 thread at a time.

Original comment by ba...@google.com on 21 Oct 2010 at 3:55

GoogleCodeExporter commented 9 years ago
I'd except multithreaded access to work, but that two transactions on the same 
row would be mutex'ed. Is that not the case?

Original comment by rdama...@google.com on 21 Oct 2010 at 5:09

GoogleCodeExporter commented 9 years ago
I don't have experience with SQLlite +android, so I don't know.  I tried to 
search for this particular error, but haven't found anything useful.  I will 
check later.

It's hard to find any doc on Sqlite transaction model + thread locking.

Original comment by ba...@google.com on 21 Oct 2010 at 5:14

GoogleCodeExporter commented 9 years ago
No idea - you're probably right :) It could be a race condition between 
inserting a new track point and updating the waypoint, so we probably want to 
catch that and just retry, although it's most ideal if we find out why we're 
using multiple threads.

Original comment by rdama...@google.com on 21 Oct 2010 at 5:28

GoogleCodeExporter commented 9 years ago
I'd like to avoid a race condition, if updating is really forbidden from 
multiple threads.  This should be easy by establishing a lock in our code :)  
But I need to do more research to make sure we use the API the way it was 
intended to.

From the log, it's clear that two threads are trying to update at the same time.

Original comment by ba...@google.com on 21 Oct 2010 at 5:31

GoogleCodeExporter commented 9 years ago
Right now, this is the most frequent crash:

android.database.sqlite.SQLiteException: error code 5: database is locked
at android.database.sqlite.SQLiteStatement.native_execute(Native Method)
at android.database.sqlite.SQLiteStatement.execute(SQLiteStatement.java:55)
at 
android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:
1692)
at android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1622)
at 
com.google.android.apps.mytracks.content.MyTracksProvider.update(MyTracksProvide
r.java:379)
at android.content.ContentProvider$Transport.update(ContentProvider.java:204)
at android.content.ContentResolver.update(ContentResolver.java:707)
at 
com.google.android.apps.mytracks.MyTracksWaypointDetails.saveDialog(MyTracksWayp
ointDetails.java:155)
at 
com.google.android.apps.mytracks.MyTracksWaypointDetails.onClick(MyTracksWaypoin
tDetails.java:169)
at android.view.View.performClick(View.java:2408)
at android.view.View$PerformClick.run(View.java:8816)
at android.os.Handler.handleCallback(Handler.java:587)
at android.os.Handler.dispatchMessage(Handler.java:92)
at android.os.Looper.loop(Looper.java:123)
at android.app.ActivityThread.main(ActivityThread.java:4627)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:521)
at 
com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)
at dalvik.system.NativeStart.main(Native Method)

User comments:
Oct 22, 2010 4:36:31 AM1.0.19Crashed
Oct 18, 2010 7:45:48 PM1.0.19pjtno
Sep 29, 2010 8:17:26 PM1.0.17gx ktjufg
Sep 1, 2010 11:04:15 AM1.0.15trying to edit details while recording a track. 
bombs on save
Aug 20, 2010 7:35:07 AM1.0.15inserting a waypoint marker with no type or 
description
Aug 16, 2010 3:39:33 PM1.0.15tried inserting statistical point with same name 
caused crash
...

It looks like a race (accessing DB from UI thread while recording service 
inserts new locations).

Original comment by ba...@google.com on 26 Oct 2010 at 5:07

GoogleCodeExporter commented 9 years ago
All right, here is the reason why this exception happens:
1) We have two separate processes, the service and the app
2) SQLite provides basic locking in a notion of a Java lock that prevents from 
two concurrent DB operations at the same time within the same process
3) In case of multiple processes, there is no way to enforce synchronization, 
thus we have races from time to time.  The more frequent the service inserts 
locations the higher the chance to trigger this crash.  It also explains why I 
couldn't reproduce (because I was sitting and clicking insert waypoints, so I 
wasn't really doing it in parallel).
4) I did some basic performance tests. DB insert takes about 33ms on N1 (tried 
10,000 inserts).

Possible solutions:
1) Retry on error up to N times (use a Java proxy or made the client aware of 
possible failures aka optimistic locking)
2) Update DB through the service, thus avoid distributed locking
3) Come up with a notion of a distributed lock

None of the above is easy and clean at the same time.

Original comment by ba...@google.com on 27 Oct 2010 at 7:05

GoogleCodeExporter commented 9 years ago
I like solution 2 the best.  Invoking a service method is not much harder than 
a local method.  Adding a new method to the service is not hard.  We already 
have method for adding new waypoints which is probably why we don't see crashes 
from inserting new.

Original comment by sandordo...@google.com on 27 Oct 2010 at 3:17

GoogleCodeExporter commented 9 years ago
2) is not perfect (although easy to implement).  The reason is that we still 
have plenty of SQL operations in the app, e.g. track editing, track reloading 
etc.  All of them would have to go through the service (to make it consistent 
and error prone), but the service only runs when you record a track, not to 
mention coupling DB management with the service.

I'm inclined into 1), where I can provide a simple Java proxy class and 
decorate MyTrackProvider interface to automatically intercept each method and 
execute it within a try/catch block and retry on SQL failure.  This wouldn't 
require any changes to existing logic except injecting a small piece of code in 
the factory (wrapping around original implementation).

Also, we can choose a hybrid approach.  Let me know what you think.

Perhaps a code sample would be more convincing?

Original comment by ba...@google.com on 27 Oct 2010 at 4:47

GoogleCodeExporter commented 9 years ago
Isn't this only an issue for write operations? I think the only non-protected 
writes are editing waypoints and track metadata.  I'm inclined to add those to 
the iadl interface along with the existing insert methods.

Original comment by sandordo...@google.com on 27 Oct 2010 at 4:59

GoogleCodeExporter commented 9 years ago
About the solutions:
1) This can create an avalanche effect where writes start to accumulate and 
eventually most of them fail (after spending a lot more battery due to the 
retries) - other than that I like this solution, can you think of a way to 
prevent this issue or minimize it? One possible solution is to have a queue of 
pending changes and then every time the lock is released, apply all of them as 
a single transaction, which will be significantly faster. Blocking the caller 
until his specific operation is applied (so that it doesn't became completely 
async) is a bit annoying but doable.

2) I kind of dislike the fact that we'd be ignoring all the nice provider 
infrastructure in Android to do that. We would be losing a lot of 
flexibility/ease of use.

3) Agh, no.

Original comment by rdama...@google.com on 27 Oct 2010 at 4:59

GoogleCodeExporter commented 9 years ago
Sandor@ We have plenty of stack traces, as this issue is the top crash right 
now, I recall seeing at least 2 different stack traces.

Rodrigo@:
1) There is a reason why it's called optimistic locking :) We hope to have very 
few conflicts.  Retries would be limited to up to 3 per attempt, consider this 
code sample:

void updateFooBar(...) {
  int attempt = 0;
  for (;;) {
  try {
    contentResolver.update(...)
  }  catch (SqlException e) {
    if (e is "DB locked error") {
      if (++attempt > 3) {
        throw e;
     } else {
       // wait a short period of time to for the other OP to finish
       Thread.sleep(20);
     }
   }
  }
  }
}

This would go into the proxy, so no need to repeat this logic for each method 
from the provider.  Number of retries and sleep time would be parametrized.

In order to reduce the lock contention, we would have to improve insert.  My 
performance tests show that each insert takes ~30ms (a lot IMHO).  Perhaps 
redesigning the DB schema would help, but I'm not sure.

5) Lastly, if we didn't have a remote service, we wouldn't have any issues.  
Another reason of dropping it (I know, I know you oppose :) ).

Original comment by ba...@google.com on 27 Oct 2010 at 5:13

GoogleCodeExporter commented 9 years ago
1) Why not made the retry logic go into MyTracksProvider directly? This way 
it's abstracted away from users of the provider.

Let's experiment with a simpler schema.

5) My idea is to build a certain ecosystem around my tracks so that we can have 
other applications that talk to it (this way we don't have to build everything 
into our app), and for that exported service is required.

Original comment by rdama...@google.com on 27 Oct 2010 at 5:38

GoogleCodeExporter commented 9 years ago
1) Because sometimes you may not want to use the proxy, e.g. in a single 
process app.
    The factory is the right place to do so, consider the following code snippets:
    // Old version
    MyTracksProviderUtils interface = new MyTracksProviderUtilsImpl(...)

   // New version
   MyTracksProviderUtils interface = new RetryProxy(3, 20,  new MyTracksProviderUtilsImpl(...));

No more code changes.  Is this something you propose as well?

5) I think this sounds like an interesting idea, lets talk offline.  I want to 
better understand the use case.  You certainly don't want to expose start/stop 
actions which could interfere with the running app (MyTracks), I think.

Original comment by ba...@google.com on 27 Oct 2010 at 5:51

GoogleCodeExporter commented 9 years ago
1) I'm not sure I understand this - our provider will always be in a separate 
process from another app, won't it?

5) I do think we want to expose calls to start/stop recording (I can mention a 
few use cases, such as a Locale plugin and a desktop widget), which is one 
reason why the start recording flow needs refactoring so that call is 
sufficient (no preference changes needed). Feel free to grab me on gtalk.

Original comment by rdama...@google.com on 27 Oct 2010 at 6:06

GoogleCodeExporter commented 9 years ago
1) Yes.  The logic in proxy relies on the fact that the common point between 2 
processes is the shared DB, so if one process writes and blocks for 30ms and 
meanwhile another tries to write, we will catch it (SQLException) and retry.  
2) Let's chat offline when you are back.

Original comment by ba...@google.com on 27 Oct 2010 at 9:18

GoogleCodeExporter commented 9 years ago
Please review:

http://code.google.com/r/rdamazio-mytracks-staging1/source/detail?r=ba142a32107b
2605da777bd59157383da7f2fe0c

Yeah, that simple :)

Original comment by rdama...@google.com on 29 Oct 2010 at 7:40

GoogleCodeExporter commented 9 years ago
For the record, can anyone explain why we set the service to multiprocess in 
first place?

Original comment by ba...@google.com on 29 Oct 2010 at 7:47

GoogleCodeExporter commented 9 years ago
You may want to ask Leif. I have no idea.

Original comment by rdama...@google.com on 31 Oct 2010 at 11:54

GoogleCodeExporter commented 9 years ago

Original comment by rdama...@google.com on 31 Oct 2010 at 11:55