Closed GoogleCodeExporter closed 9 years ago
I'd like to take on this bug.
Original comment by ba...@google.com
on 1 Oct 2010 at 4:52
All yours :)
Original comment by rdama...@google.com
on 1 Oct 2010 at 5:30
I've been trying to reproduce this bug on both G1 and N1 running 1.0.17 and my
own build (including some refactoring to the service and other minor bug
fixes), but I can't reproduce this bug. Since it looks like VM dies, it's
probably related to some system specifc condition, e.g. running low on memory.
Are you able to reproduce this bug? What hardware do you have? If yes, do you
have a full bug report so I could take a look?
BTW, while trying various unexpected combinations of user actions, I came
across a bug where you can delete all tracks while you are recording a track.
It doesn't crash the app, but has some unpleasant consequences (do we have a
bug on that?).
Original comment by ba...@google.com
on 2 Oct 2010 at 8:48
I'm able to reproduce it on a Nexus One with Froyo FRF91. I have the logs
above, but not much more. It doesn't always happen, but happens quite
frequently.
We don't have a bug for delete all tracks, feel free to open one.
Original comment by rdama...@google.com
on 5 Oct 2010 at 7:58
Can you attach a full bug report? (press trackball and volume up altogether).
The full bug report has all information, including system activity and it's
much easier to track down what's going on. We should see a kernel message
related to the crash.
I will open a new bug on the other issue.
Original comment by ba...@google.com
on 5 Oct 2010 at 10:22
Rodrigo, do you happen to have more logs? I still can't reproduce this bug.
Original comment by ba...@google.com
on 12 Oct 2010 at 10:16
Yes, I have more logs, but the only relevant part seems to be this:
I/ActivityManager( 83): Displayed activity
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails: 425 ms (total 425 ms)
D/MyTracksProvider( 5651): MyTracksProviderUtilsImpl.insertWaypoint
D/MyTracksProvider( 5651): MyTracksProvider.insert
W/MyTracks( 5651): Tried to account for location while track is paused
I/ActivityManager( 83): Displayed activity
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails: 475 ms (total 475 ms)
E/Database( 5688): Error updating name=Statistics using UPDATE waypoints SET
name=? WHERE _id = 10
D/AndroidRuntime( 5688): Shutting down VM
W/dalvikvm( 5688): threadid=1: thread exiting with uncaught exception
(group=0x4001d7f0)
I/Process ( 5688): Sending signal. PID: 5688 SIG: 9
I/ActivityThread( 5778): Publishing provider com.google.android.maps.mytracks:
com.google.android.apps.mytracks.content.MyTracksProvider
A database update fails (no idea why, but it seems to fail often), and that
crashes the provider process, which is then restarted.
Original comment by rdama...@google.com
on 20 Oct 2010 at 4:04
You are sending logcat logs, while I'm interested in kernel logs. If you
happen to have the full lug, please send it me offline. A log I'm talking
about is typically around ~2MB.
My guess is that that SQL lite must be triggering some bug in Android (maybe
corrupted DB?) thus restarting the app. Most likely an isolated issue,
specific to your particular device/data.
Original comment by ba...@google.com
on 20 Oct 2010 at 4:32
Btw, this last log was right after clearing all data in mytracks, so it's not
specific to my data :)
I'll try to produce an adb bugreport and send it to you.
Original comment by rdama...@google.com
on 20 Oct 2010 at 4:36
That would be great, thanks!
The kernel log should reveal why Android is shutting down the VM.
Original comment by ba...@google.com
on 20 Oct 2010 at 4:40
One interesting note is that this seems to be related to stopping recording
stats. It has the same odd message:
"Tried to account for location while track is paused"
It is kind of a hack but I think we might want to check if the track is paused
and resume it before we insert a location. It is only a few lines of code and
would paper over the real issue until we can sort out what is really going on.
Original comment by sandordo...@google.com
on 20 Oct 2010 at 4:44
Ah, I missed that line:
W/dalvikvm( 5688): threadid=1: thread exiting with uncaught exception
(group=0x4001d7f0)
I even more want to see the kernel log, which would show us the real exception.
Sandor@ - I think this is a separate issue worth to fix as well, but I'd rather
fix the cause (prevent from a call to insert a waypoint), not the symptom (try
to resume the service after the call has been made if paused).
Original comment by ba...@google.com
on 20 Oct 2010 at 4:51
More logs based on Rodrigo's bug report:
10-21 13:21:59.688 D/MyTracksProvider(14666):
MyTracksProviderUtilsImpl.insertTrackPoint
10-21 13:21:59.708 D/MyTracksProvider(14666): MyTracksProvider.insert
10-21 13:21:59.958 I/ActivityManager(12584): Displayed activity
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails: 919 ms (total 919 ms)
10-21 13:21:59.978 D/dalvikvm(12664): GC_EXPLICIT freed 3290 objects / 136072
bytes in 209ms
10-21 13:22:00.128 D/dalvikvm(12664): GC_EXPLICIT freed 228 objects / 9776
bytes in 39ms
10-21 13:22:01.548 E/Database(14651): Error updating name=Statistics using
UPDATE waypoints SET name=? WHERE _id = 39
10-21 13:22:01.558 D/AndroidRuntime(14651): Shutting down VM
10-21 13:22:01.558 W/dalvikvm(14651): threadid=1: thread exiting with uncaught
exception (group=0x4001d7f0)
10-21 13:22:01.578 E/AndroidRuntime(14651): FATAL EXCEPTION: main
10-21 13:22:01.578 E/AndroidRuntime(14651):
android.database.sqlite.SQLiteException: error code 5: database is locked
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.database.sqlite.SQLiteStatement.native_execute(Native Method)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.database.sqlite.SQLiteStatement.execute(SQLiteStatement.java:55)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:
1692)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1622)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
com.google.android.apps.mytracks.content.MyTracksProvider.update(MyTracksProvide
r.java:379)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.content.ContentProvider$Transport.update(ContentProvider.java:204)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.content.ContentResolver.update(ContentResolver.java:707)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
com.google.android.apps.mytracks.MyTracksWaypointDetails.saveDialog(MyTracksWayp
ointDetails.java:154)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
com.google.android.apps.mytracks.MyTracksWaypointDetails.onClick(MyTracksWaypoin
tDetails.java:168)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.view.View.performClick(View.java:2408)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.view.View$PerformClick.run(View.java:8816)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.os.Handler.handleCallback(Handler.java:587)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.os.Handler.dispatchMessage(Handler.java:92)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.os.Looper.loop(Looper.java:123)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
android.app.ActivityThread.main(ActivityThread.java:4627)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
java.lang.reflect.Method.invokeNative(Native Method)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
java.lang.reflect.Method.invoke(Method.java:521)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)
10-21 13:22:01.578 E/AndroidRuntime(14651): at
dalvik.system.NativeStart.main(Native Method)
10-21 13:22:01.598 W/ActivityManager(12584): Force finishing activity
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails
10-21 13:22:02.098 W/ActivityManager(12584): Activity pause timeout for
HistoryRecord{452c7118
com.google.android.maps.mytracks/com.google.android.apps.mytracks.MyTracksWaypoi
ntDetails}
Original comment by ba...@google.com
on 21 Oct 2010 at 3:44
I haven't checked the code, but it seems to me that we are trying to access DB
from at least two threads. First thread: 14666 and second: 14651.
A few earlier lines from the second thread:
0-21 13:21:59.338 D/dalvikvm(14651): GC_FOR_MALLOC freed 7504 objects / 456048
bytes in 60ms
10-21 13:21:59.378 W/MyTracks(14651): Setting view 2131361865 to true speed:
2131230951 pace: 2131230952
10-21 13:21:59.378 W/MyTracks(14651): Setting view 2131361871 to true speed:
2131230949 pace: 2131230950
10-21 13:21:59.378 W/MyTracks(14651): Setting view 2131361858 to true speed:
2131230963 pace: 2131230960
A few methods to verify:
boolean isDbLockedByCurrentThread()
Checks if the database lock is held by this thread.
boolean isDbLockedByOtherThreads()
Checks if the database is locked by another thread.
I think the fix is to do a static code check and make sure that we don't access
DB from >1 thread at a time.
Original comment by ba...@google.com
on 21 Oct 2010 at 3:55
I'd except multithreaded access to work, but that two transactions on the same
row would be mutex'ed. Is that not the case?
Original comment by rdama...@google.com
on 21 Oct 2010 at 5:09
I don't have experience with SQLlite +android, so I don't know. I tried to
search for this particular error, but haven't found anything useful. I will
check later.
It's hard to find any doc on Sqlite transaction model + thread locking.
Original comment by ba...@google.com
on 21 Oct 2010 at 5:14
No idea - you're probably right :) It could be a race condition between
inserting a new track point and updating the waypoint, so we probably want to
catch that and just retry, although it's most ideal if we find out why we're
using multiple threads.
Original comment by rdama...@google.com
on 21 Oct 2010 at 5:28
I'd like to avoid a race condition, if updating is really forbidden from
multiple threads. This should be easy by establishing a lock in our code :)
But I need to do more research to make sure we use the API the way it was
intended to.
From the log, it's clear that two threads are trying to update at the same time.
Original comment by ba...@google.com
on 21 Oct 2010 at 5:31
Right now, this is the most frequent crash:
android.database.sqlite.SQLiteException: error code 5: database is locked
at android.database.sqlite.SQLiteStatement.native_execute(Native Method)
at android.database.sqlite.SQLiteStatement.execute(SQLiteStatement.java:55)
at
android.database.sqlite.SQLiteDatabase.updateWithOnConflict(SQLiteDatabase.java:
1692)
at android.database.sqlite.SQLiteDatabase.update(SQLiteDatabase.java:1622)
at
com.google.android.apps.mytracks.content.MyTracksProvider.update(MyTracksProvide
r.java:379)
at android.content.ContentProvider$Transport.update(ContentProvider.java:204)
at android.content.ContentResolver.update(ContentResolver.java:707)
at
com.google.android.apps.mytracks.MyTracksWaypointDetails.saveDialog(MyTracksWayp
ointDetails.java:155)
at
com.google.android.apps.mytracks.MyTracksWaypointDetails.onClick(MyTracksWaypoin
tDetails.java:169)
at android.view.View.performClick(View.java:2408)
at android.view.View$PerformClick.run(View.java:8816)
at android.os.Handler.handleCallback(Handler.java:587)
at android.os.Handler.dispatchMessage(Handler.java:92)
at android.os.Looper.loop(Looper.java:123)
at android.app.ActivityThread.main(ActivityThread.java:4627)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:521)
at
com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)
at dalvik.system.NativeStart.main(Native Method)
User comments:
Oct 22, 2010 4:36:31 AM1.0.19Crashed
Oct 18, 2010 7:45:48 PM1.0.19pjtno
Sep 29, 2010 8:17:26 PM1.0.17gx ktjufg
Sep 1, 2010 11:04:15 AM1.0.15trying to edit details while recording a track.
bombs on save
Aug 20, 2010 7:35:07 AM1.0.15inserting a waypoint marker with no type or
description
Aug 16, 2010 3:39:33 PM1.0.15tried inserting statistical point with same name
caused crash
...
It looks like a race (accessing DB from UI thread while recording service
inserts new locations).
Original comment by ba...@google.com
on 26 Oct 2010 at 5:07
All right, here is the reason why this exception happens:
1) We have two separate processes, the service and the app
2) SQLite provides basic locking in a notion of a Java lock that prevents from
two concurrent DB operations at the same time within the same process
3) In case of multiple processes, there is no way to enforce synchronization,
thus we have races from time to time. The more frequent the service inserts
locations the higher the chance to trigger this crash. It also explains why I
couldn't reproduce (because I was sitting and clicking insert waypoints, so I
wasn't really doing it in parallel).
4) I did some basic performance tests. DB insert takes about 33ms on N1 (tried
10,000 inserts).
Possible solutions:
1) Retry on error up to N times (use a Java proxy or made the client aware of
possible failures aka optimistic locking)
2) Update DB through the service, thus avoid distributed locking
3) Come up with a notion of a distributed lock
None of the above is easy and clean at the same time.
Original comment by ba...@google.com
on 27 Oct 2010 at 7:05
I like solution 2 the best. Invoking a service method is not much harder than
a local method. Adding a new method to the service is not hard. We already
have method for adding new waypoints which is probably why we don't see crashes
from inserting new.
Original comment by sandordo...@google.com
on 27 Oct 2010 at 3:17
2) is not perfect (although easy to implement). The reason is that we still
have plenty of SQL operations in the app, e.g. track editing, track reloading
etc. All of them would have to go through the service (to make it consistent
and error prone), but the service only runs when you record a track, not to
mention coupling DB management with the service.
I'm inclined into 1), where I can provide a simple Java proxy class and
decorate MyTrackProvider interface to automatically intercept each method and
execute it within a try/catch block and retry on SQL failure. This wouldn't
require any changes to existing logic except injecting a small piece of code in
the factory (wrapping around original implementation).
Also, we can choose a hybrid approach. Let me know what you think.
Perhaps a code sample would be more convincing?
Original comment by ba...@google.com
on 27 Oct 2010 at 4:47
Isn't this only an issue for write operations? I think the only non-protected
writes are editing waypoints and track metadata. I'm inclined to add those to
the iadl interface along with the existing insert methods.
Original comment by sandordo...@google.com
on 27 Oct 2010 at 4:59
About the solutions:
1) This can create an avalanche effect where writes start to accumulate and
eventually most of them fail (after spending a lot more battery due to the
retries) - other than that I like this solution, can you think of a way to
prevent this issue or minimize it? One possible solution is to have a queue of
pending changes and then every time the lock is released, apply all of them as
a single transaction, which will be significantly faster. Blocking the caller
until his specific operation is applied (so that it doesn't became completely
async) is a bit annoying but doable.
2) I kind of dislike the fact that we'd be ignoring all the nice provider
infrastructure in Android to do that. We would be losing a lot of
flexibility/ease of use.
3) Agh, no.
Original comment by rdama...@google.com
on 27 Oct 2010 at 4:59
Sandor@ We have plenty of stack traces, as this issue is the top crash right
now, I recall seeing at least 2 different stack traces.
Rodrigo@:
1) There is a reason why it's called optimistic locking :) We hope to have very
few conflicts. Retries would be limited to up to 3 per attempt, consider this
code sample:
void updateFooBar(...) {
int attempt = 0;
for (;;) {
try {
contentResolver.update(...)
} catch (SqlException e) {
if (e is "DB locked error") {
if (++attempt > 3) {
throw e;
} else {
// wait a short period of time to for the other OP to finish
Thread.sleep(20);
}
}
}
}
}
This would go into the proxy, so no need to repeat this logic for each method
from the provider. Number of retries and sleep time would be parametrized.
In order to reduce the lock contention, we would have to improve insert. My
performance tests show that each insert takes ~30ms (a lot IMHO). Perhaps
redesigning the DB schema would help, but I'm not sure.
5) Lastly, if we didn't have a remote service, we wouldn't have any issues.
Another reason of dropping it (I know, I know you oppose :) ).
Original comment by ba...@google.com
on 27 Oct 2010 at 5:13
1) Why not made the retry logic go into MyTracksProvider directly? This way
it's abstracted away from users of the provider.
Let's experiment with a simpler schema.
5) My idea is to build a certain ecosystem around my tracks so that we can have
other applications that talk to it (this way we don't have to build everything
into our app), and for that exported service is required.
Original comment by rdama...@google.com
on 27 Oct 2010 at 5:38
1) Because sometimes you may not want to use the proxy, e.g. in a single
process app.
The factory is the right place to do so, consider the following code snippets:
// Old version
MyTracksProviderUtils interface = new MyTracksProviderUtilsImpl(...)
// New version
MyTracksProviderUtils interface = new RetryProxy(3, 20, new MyTracksProviderUtilsImpl(...));
No more code changes. Is this something you propose as well?
5) I think this sounds like an interesting idea, lets talk offline. I want to
better understand the use case. You certainly don't want to expose start/stop
actions which could interfere with the running app (MyTracks), I think.
Original comment by ba...@google.com
on 27 Oct 2010 at 5:51
1) I'm not sure I understand this - our provider will always be in a separate
process from another app, won't it?
5) I do think we want to expose calls to start/stop recording (I can mention a
few use cases, such as a Locale plugin and a desktop widget), which is one
reason why the start recording flow needs refactoring so that call is
sufficient (no preference changes needed). Feel free to grab me on gtalk.
Original comment by rdama...@google.com
on 27 Oct 2010 at 6:06
1) Yes. The logic in proxy relies on the fact that the common point between 2
processes is the shared DB, so if one process writes and blocks for 30ms and
meanwhile another tries to write, we will catch it (SQLException) and retry.
2) Let's chat offline when you are back.
Original comment by ba...@google.com
on 27 Oct 2010 at 9:18
Please review:
http://code.google.com/r/rdamazio-mytracks-staging1/source/detail?r=ba142a32107b
2605da777bd59157383da7f2fe0c
Yeah, that simple :)
Original comment by rdama...@google.com
on 29 Oct 2010 at 7:40
For the record, can anyone explain why we set the service to multiprocess in
first place?
Original comment by ba...@google.com
on 29 Oct 2010 at 7:47
You may want to ask Leif. I have no idea.
Original comment by rdama...@google.com
on 31 Oct 2010 at 11:54
Original comment by rdama...@google.com
on 31 Oct 2010 at 11:55
Original issue reported on code.google.com by
rdama...@google.com
on 30 Sep 2010 at 6:18Attachments: