e-mission / e-mission-data-collection

Repository for our own data collection
BSD 3-Clause "New" or "Revised" License
4 stars 19 forks source link

SharedPreferences are sometimes not saved properly #36

Closed shankari closed 9 years ago

shankari commented 9 years ago

It looks like SharedPreferences do not save properly in some cases. Here's what I saw from the app logs for a trip in which I left Tied House at 8:30pm and came home. The app appeared to have been killed and relaunched by the geofence exit, since the id was 9.

[9|1435807867723|7/1/15 8:31 PM|FINE]TripDiaryStateMachineReceiver : after reading from the prefs, the current state is local.state.waiting_for_trip_start

After processing the geofence exit, the new state was ongoing_trip.

[14|1435807867853|7/1/15 8:31 PM|FINE]TripDiaryStateMachineReceiver : newState after handling action is local.state.ongoing_trip

However, when the state is next read, it is still waiting_for_trip_start

[87|1435808413146|7/1/15 8:40 PM|FINE]TripDiaryStateMachineReceiver : after reading from the prefs, the current state is local.state.waiting_for_trip_start

So when the trip end is detected, we think that the current state is already waiting_for_trip_start, so we do not turn off ongoing tracking, and we do not establish the geofence.

shankari commented 9 years ago

Additional logging added in pull request #35

shankari commented 9 years ago

A potential solution might be to use commit() instead of apply(). commit() saves synchronously and returns a result value apply() saves asynchronously and fails silently. I am not sure what might cause the save to fail. It is not supposed to be the android lifecycle because:

You don't need to worry about Android component lifecycles and their interaction with apply() writing to disk. The framework makes sure in-flight disk writes from apply() complete before switching states.

On the other hand, this is a service, not an activity. Maybe that's why it is confusing?

sfwatergit commented 9 years ago

If this refers specifically to the call on SharedPrefs in TripDiaryStateMachineReceiver, then it is due to the lifecycle of BroadcastReceiver, which is not a service, but ceases to exist as a separate component almost immediately after completion of any running code. You may wish to extend WakefulBroadcastReceiver, use call goAsync within the onReceive(Context,Intent) method, or simply refactor the BroadcastReceiver as an inner static class of a service, which is a bit more idiomatic in my experience. I tend to prefer use of the CommonsWare WakefulIntentService for state machines and other short small class that act as short-lived data synchronizers: https://github.com/commonsguy/cwac-wakeful.

shankari commented 9 years ago

If this refers specifically to the call on SharedPrefs in TripDiaryStateMachineReceiver, then it is due to the lifecycle of BroadcastReceiver, which is not a service, but ceases to exist as a separate component almost immediately after completion of any running code.

Yes, it refers to TripDiaryStateMachineReceiver. However, as I quoted above, the documentation for apply() specifically says that

You don't need to worry about Android component lifecycles and their interaction with apply() writing to disk. The framework makes sure in-flight disk writes from apply() complete before switching states.

Presumably, this includes switching from the existing to the not-existing state.

Ah, I think that the confusion was because I said "this is a service". You are absolutely right that it is not. I meant to say "this is not an activity" and was sloppy. But the question about why the save failed is still, IMHO valid.

sfwatergit commented 9 years ago

There is some lack of detail in the Android documentation about killing a process with a thread running asynchronously.

Check out: http://developer.android.com/reference/android/content/BroadcastReceiver.html#onReceive(android.content.Context, android.content.Intent) http://developer.android.com/reference/android/content/BroadcastReceiver.html#onReceive(android.content.Context,%20android.content.Intent) and compare with:

Which says that you should not run asynchronous processes from the broadcast receiver’s onReceive(). Whereas:

http://developer.android.com/reference/android/content/SharedPreferences.Editor.html#apply() http://developer.android.com/reference/android/content/SharedPreferences.Editor.html#apply()

despite SharedPreferences.Editor.apply() being an async call, the docs indicate that you need not worry about state change, as you’ve mentioned.

So, my understanding is that even if the process is killed, even if your processing is not done within 10 secs any calls to async.apply() should eventually complete.

However, reading down a bit, I notice your last call in onReceive() is to connect to a service, which itself may be long-running… those Callbacks down there are worrisome… (I’ve experienced IntentServices killed off when connecting to a service after onStartCommand (though this can be ameliorated on a case-specific basis somewhat with START_STICKY).

OK, so aside from the Callbacks in the service (which may or may not return if the object reference is destroyed), the instant you call .connect(), the BR's onReceive() must return, which makes it eligible to be killed off immediately. The background processing may still run, though, if you manage to connect before Android kills the BackgroundReceiver. This may be dangerous; however, if several intents are received sequentially, since the context of the BroadcastReceiver may be leaked. Well, this is perhaps intentional, since I see you are maintaining the context as an instance variable.

So, another possibility is that the action bearing end_trip is waiting to be written; however, another .apply() is called almost immediately afterwards on the waiting_for_trip_start Action. Despite what is written there, my hunch is that the .apply()s are not completed synchronously, since they are, as indicated, asynchronous writes (despite living in separate processes and saving immediately to memory, the actual IO on the SharedPreference (which is an XML file) must be synchronous. The ordering and safety guarantee, I believe, on applies to, um, .apply()s, which are initiated in the same process (i.e., the BroadcastReceiver), but not necessarily to .apply()s called from different processes, which is what would happen if the BroadcastReceiver responds to Intent B before Intent A’s .apply() calls have a chance to complete.

Do you think these reasons apply in your situation?

Sid Feygin

Ph. D Student UC Berkeley: Civil and Environmental Engineering Department, Civil Systems sid.feygin@berkeley.edu

smartcities.berkeley.edu xmobile.berkeley.edu

On Aug 22, 2015, at 11:25 PM, shankari notifications@github.com wrote:

If this refers specifically to the call on SharedPrefs in TripDiaryStateMachineReceiver, then it is due to the lifecycle of BroadcastReceiver, which is not a service, but ceases to exist as a separate component almost immediately after completion of any running code.

Yes, it refers to TripDiaryStateMachineReceiver. However, as I quoted above, the documentation for apply() specifically says that

You don't need to worry about Android component lifecycles and their interaction with apply() writing to disk. The framework makes sure in-flight disk writes from apply() complete before switching states.

Presumably, this includes switching from the existing to the not-existing state.

Ah, I think that the confusion was because I said "this is a service". You are absolutely right that it is not. I meant to say "this is not an activity" and was sloppy. But the question about why the save failed is still, IMHO valid.

— Reply to this email directly or view it on GitHub https://github.com/e-mission/e-mission-data-collection/issues/36#issuecomment-133792955.

shankari commented 9 years ago

However, reading down a bit, I notice your last call in onReceive() is to connect to a service, which itself may be long-running… those Callbacks down there are worrisome… (I’ve experienced IntentServices killed off when connecting to a service after onStartCommand (though this can be ameliorated on a case-specific basis somewhat with START_STICKY).

Which service are you talking about? The callbacks for creating the geofence, etc? IIRC, those are executed in the thread of the geofence creation, not the thread of the broadcast receiver. So basically, we start a new thread to perform a new operation and provide code to be run in that thread once the operation is complete.

Do you think these reasons apply in your situation? I am not sure - it seems like there is a fair amount of guessing/reading between the lines on what the apply() call does, which feels like reverse engineering the android library. I would prefer not to reverse engineer undocumented behavior which can change in future versions unless I absolutely have to. I would prefer to robustify the code and move on.

My plan for this issue was as follows:

  1. Improve logging
  2. See if it recurs
  3. If it does, change the apply() to commit(), which is a blocking, synchronous call.

2 above hasn't happened yet, so I haven't made the change. If it doesn't recur until I am ready to deploy to the play store, I will probably make the change anyway just to be on the safe side. The performance implications don't appear to be very significant.

shankari commented 9 years ago

Looks like the file handler option to log to file is in fact not thread safe.

Logging with both the filehandler and a database handler at the same time indicates that chunks of log are just dropped on the floor, although the presence of a .lck file would seem to indicate that the FileHandler does file level locking. Apparently it is not good enough across contexts.

See #42 for more details. So it might just be that the change in the preferences was not logged correctly and this might be a non-issue after all.

sfwatergit commented 9 years ago

The mention of services above was to reiterate that despite the fact that you would run the geofence creation in a separate thread, the BroadcastReceiver component and process would certainly will be destroyed. This may or may not a problem for the thread, since it maintains the object reference (hence the leaked context) and won't be garbage collected, but the thread interaction with repeated calls to the BroadcastIntent (and any services invoked therein) may result in race conditions.

See also: https://groups.google.com/forum/#!topic/android-developers/iqwmDdUumXg . This describes a very similar issue.

Even improved logging will make it difficult to catch a race condition. However, SharedPreferences.OnSharedPreferenceChangeListener may be useful here if you would like to log exactly when the shared prefs are changed. I can imagine you could test this by running a long-running service from your main activity (bound or unbound) and listen for changes from the broadcast receiver.

Also, are you able to recover the logcat?

shankari commented 9 years ago

Also, are you able to recover the logcat?

No. The events happened on 1st Jul. I noticed this on 5th Jul and filed the bug. This incident had already moved out of logcat by then.

Even improved logging will make it difficult to catch a race condition.

Agreed. But once we know that the file based logging is not reliable, there is no indication that there was a race condition, or even that the shared preferences were not saved correctly.

I originally thought that the shared preferences were not saved correctly because I saw a newState = ongoing_trip followed by read from the shared preferences of waiting_for_trip_state. But if we could have missing logs, then it could be that we detected a trip end in the interim and that legitimately caused the state the change, except the trip end detection was overriden.

I am going to close this and the other "mysterious behavior with location stuff" bugs for now since they were not based on reliable logs. I will reopen them if I see the same behavior in the database-based logs.

shankari commented 9 years ago

I am seeing similar behavior even with the reliable logs, so it looks like the callbacks don't work properly from a BroadcastReceiver, as @sfwatergit suggested.

Below, we show three instances where this happened. First, let us show what a successful invocation looks like.

[1|1441757464330|9/8/15 5:11 PM|FINE]GeofenceExitIntentService onCreate called
[3|1441757464396|9/8/15 5:11 PM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 sta
rtId 1
[5|1441757464437|9/8/15 5:11 PM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1441757464476|9/8/15 5:11 PM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1441757464573|9/8/15 5:11 PM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.874832,-122.259241 acc=845 et=+12d1h19m15s758ms]
[11|1441757464633|9/8/15 5:11 PM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing m
onitoring...
[13|1441757464869|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start
[15|1441757464985|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver onConnected(null) called
[17|1441757465044|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleAction(local.state.waiting_for_trip_start, local.transition.exited_geofence) called
[19|1441757465078|9/8/15 5:11 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleTripStart(local.transition.exited_geofence) called
[21|1441757465131|9/8/15 5:11 PM|FINE]CreateGeofenceAction Removing geofence with ID = DYNAMIC_EXIT_GEOFENCE
[23|1441757465211|9/8/15 5:11 PM|FINE]LocationHandler default request is Request[PRIORITY_BALANCED_POWER_ACCURACY requested=3600000ms fastest=600000ms]
[25|1441757465288|9/8/15 5:11 PM|FINE]ActivityRecognitionHandler Starting activity recognition with interval = 30000
[27|1441757465342|9/8/15 5:11 PM|FINE]LocationChangeIntentService onCreate called
[29|1441757465381|9/8/15 5:11 PM|FINE]LocationChangeIntentService onStart called with Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.LocationChangeIntentService (has extras) } startId 1

Here are the failures.

[1|1441979577703|9/11/15 6:52 AM|FINE]GeofenceExitIntentService onCreate called
[3|1441979577773|9/11/15 6:52 AM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 st
artId 1
[5|1441979577818|9/11/15 6:52 AM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1441979577860|9/11/15 6:52 AM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1441979577942|9/11/15 6:52 AM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.391921,-122.085298 acc=10 et=+14d15h1m9s567ms alt=-5.70001220703125 vel=6.6303 bear=29.0]
[11|1441979577986|9/11/15 6:52 AM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent { c
mp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing 
monitoring...
[13|1441979578216|9/11/15 6:52 AM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start
shankari commented 9 years ago

Another one.

[1|1441680097199|9/7/15 7:41 PM|FINE]GeofenceExitIntentService onCreate called
[3|1441680097295|9/7/15 7:41 PM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 sta
rtId 1
[5|1441680097353|9/7/15 7:41 PM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1441680097398|9/7/15 7:41 PM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1441680097531|9/7/15 7:41 PM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.390951,-122.086278 acc=30 et=+11d3h49m46s980ms]
[11|1441680097576|9/7/15 7:41 PM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing m
onitoring...
[13|1441680097811|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start
[15|1441680097943|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver onConnected(null) called
[17|1441680097991|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleAction(local.state.waiting_for_trip_start, local.transition.exited_geofence) called
shankari commented 9 years ago

In both those cases, the thread is terminated at that point. I can tell because the log ID gets reset right after that.

[17|1441680097991|9/7/15 7:41 PM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReceiver handleAction(local.state.waiting_for_trip_start, local.transition.exited_geofence) called
[1|1441682030836|9/7/15 8:13 PM|FINE]BuiltinUserCache While searching for regex, got 1 results
[13|1441979578216|9/11/15 6:52 AM|FINE]TripDiaryStateMachineReceiver after reading from the prefs, the current state is local.state.waiting_for_trip_start
[1|1441980831591|9/11/15 7:13 AM|FINE]BuiltinUserCache While searching for regex, got 0 results

It is really clear in the first of these, since the implementation of handleAction is basically a simple if/then statement that should have called handleTripStart. There are no async calls that might fail between those two statements, which makes it pretty clear that the process was terminated at that point.

I am converting this to a service. Thanks to @sfwatergit for the catch!

shankari commented 9 years ago

I see this behavior even after converting to a service. However, I think that in this case, there might be a race caused by the delay in write.

Here are the related logs:

We detect the end of the trip

[41|1442946453283|9/22/15 11:27 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleAction(local.state.ongoing_trip, local.transition.stopped_moving) called
[43|1442946453406|9/22/15 11:27 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleTripEnd(local.transition.stopped_moving) called
[45|1442946453511|9/22/15 11:27 AM|FINE]ActivityRecognitionHandler Stopping activity recognition with interval = 30000
[47|1442946453596|9/22/15 11:27 AM|FINE]CreateGeofenceAction mLastLocation has elapsed time = 324019199000000
[49|1442946453631|9/22/15 11:27 AM|FINE]CreateGeofenceAction Last location is Location[fused 37.870491,-122.260089 acc=30 et=+3d18h0m19s199ms] creating geofence
[51|1442946453748|9/22/15 11:27 AM|FINE]CreateGeofenceAction creating geofence at location 37.8704911, -122.2600891
[53|1442946453960|9/22/15 11:27 AM|FINE]TripDiaryStateMachineService newState after handling action is local.state.waiting_for_trip_start

Almost immediately, the geofence exits so we read the value from a different context.

The value read is still the old one (ongoing_trip) so we ignore it.

[1|1442946547133|9/22/15 11:29 AM|FINE]GeofenceExitIntentService onCreate called
[3|1442946547349|9/22/15 11:29 AM|FINE]GeofenceExitIntentService onStartCommand called with intent Intent { cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } flags 0 startId 1
[5|1442946548410|9/22/15 11:29 AM|FINE]GeofenceExitIntentService onStart called with startId 1
[7|1442946548482|9/22/15 11:29 AM|FINE]GeofenceExitIntentService geofence exit intent action = null
[9|1442946548879|9/22/15 11:29 AM|FINE]GeofenceExitIntentService got geofence intent callback with type 2 and location Location[fused 37.870813,-122.258860 acc=21 et=+3d18h1m55s322ms bear=9.0]
[11|1442946548953|9/22/15 11:29 AM|FINE]GeofenceExitIntentService Geofence exited! Intent = Intent {cmp=edu.berkeley.eecs.cfc_tracker/.location.GeofenceExitIntentService (has extras) } Starting ongoing
 monitoring...
[1|1442946550661|9/22/15 11:29 AM|FINE]TripDiaryStateMachineReceiver TripDiaryStateMachineReciever onReceive(android.app.ReceiverRestrictedContext@421d54b0, Intent { act=local.transition.exited_geofence flg=0x10 cmp=edu.berkeley.eecs.cfc_tracker/.location.TripDiaryStateMachineReceiver }) called
[3|1442946550829|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService after reading from the prefs, the current state is local.state.ongoing_trip
[5|1442946551237|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver onConnected(null) called
[7|1442946551300|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleAction(local.state.ongoing_trip, local.transition.exited_geofence) called
[9|1442946551371|9/22/15 11:29 AM|FINE]TripDiaryStateMachineService TripDiaryStateMachineReceiver handleTripEnd(local.transition.exited_geofence) called

I can think of two possible reasons:

  1. There is a race - the apply() hasn't yet saved the state to preferences before the geofence reads it. The argument in favor of this case is that the gap between the end and the next start is small. But on the other hand, the gap is a couple of minutes, which is an eternity in computer time. But we also don't know how aggressive the android system is.
  2. Note though that the code expects to print a "newState saved in prefManager is " line after the apply() but it doesn't. That argues that maybe the service was also terminated aggressively, before the apply() was invoked. But we have moved to a service.

Next steps - figure out the lifecycle of an intent service by reading docs. I have already done this a bit, and it is not clear when it is terminated but I could try some more.

Next, I will try two options, in this order:

  1. switch to commit()
  2. switch to a generic service and call stopSelf() only after the new state is set
shankari commented 9 years ago

Actually, this is probably a result of (2) above. This stackoverflow answer (http://stackoverflow.com/questions/15524280/service-vs-intentservice) linked to the source code of IntentService https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/android/app/IntentService.java which is pretty clear.

Once the onHandleIntent() is completed, stopSelf() is called. This means that any async tasks launched from onHandleIntent() are not guaranteed to finish.

Will fix this by using a standard Service instead of an IntentService and calling stopSelf from the setNewState method to ensure that we wait until the state is set before the service is terminated.

sfwatergit commented 9 years ago

There may be an issue with maintenance of the wake lock when activating the phone following sleep. You might visited the wakeful intent service mentioned earlier. I have never find I/o issues using this dependency. If you prefer not to use an external library, consider the nearly equivalent wakeful broadcast receiver. On Sep 22, 2015 9:34 PM, "shankari" notifications@github.com wrote:

Actually, this is probably a result of (2) above. This stackoverflow answer ( http://stackoverflow.com/questions/15524280/service-vs-intentservice) linked to the source code of IntentService

https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/android/app/IntentService.java which is pretty clear.

Once the onHandleIntent() is completed, stopSelf() is called. This means that any async tasks launched from onHandleIntent() are not guaranteed to finish.

Will fix this by using a standard service instead of an IntentService and calling stopSelf from the setNewState method to ensure that we wait until the state is set before the service is terminated.

— Reply to this email directly or view it on GitHub https://github.com/e-mission/e-mission-data-collection/issues/36#issuecomment-142487475 .

shankari commented 9 years ago

@sfwatergit I really don't think that a wakelock is the issue. While I am using a BroadcastReceiver, it is not triggered from an alarm, which is admittedly the most common BroadcastReceiver. Instead, I am generating the broadcasts in my code, which is invoked from IntentServices that handle various location based intents that I registered for.

From the documentation for your suggested library (emphasis mine)

The recommended pattern for Android's equivalent to cron jobs and Windows scheduled tasks is to use AlarmManager. This works well when coupled with an IntentService, as the service will do its work on a background thread and shut down when there is no more work to do.

There's one small problem: IntentService does nothing to keep the device awake. If the alarm was a WAKEUP variant, the phone will only stay awake on its own while the BroadcastReceiver handling the alarm is in its onReceive() method. Otherwise, the phone may fall back asleep.

I have checked in my fix to use a regular Service instead of an IntentService, and not hit this for a while. Fingers crossed...

shankari commented 9 years ago

Should be fixed by ec6370389abbbf98f1e63b6a14d1b8c6d1e52d32