e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Push user labels to the server without waiting for trip end #640

Open shankari opened 3 years ago

shankari commented 3 years ago

The sync mechanism that pushes data from the phone to the server was really designed for background operation. To avoid interfering with trip end detection on the phone, it pushes data only when there is a completed trip, and even then only the data up to the completed trip.

/tmp/2021-05-26_Walking_tr_America_Los_Angeles.loggerDB.withdate.log:4986,1621877304.9870002,2021-05-24T10:28:24.987000-07:00,"BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet"
/tmp/2021-05-26_Walking_tr_America_Los_Angeles.loggerDB.withdate.log:5030,1621881102.5779998,2021-05-24T11:31:42.578000-07:00,"BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet"
/tmp/2021-05-26_Walking_tr_America_Los_Angeles.loggerDB.withdate.log:9698,1621898346.755,2021-05-24T16:19:06.755000-07:00,"BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet"
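
The gating behaviour behind these log lines can be sketched roughly as follows. This is an illustrative Python model, not the actual plugin code, which runs natively on the phone. The entry layout, the `trip_end` key name, and the `entries_to_push` helper are assumptions made for this example.

```python
def entries_to_push(cache):
    """Return the entries that would be pushed, oldest first.

    `cache` is a list of dicts with hypothetical fields `ts` (seconds)
    and `key`. Only entries up to the most recent trip end are eligible.
    """
    trip_end_ts = [e["ts"] for e in cache if e["key"] == "trip_end"]
    if not trip_end_ts:
        # Mirrors the log message: no completed trip, so nothing is pushed
        return []
    last_end = max(trip_end_ts)
    eligible = [e for e in cache if e["ts"] <= last_end]
    return sorted(eligible, key=lambda e: e["ts"])


cache = [
    {"ts": 100, "key": "location"},
    {"ts": 200, "key": "trip_end"},
    {"ts": 300, "key": "label"},  # entered after the trip ended
]
pushed = entries_to_push(cache)
```

In this model the label entered at `ts=300` stays on the phone until a later trip end is recorded, which is the delay that people are unhappy about.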

This is fine for sensed data, but it has always been a point of contention for labels, e.g. https://github.com/e-mission/e-mission-docs/issues/351

Although the labels are visible on the phone, people are unhappy about the delay in pushing them to the server. These people include:

The program participants notice it because the labeling rate is on the leaderboard: they update the labels, but the leaderboard doesn't get updated.

There are a couple of potential solutions:

The second option is definitely the more intuitive to the user, but I am concerned about regressing in terms of robustness. We have been really careful to ensure that users don't require a data plan to use the app in an ongoing fashion - the new approach will mean that they will need to have internet access at any time that they want to label the trips. What if they want to label at a bus stop?

Another option is to use the synchronous option, and fall back to the async if the sync fails. That is great, except for potential timing and ordering issues. What if we had an entry that was stored locally, and then when the user got access to WiFi, they labeled trips before the saved entries were uploaded?

We would mark the incremental processing as done up to the synchronous save, and the asynchronously saved entries, which were saved earlier, would be lost.
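
One way to see the ordering problem is a toy model in which the server processes entries incrementally and remembers the timestamp of the last entry it has handled. That watermark is an assumption about how "incremental processing" is tracked, made only for illustration; `ToyServer` and its fields are hypothetical and do not correspond to real server code.

```python
class ToyServer:
    """Toy model: processes only entries newer than a saved watermark."""

    def __init__(self):
        self.last_processed_ts = 0
        self.processed = []

    def receive(self, entries):
        for e in sorted(entries, key=lambda e: e["ts"]):
            if e["ts"] <= self.last_processed_ts:
                continue  # assumed to be handled already, so it is skipped
            self.processed.append(e["name"])
            self.last_processed_ts = e["ts"]


server = ToyServer()
offline_entry = {"ts": 100, "name": "entry stored locally while offline"}
sync_label = {"ts": 200, "name": "label saved synchronously on WiFi"}

server.receive([sync_label])     # the synchronous save arrives first
server.receive([offline_entry])  # the older entry is uploaded afterwards
```

After both calls, only the synchronously saved label has been processed; the older entry is silently dropped because it falls behind the watermark.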

Need to think through how to handle this properly, and should get input on what people expect.

shankari commented 3 years ago

One option is to update the confirmed_trip directly while keeping the manual/ entries for long-term reproducibility. With this, when the user tries to label a trip, we: a) save it into the manual/ entries on the phone, and b) update the confirmed_trip on the server synchronously.

In the common case, where the user does have access to the network, the data is available on the server immediately, and the leaderboard will be updated on the next run. We could even have the "update confirmed_trip" function launch the leaderboard update immediately to get instant updates.

In the uncommon case, where the user does not have access to the network, the data is still cached locally and updated when possible. No guarantees about when your updates will make it to the leaderboard in that case.
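
A minimal sketch of this "save locally, then try the server" flow is below. It is written in Python for brevity even though the phone code is JavaScript. The `save_label` helper, the `trip_id` field, and the use of `ConnectionError` to signal "no network" are all assumptions made for this example.

```python
def save_label(label, local_cache, update_confirmed_trip):
    """Save a label locally, then try to update the server synchronously.

    Returns True if the server was updated immediately, and False if the
    label is only cached locally and must wait for the regular async push.
    """
    local_cache.append(label)  # (a) always keep the manual/ entry on the phone
    try:
        update_confirmed_trip(label)  # (b) synchronous update, may fail offline
        return True
    except ConnectionError:
        return False


server_trips = {}

def online_update(label):
    server_trips[label["trip_id"]] = label["mode"]

def offline_update(label):
    raise ConnectionError("no network")

cache = []
ok_online = save_label({"trip_id": "t1", "mode": "walk"}, cache, online_update)
ok_offline = save_label({"trip_id": "t2", "mode": "bus"}, cache, offline_update)
```

Both labels end up in the local cache, so the offline one is not lost; only the online one reaches the server right away.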

shankari commented 2 years ago

Right now, user labels are stored in the usercache:

  • multilabel: https://github.com/e-mission/e-mission-phone/blob/master/www/js/survey/multilabel/multi-label-ui.js#L207
  • enketo-trip-button: https://github.com/e-mission/e-mission-phone/blob/master/www/js/survey/enketo/service.js#L138

So they are basically treated just like location points or motion activity points and are pushed up when the next trip ends, but they can be labeled offline as well.

shankari commented 2 years ago

Design constraints:

  1. Async backup is an absolute requirement. We cannot lose labels if users are inspired to give them to us.
  2. It would also be nice to have the dashboard and the leaderboard update instantaneously when the labels are pushed, because the users don't really care about where the labels are; they just want to make sure that they see the results (aka "the leaderboard works").

For review: the steps that we need to get the leaderboard to work are:

  1. the label gets to the server (it will be put into the server-side usercache to begin with)
  2. the label is moved from the usercache to the timeseries in moveToLongTerm
  3. newly arrived labels are matched to any existing confirmed trips in matchIncomingLabels
  4. we process all the sensor data that came in and create cleaned trips
  5. we match the cleaned trips to any existing labels that had come in before we processed the trip in create_confirmed_objects, and create confirmed trips. Cleaned trips without matching labels will still have confirmed trips, but with empty labels.
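
The two matching directions in steps 3 and 5 can be sketched as below. This is a toy model: the function names echo the steps above, but their bodies, the `start_ts` and `mode` fields, and matching on an identical start timestamp are simplifying assumptions rather than the real pipeline logic.

```python
def create_confirmed_trips(cleaned_trips, labels):
    """Step 5: build confirmed trips, attaching any label that already arrived."""
    by_start = {l["start_ts"]: l["mode"] for l in labels}
    return [
        {"start_ts": t["start_ts"], "mode": by_start.get(t["start_ts"])}
        for t in cleaned_trips
    ]


def match_incoming_labels(confirmed_trips, new_labels):
    """Step 3: attach newly arrived labels to confirmed trips that already exist."""
    by_start = {t["start_ts"]: t for t in confirmed_trips}
    for l in new_labels:
        trip = by_start.get(l["start_ts"])
        if trip is not None:
            trip["mode"] = l["mode"]
    return confirmed_trips


# Case 1: the label arrived before the trip was processed
early = create_confirmed_trips([{"start_ts": 10}], [{"start_ts": 10, "mode": "bike"}])

# Case 2: the trip was processed first (empty label), the label arrives later
late = create_confirmed_trips([{"start_ts": 20}], [])
late = match_incoming_labels(late, [{"start_ts": 20, "mode": "walk"}])
```

Writing out both arrival orders like this is one way to start on the scenarios that a synchronous push would need to be checked against.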

I would suggest looking up the matching pipeline design and really understanding it, and then seeing if there are any timing issues with pushing the data synchronously.

My quick take is that I designed it well enough that there won't be, but we should definitely write out the scenarios and convince ourselves.

Related issue and initial high-level design: https://github.com/e-mission/e-mission-docs/issues/476#issuecomment-561727669 The issue also has some other high-level design questions around the mismatch between the trip and section objects, which you can keep in mind for a future improvement, but let's focus on trips for now.

Related PR: https://github.com/e-mission/e-mission-server/pull/780

shankari commented 2 years ago

  1. This is an end-to-end feature, so you will edit both the phone and the server.
  2. The functions where we currently save the data are linked above. Both of them call a native plugin to store the data in the usercache, which is an SQLite database: window.cordova....BEMUserCache.putMessage(...
  3. putMessage puts the JSON representation of the label {label: "foo", ts: 232} into the SQLite database, which is stored locally on the phone.
  4. e-mission-server, as part of the intake pipeline (emission/pipeline/intake.py), takes the raw location points and creates trip objects.
  5. During a trip, e-mission-phone senses the data and stores it locally in the SQLite database. It also runs very basic heuristics for trip start and trip end detection. When it detects that the trip is complete on the phone, it pushes all the data collected since the last push to the server. It does not do any other analysis: no section segmentation, mode detection, smoothing, etc. There's a periodic sync that retries once an hour or so (doze mode) if there is no internet at the end of the trip.
  6. The trip points are deleted from the phone once they are pushed to the server. But there could still be a gap between the trip end detection and the push, and between the push and the pipeline completion, so in that period, we read the "transitions" that the phone detected (either from the server or from the phone) and recreate "draft" trips that have no mode, etc.
  7. So we will only push data when trips complete. For example:
    • finish trip at 4pm today, no labels yet, so none are pushed
    • if I label at 5pm today, no trip ended, so not pushed immediately.
    • periodic sync runs at 6pm, no trip end detected, so not pushed
    • periodic sync runs at 7pm, no trip end detected, so not pushed
    • .....
    • 9am tomorrow, start trip
    • 9:30am tomorrow, end trip, push all data since last push, labels are pushed
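
The timeline in step 7 can be checked with a small calculation. This sketch assumes, as the list above describes, that a label is pushed at the first trip end at or after the time it was entered, and that periodic syncs without a trip end push nothing. The `label_push_time` helper and the calendar dates are made up for illustration.

```python
from datetime import datetime


def label_push_time(labeled_at, trip_ends):
    """Return the first trip end at or after `labeled_at`, or None if none yet."""
    later = [t for t in trip_ends if t >= labeled_at]
    return min(later) if later else None


trip_ends = [
    datetime(2022, 1, 1, 16, 0),  # trip finishes at 4pm "today"
    datetime(2022, 1, 2, 9, 30),  # next trip ends at 9:30am "tomorrow"
]
labeled_at = datetime(2022, 1, 1, 17, 0)  # label entered at 5pm "today"

pushed_at = label_push_time(labeled_at, trip_ends)
delay = pushed_at - labeled_at
```

With these times the label sits on the phone for 16.5 hours, and if the user took no further trips it would never be pushed under this model.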

shankari commented 2 years ago

I will share:

sebastianbarry commented 2 years ago

Here is how I think the problem looks, using an example of a user who has the app downloaded:

Note: The label hasn't been input yet


The problem being, if they don't have an internet connection, the leaderboard does not update.

shankari commented 2 years ago

@sebastianbarry as we discussed, I will share:

Can you let me know when you have labeled your trips and then taken a trip (walk/run/etc) so that we know that the relevant logs will be in the logs that I share?

sebastianbarry commented 2 years ago

I have access to the logs and can view them, but I am not entirely sure how to decipher them. When you are available, we should quickly go over how to read the logs and compare them, in order to figure out where the discrepancy (or delay) between labeling trips and the leaderboard score is located.

I can also get on this weekend if that would be easier for you! :)

sebastianbarry commented 2 years ago

Rewriting this comment with numbering instead of bullet points


Here is how I think the problem looks, using an example of a user who has the app downloaded:

  1. The user leaves their house and drives to the store
  2. While the user is out, raw location data (stored as location points) is being stored in the SQLite database local to the phone
  3. The local device is looking at the raw location data, and can tell that the user has departed (and therefore the trip has started), by the behavior of the raw location points
    a. The trip start is stored in the SQLite database, which begins a new trip with start_location and points throughout the trip
  4. The user arrives at the store
  5. The local device is looking at the raw location data, and can tell that the user has arrived (and therefore the trip has ended), by the behavior of the raw location points
    a. The trip end is stored in the SQLite database on the same object as the trip start, and the trip has been completed.
    b. This trip object is sent to the e-mission-server
      i. Analysis and server-related things happen (not sure exactly what happens on the server, but it is not so important)
      ii. The user's leaderboard is updated with a completed trip which has not been labeled yet, lowering their leaderboard score. If the user views the leaderboard at this point, their device would ask the server what their score is, and because this trip is not labeled yet, the server would respond with their score not being 100%
  6. The e-mission-phone deletes the trip entry from the SQLite database
  7. The e-mission-server sends a new trip object (of their recent "trip to the store") to the device after analysis has happened
    a. This way, the Diary reads from this new container that the server has sent the trip object to, in order to generate the trip "Label" tab of the app

Note: The label hasn't been input yet

  1. The user inputs their trip label while at the store, but they have lost their internet connection
    a. The trip label object gets saved locally in the SQLite database, waiting for internet before uploading the trip label object to the server
    b. If the user views the leaderboard at this point, their device would ask the server what their score is, but because they can't see the server, the server would respond with their score as not being 100%
  2. The user re-establishes their connection to the internet
  3. The app detects that internet has been restored, and uploads the contents of the SQLite database to the e-mission-server: the trip label object, together with the trip_ID
  4. The e-mission-server completes the trip object with the newly acquired trip-label data
    a. The user's leaderboard is updated with a completed and labeled trip, raising their leaderboard score. If the user views the leaderboard at this point, their device would ask the server what their score is, and because this trip is now labeled, the server would respond with their score being 100%

The problem being, if they don't have an internet connection, the leaderboard does not update.

shankari commented 2 years ago

@sebastianbarry There are some areas where this doesn't work the way that you expect. I think that your expectation is based on how other apps work, but this app doesn't work that way, at least not yet.