Push user labels to the server without waiting for trip end

e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.

https://e-mission.readthedocs.io/en/latest

BSD 3-Clause "New" or "Revised" License

Push user labels to the server without waiting for trip end #640

Open shankari opened 3 years ago

shankari commented 3 years ago

The sync mechanism that pushes data from the phone to the server was designed for background operation. To avoid interfering with trip end detection on the phone, it pushes data only when there is a completed trip, and even then only the data up to the end of that trip.

/tmp/2021-05-26_Walking_tr_America_Los_Angeles.loggerDB.withdate.log:4986,1621877304.9870002,2021-05-24T10:28:24.987000-07:00,"BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet"
/tmp/2021-05-26_Walking_tr_America_Los_Angeles.loggerDB.withdate.log:5030,1621881102.5779998,2021-05-24T11:31:42.578000-07:00,"BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet"
/tmp/2021-05-26_Walking_tr_America_Los_Angeles.loggerDB.withdate.log:9698,1621898346.755,2021-05-24T16:19:06.755000-07:00,"BuiltinUserCache : We don't have a completed trip, so we don't want to push anything yet"

This is fine for sensed data, but it has always been a point of contention for labels, e.g. https://github.com/e-mission/e-mission-docs/issues/351

Although the labels are visible on the phone, people are unhappy about the delay in pushing them to the server. These people include:

- the survey data collectors
- at least two of the program participants

The program participants notice it because the labeling rate is on the leaderboard: they update the labels, but the leaderboard doesn't get updated.

There are a couple of potential solutions:

- we could push updates to manual/* entries even when there is no completed trip, since they do not affect trip detection in any way
- we could use the new putOne method (https://github.com/e-mission/e-mission-server/blob/master/emission/net/api/cfc_webapp.py#L284), or write something similar, to push data directly to the server, which is the approach raised earlier in https://github.com/e-mission/e-mission-docs/issues/351
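
As a rough illustration of the first option, the push filter could let manual/* entries through even when no trip has completed. This is only a sketch under stated assumptions: the function name `select_entries_to_push`, the entry shape (`key`, `ts`) and the example keys are hypothetical and are not taken from the phone code.

```python
from typing import Dict, List, Optional

# Assumption: user-entered labels are stored under keys starting with "manual/".
MANUAL_PREFIX = "manual/"


def select_entries_to_push(entries: List[Dict],
                           last_trip_end_ts: Optional[float]) -> List[Dict]:
    """Return the entries that may be pushed right now.

    Sensed entries are pushed only up to the end of the last completed trip.
    Manual entries are always eligible, since they do not affect trip detection.
    """
    selected = []
    for entry in entries:
        is_manual = entry["key"].startswith(MANUAL_PREFIX)
        in_completed_trip = (last_trip_end_ts is not None
                             and entry["ts"] <= last_trip_end_ts)
        if is_manual or in_completed_trip:
            selected.append(entry)
    return selected


# Hypothetical cache contents: two sensed points and one label.
cache = [
    {"key": "background/location", "ts": 100},
    {"key": "manual/mode_confirm", "ts": 150},
    {"key": "background/location", "ts": 300},
]
no_trip_yet = select_entries_to_push(cache, None)
after_trip = select_entries_to_push(cache, 200)
```

With no completed trip, only the label is selected; once a trip ending at ts=200 exists, the first location point is selected as well, and the point at ts=300 stays on the phone.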

The second option is definitely the more intuitive to the user, but I am concerned about regressing in terms of robustness. We have been really careful to ensure that users don't require a data plan to use the app in an ongoing fashion - the new approach will mean that they will need to have internet access at any time that they want to label the trips. What if they want to label at a bus stop?

Another option is to use the synchronous option, and fall back to the async if the sync fails. That is great, except for potential timing and ordering issues. What if we had an entry that was stored locally, and then when the user got access to WiFi, they labeled trips before the saved entries were uploaded?

We would mark the incremental processing as done up to the time of the synchronous save, so the asynchronously saved entries, which were created earlier but uploaded later, would be lost.

Need to think through how to handle this properly, and should get input on what people expect.

shankari commented 3 years ago

One option is to update the confirmed_trip directly while using the manual/ for long-term reproducibility. With this, when the user tries to label a trip, we: a) save it into the manual/ entries on the phone, and b) update the confirmed_trip on the server synchronously

In the common case, where the user does have access to the network, the data is available on the server immediately, and the leaderboard will be updated on the next run. We could even have the "update confirmed_trip" function launch the leaderboard update immediately so that users see Instant Updates.

In the uncommon case, where the user does not have access to the network, the data is still cached locally and updated when possible. No guarantees about when your updates will make it to the leaderboard in that case.
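
A minimal sketch of this "save locally, then try the server" flow, assuming a hypothetical `LabelStore` class and an injected `push_to_server` callable (neither exists in the codebase under these names):

```python
from typing import Callable, Dict, List


class LabelStore:
    """Hypothetical helper: keep every label locally, try the server right away."""

    def __init__(self, push_to_server: Callable[[Dict], None]):
        self.local_cache: List[Dict] = []  # stands in for the manual/ entries
        self._push_to_server = push_to_server

    def save_label(self, label: Dict) -> bool:
        # a) Always save locally first, so the label cannot be lost.
        self.local_cache.append(label)
        # b) Try the synchronous update; a network failure is not an error,
        #    because the regular async sync will deliver the cached copy later.
        try:
            self._push_to_server(label)
        except OSError:  # ConnectionError and TimeoutError are subclasses
            return False
        return True


received: List[Dict] = []
online = LabelStore(received.append)
online_ok = online.save_label({"label": "walk", "ts": 232})


def unreachable(_label: Dict) -> None:
    raise ConnectionError("no network")


offline = LabelStore(unreachable)
offline_ok = offline.save_label({"label": "bike", "ts": 300})
```

Because the local copy is kept even when the synchronous push succeeds, the server would see the same label twice (once directly, once through the regular sync), so the server-side update would need to tolerate duplicates.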

shankari commented 2 years ago

Right now, user labels are stored in the usercache:
- multilabel: https://github.com/e-mission/e-mission-phone/blob/master/www/js/survey/multilabel/multi-label-ui.js#L207
- enketo-trip-button: https://github.com/e-mission/e-mission-phone/blob/master/www/js/survey/enketo/service.js#L138

So they are basically treated just like location points or motion activity points: they are pushed up when the next trip ends, but they can be entered offline as well.

shankari commented 2 years ago

Design constraints:

- Async backup is an absolute requirement. We cannot lose labels if users are inspired to give them to us.
- It would also be nice to have the dashboard and the leaderboard update instantaneously when the labels are pushed, because the users don't really care about where the labels are; they just want to see the results (aka "the leaderboard works").

For review: the steps that we need to get the leaderboard to work are:

1. the label gets to the server (it is put into the server-side usercache to begin with)
2. the label is moved from the usercache to the timeseries in moveToLongTerm
3. newly arrived labels are matched to any existing confirmed trips in matchIncomingLabels
4. we process all the sensor data that came in and create cleaned trips
5. in create_confirmed_objects, we match the cleaned trips to any labels that came in before we processed the trip, and create confirmed trips. Cleaned trips without matching labels will still have confirmed trips, but with empty labels.
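
To make the two matching directions concrete, here is a toy sketch. Only the names matchIncomingLabels and create_confirmed_objects come from the list above; the function bodies, the dict shapes and the rule of matching on identical start_ts/end_ts are assumptions for illustration, not the server's actual implementation.

```python
from typing import Dict, List


def label_matches_trip(label: Dict, trip: Dict) -> bool:
    # Assumption: a label refers to a trip through identical start/end timestamps.
    return (label["start_ts"] == trip["start_ts"]
            and label["end_ts"] == trip["end_ts"])


def match_incoming_labels(new_labels: List[Dict],
                          confirmed_trips: List[Dict]) -> None:
    """Attach newly arrived labels to confirmed trips that already exist."""
    for label in new_labels:
        for trip in confirmed_trips:
            if label_matches_trip(label, trip):
                trip["user_input"][label["key"]] = label["value"]


def create_confirmed_objects(cleaned_trips: List[Dict],
                             stored_labels: List[Dict]) -> List[Dict]:
    """Create confirmed trips, picking up labels that arrived before the trip."""
    confirmed = []
    for trip in cleaned_trips:
        user_input = {lbl["key"]: lbl["value"] for lbl in stored_labels
                      if label_matches_trip(lbl, trip)}
        confirmed.append({**trip, "user_input": user_input})
    return confirmed


trip = {"start_ts": 100, "end_ts": 200}
label = {"key": "mode_confirm", "value": "walk", "start_ts": 100, "end_ts": 200}

# Scenario 1: the label arrives before the trip is processed.
label_first = create_confirmed_objects([trip], [label])

# Scenario 2: the trip is processed first, the label arrives afterwards.
trip_first = create_confirmed_objects([trip], [])
match_incoming_labels([label], trip_first)
```

In this toy model both scenarios end with the same confirmed trip, which is the property to check scenario by scenario against the real pipeline when labels start arriving synchronously.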

I would suggest looking up the matching pipeline design and really understanding it, and then seeing if there are any timing issues with pushing the data synchronously.

My quick take is that I designed it well enough that there won't be, but we should definitely write out the scenarios and convince ourselves.

Related issue and initial high-level design: https://github.com/e-mission/e-mission-docs/issues/476#issuecomment-561727669 The issue also raises some other high-level design questions around the mismatch between the trip and section objects, which you can keep in mind for a future improvement, but let's focus on trips for now.

shankari commented 2 years ago

This is an end-to-end feature, so you will edit both the phone and the server.
- The functions where we currently save the data are linked above. Both of them call a native plugin to store the data in the usercache, which is an SQLite database: window.cordova....BEMUserCache.putMessage(...
- putMessage puts the JSON representation of the label, e.g. {label: "foo", ts: 232}, into the SQLite database, which is stored locally on the phone.
- The e-mission server, as part of the intake pipeline (emission/pipeline/intake.py), takes the raw location points and creates trip objects.
- During a trip, e-mission-phone senses the data and stores it locally in the SQLite database. It also runs very basic heuristics for trip start and trip end detection. When it detects that the trip is complete, it pushes all the data collected since the last push to the server. It does not do any other analysis: no section segmentation, mode detection, smoothing, etc. If there is no internet at the end of the trip, a periodic sync retries once an hour or so (doze mode).
- The trip points are deleted from the phone once they are pushed to the server. But there can still be a gap between the trip end detection and the push, and between the push and the pipeline completion. In that period, we read the "transitions" that the phone detected (either from the server or from the phone) and recreate "draft" trips that have no mode etc.

So we only push data when trips complete. For example:
- finish trip at 4pm today, no labels yet, so no labels pushed
- if I label at 5pm today, no trip ended, so not pushed immediately.
- periodic sync runs at 6pm, no trip end detected, so not pushed
- periodic sync runs at 7pm, no trip end detected, so not pushed
- .....
- 9am tomorrow, start trip
- 9:30am tomorrow, end trip, push all data since last push, labels are pushed
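
The timeline above can be replayed with a small simulation. This is a toy model, assuming the network is available at every trip end; the event names and the `simulate` function are made up for illustration.

```python
from typing import List, Tuple


def simulate(events: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Replay (time, kind) events and return what gets pushed, and when.

    Only a "trip_end" event triggers a push; a "periodic_sync" with no newly
    completed trip pushes nothing, even if labels are waiting.
    """
    pending: List[str] = []
    pushes: List[Tuple[str, List[str]]] = []
    for time, kind in events:
        if kind == "label":
            pending.append("label@" + time)
        elif kind == "trip_end":
            pending.append("trip@" + time)
            pushes.append((time, list(pending)))
            pending.clear()
        # "trip_start" and "periodic_sync" leave the pending data untouched.
    return pushes


timeline = [
    ("day1 16:00", "trip_end"),
    ("day1 17:00", "label"),
    ("day1 18:00", "periodic_sync"),
    ("day1 19:00", "periodic_sync"),
    ("day2 09:00", "trip_start"),
    ("day2 09:30", "trip_end"),
]
pushes = simulate(timeline)
```

In this model the label entered at 5pm on day 1 first appears in the push at 9:30am on day 2, which is the delay that users are noticing.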

shankari commented 2 years ago

I will share:

- your uploaded logs (you should convert the sqlite to human readable with timestamps using the script in /bin, see document in e-mission-docs repo)
- the webserver logs
- the intake logs

sebastianbarry commented 2 years ago

Here is how I think the problem looks, using an example of a user who has the app downloaded:

- The user leaves their house and drives to the store
- While the user is out, raw location data (stored as location points) is being stored in the SQLite database local to the phone
- The local device is looking at the raw location data, and can tell from the behavior of the raw location points that the user has departed (and therefore the trip has started)
  - trip start is stored in the SQLite database and thus begins a new trip with start_location and points throughout the trip
- The user arrives at the store
- The local device is looking at the raw location data, and can tell from the behavior of the raw location points that the user has arrived (and therefore the trip has ended)
  - trip end is stored in the SQLite database on the same object as the trip start, and the trip has been completed.
  - This trip object is sent to the e-mission-server
    - Analysis and server-related things happen (not sure exactly what happens on the server, but it is not so important)
    - The user's leaderboard is updated with a completed trip which has not been labeled yet, lowering their leaderboard score. If the user views the leaderboard at this point, their device would ask the server what their score is, and because this trip is not labeled yet, the server would respond with their score not being 100%
- The e-mission-phone deletes the trip entry from the SQLite database
- The e-mission-server app sends a new trip object (of their recent "trip to the store") to the device after analysis has happened
  - This way, the Diary reads from this new container that the server has sent the trip object to, to generate the trip "Label" tab of the app

Note: The label hasn't been input yet

- The user inputs their trip label while at the store, but they have lost their internet connection
  - The trip label object gets saved to the local SQLite database, waiting for internet before uploading the trip label object to the server
  - If the user views the leaderboard at this point, their device would ask the server what their score is, but because they can't see the server, the server would respond with their score as not being 100%
- The user re-establishes their connection to the internet
- The app detects that internet has been restored, and uploads the contents of the SQLite database to the e-mission-server: the trip label object, together with the trip_ID
- The e-mission-server completes the trip object with the newly acquired trip-label data
  - The user's leaderboard is updated with a completed and labeled trip, raising their leaderboard score. If the user views the leaderboard at this point, their device would ask the server what their score is, and because this trip is now labeled, the server would respond with their score being 100%

The problem is that, if they don't have an internet connection, the leaderboard does not update.

shankari commented 2 years ago

@sebastianbarry as we discussed, I will share:

- your uploaded logs (you should convert the sqlite to human readable with timestamps using the script in /bin, see document in e-mission-docs repo)
- the webserver logs
- the intake logs

Can you let me know when you have labeled your trips and then taken a trip (walk/run/etc) so that we know that the relevant logs will be in the logs that I share?

sebastianbarry commented 2 years ago

I have access to the logs and can view them, but I am not entirely sure how to decipher them. When you are available, we should quickly go over how to read the logs, and compare them in order to figure out where the discrepancy (or delay) between labeling trips / the leaderboard score is located

I can also get on this weekend if that would be easier for you! :)

sebastianbarry commented 2 years ago

Rewriting this comment, with numbering instead of dotting

Here is how I think the problem looks, using an example of a user who has the app downloaded:

1. The user leaves their house and drives to the store
2. While the user is out, raw location data (stored as location points) is being stored in the SQLite database local to the phone
3. The local device is looking at the raw location data, and can tell from the behavior of the raw location points that the user has departed (and therefore the trip has started)
   a. trip start is stored in the SQLite database and thus begins a new trip with start_location and points throughout the trip
4. The user arrives at the store
5. The local device is looking at the raw location data, and can tell from the behavior of the raw location points that the user has arrived (and therefore the trip has ended)
   a. trip end is stored in the SQLite database on the same object as the trip start, and the trip has been completed.
   b. This trip object is sent to the e-mission-server
      i. Analysis and server-related things happen (not sure exactly what happens on the server, but it is not so important)
      ii. The user's leaderboard is updated with a completed trip which has not been labeled yet, lowering their leaderboard score. If the user views the leaderboard at this point, their device would ask the server what their score is, and because this trip is not labeled yet, the server would respond with their score not being 100%
6. The e-mission-phone deletes the trip entry from the SQLite database
7. The e-mission-server app sends a new trip object (of their recent "trip to the store") to the device after analysis has happened
   a. This way, the Diary reads from this new container that the server has sent the trip object to, to generate the trip "Label" tab of the app

Note: The label hasn't been input yet

8. The user inputs their trip label while at the store, but they have lost their internet connection
   a. The trip label object gets saved to the local SQLite database, waiting for internet before uploading the trip label object to the server
   b. If the user views the leaderboard at this point, their device would ask the server what their score is, but because they can't see the server, the server would respond with their score as not being 100%
9. The user re-establishes their connection to the internet
10. The app detects that internet has been restored, and uploads the contents of the SQLite database to the e-mission-server: the trip label object, together with the trip_ID
11. The e-mission-server completes the trip object with the newly acquired trip-label data
    a. The user's leaderboard is updated with a completed and labeled trip, raising their leaderboard score. If the user views the leaderboard at this point, their device would ask the server what their score is, and because this trip is now labeled, the server would respond with their score being 100%

The problem is that, if they don't have an internet connection, the leaderboard does not update.

shankari commented 2 years ago

@sebastianbarry There are some areas where this doesn't work the way that you expect. I think that your expectation is based on how other apps work, but this app doesn't work that way, at least not yet.

7 (a) is a bit complicated. The server doesn't send the trip to the phone (how could it? think about client/server architecture and connections). Instead,
- for the diary, the phone pulls information from the server periodically using usercache/get calls and caches it locally
- for the label view, the phone loads the trips when it is launched
- as part of unifying the views, we will probably just load the trips when the app is launched, although we may want to have a cache as a stretch goal
8: Let's say they input their label at the store, but they didn't lose their internet connection. What would happen?
10: more changes here:
- A. The app doesn't detect that the internet has been restored. I know that there are other apps that do, but it is functionality that you would need to build in, and we have not yet done so. Note that in order to "detect that the internet has been restored", either:
  - a. the underlying phone OS needs to be able to notify the app that internet status has changed, OR
  - b. the app needs to constantly check whether the internet is up

  Option (b) above will increase the power consumption of the app if we check too frequently, since the app needs to be woken up continuously. Option (a) would be ideal, but I am not sure iOS supports such notifications (would need to check).
- B. The app does not upload the contents of the database to the server unless a trip has just ended. I want to highlight this again. Even if the internet is now available, the app WILL NOT upload the data. It will only do so after the user takes their next trip. Concretely:
  - user comes home from the grocery store
  - user's phone connects to home WiFi
  - nothing happens
  - user goes to work the next day
  - user's phone connects to work WiFi
  - user's phone uploads trip data + labels that were entered at the grocery store
