Common trip system building

shankari commented 3 years ago

This tracks the tasks required to actually close the loop on user interaction with labels.

Once we have determined common v/s novel trips for a user, we need to represent those in a data structure (both)
- This can be a tag on a trip, or a separate structure of its own, similar to the earlier tour model graph structure
We need to determine how we display these to the user (UX design + implementation)
- we should distinguish between inferred labels and user labels because the user may want to correct inferred labels
we may want to highlight novel trips to make it easier for users to find and label them
- should this be notifications, or just a new screen in the app? maybe repurpose the "labels" screen?
we need to be able to update the data structure incrementally (server)
- we need to be able to run the modeling script maybe once a day or once a week and use only the new trips to update the data structure
- there is already code to read the new trips that need to be considered (the pipeline code) but the data structure needs to have enough information to support an incremental update
- incremental update needs to drop old common trips not just keep adding them because otherwise it will get too large
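
A minimal sketch of what such an incremental update could look like, assuming a structure that keeps a count and a last-seen timestamp per common trip. All names here (`updateCommonTrips`, `lastProcessedTs`, `key`, `maxAgeDays`) are hypothetical and not from the e-mission codebase; the server is Python, and JavaScript is used only to match the other snippets in this thread.

```javascript
// Hypothetical sketch: none of these names come from the e-mission codebase.
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Fold newly processed trips into an existing common-trip structure and
 * drop entries that have not been seen within `maxAgeDays`.
 * model: { lastProcessedTs, clusters: { key: { count, lastSeenTs } } }
 * newTrips: [{ key, endTs }], where `key` is an assumed start/end place id.
 */
function updateCommonTrips(model, newTrips, nowTs, maxAgeDays) {
  // Copy the clusters so the stored model is not mutated in place.
  const clusters = {};
  for (const [key, c] of Object.entries(model.clusters)) clusters[key] = { ...c };
  let lastProcessedTs = model.lastProcessedTs;

  for (const trip of newTrips) {
    if (trip.endTs <= model.lastProcessedTs) continue; // already folded in
    const c = clusters[trip.key] || { count: 0, lastSeenTs: 0 };
    c.count += 1;
    c.lastSeenTs = Math.max(c.lastSeenTs, trip.endTs);
    clusters[trip.key] = c;
    lastProcessedTs = Math.max(lastProcessedTs, trip.endTs);
  }

  // Prune old common trips so the structure does not grow without bound.
  const cutoff = nowTs - maxAgeDays * DAY_MS;
  for (const key of Object.keys(clusters)) {
    if (clusters[key].lastSeenTs < cutoff) delete clusters[key];
  }
  return { lastProcessedTs, clusters };
}
```

Storing `lastProcessedTs` alongside the clusters is what lets a daily or weekly run consider only the new trips.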

shankari commented 3 years ago

To give some additional context: e-mission is currently written in Angular 1, usually called AngularJS. It is a reactive framework: changes you make to $scope variables in a controller are automatically reflected in the HTML.

So concretely, if I want to display a username, I can have <h3>{{username}}</h3> in the HTML, set $scope.username = "Jenna"; in the javascript controller and it will show up. It also has some additional tags to make the UI coding easier, so for example, (syntax may be a bit off, high level workflow only)

<ion-list ng-repeat="trip in tripList">
    <ion-item>{{trip.start_fmt_time}} -> {{trip.end_fmt_time}}</ion-item>
</ion-list>

then if the javascript has $scope.tripList = [{start_fmt_time: "...", end_fmt_time: "..."}, {...}, {...}, ...] then the list will show all the trips.

shankari commented 3 years ago

If you have built the native version of the app (the second set of instructions on the README) you should be able to open this app in your IDE.

https://github.com/e-mission/e-mission-docs/blob/4fad0c4dd82893529bdc808fc5a8329f98ea7367/docs/dev/front/how_to_test_changes%20_to_a_plugin.md#2-then-open-the-project-in-your-ide

and on your phone if you like

https://codewithchris.com/deploy-your-app-on-an-iphone/

shankari commented 3 years ago

If you have built the UI-only change version of the app (first set of instructions in the README), you need to install the e-mission-devapp in an emulator and then connect to the live-reloading dev server. Please submit a PR to clarify this in the README.

shankari commented 3 years ago

The previous attempts at creating a graph of common trips are generally in https://github.com/e-mission/e-mission-server/tree/master/emission/analysis/modelling/tour_model

The original goal was to create a graph of people's regular patterns, similar to tour_model

We called it a tour model because that is apparently what such models are called in the travel behavior literature.

shankari commented 3 years ago

The data representation (I think) https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/modelling/tour_model/tour_model.py which is a classic graph data structure. Not sure if that is the correct data structure we need to use now.

This has some wrapper objects: https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/modelling/tour_model/tour_model_matrix.py

And this is how the matrix was created: https://github.com/e-mission/e-mission-server/blob/master/emission/analysis/modelling/tour_model/create_tour_model_matrix.py

Note that this is not incremental, and we will almost certainly need incremental creation in order to avoid overwhelming the server over a two year data collection period.

shankari commented 3 years ago

On the phone, this data was retrieved and displayed in the UI shown above.

We just read it from the local cache; we saved it to the cache in the OUTPUT_GEN step of the pipeline: https://github.com/e-mission/e-mission-server/blob/master/emission/net/usercache/builtin_usercache_handler.py#L204

shankari commented 3 years ago

Ideas:

group people by how regular their travel patterns are (what % of trips are novel)
- one with regular patterns (e.g. office worker)
- one with irregular patterns (e.g. multiple jobs - restaurant worker - or traveling job - plumber, delivery person)
walking the dog
- trip routes (and maybe distance and duration) are different but they are not really novel
- cannot distinguish algorithmically
- supports displaying list of trips
- label multiple trips with single click (select 10 trips from the list and say "walking the dog" for all of them)
going to friends' house
- may be different routes
- different routes = clusters because they can represent different travel modes (car route will likely be different from bus route)
other examples
- grocery shopping but to different locations

Critical from an energy standpoint: mode, replaced_mode, or distance. Prompt somewhere between every day and once a week.

Hypothetical: use same route while walking dog
- let's say we have 30 trips in a week
- we look at start, end, distance, duration and potentially trajectory
- all the 10 dog walk trips get into the same cluster
- we don't need labels for all the trips, we can cluster based on sensor information alone
- if at least one trip in the cluster is labeled, the proposal is to label all the other trips in the cluster with the same labels
Minimum user interaction would just set the labels for the trips on the server as part of the algorithm
More user interaction would involve confirmation from the user (taking a sample of trips that were labeled by algorithm, display to user for one-item feedback). This can be another way of quickly getting user confirmation for groups of trips without having to label each one, but without relying wholly on inference.
if the algorithm was wrong, and user re-labels, next round will re-cluster. no need to ask users for additional input using a micro-survey etc.
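
The label propagation in the hypothetical above could be sketched roughly as follows. The trip shape (`id`, `userLabels`) and the function name are assumptions made for illustration. Inferred labels are returned separately from user labels so that the user can still correct them.

```javascript
// Hypothetical sketch; trip and label shapes are assumptions.
/**
 * For one cluster, copy the labels of user-labeled trips to unlabeled trips.
 * Returns a map from trip id to inferred labels. Returns an empty map if the
 * cluster has no labeled trip or if the labeled trips disagree with each other.
 */
function propagateLabels(clusterTrips) {
  const labeled = clusterTrips.filter((t) => t.userLabels);
  if (labeled.length === 0) return {};

  const signature = (l) => JSON.stringify([l.mode, l.purpose]);
  const distinct = new Set(labeled.map((t) => signature(t.userLabels)));
  if (distinct.size !== 1) return {}; // mixed cluster, needs separate handling

  const inferred = {};
  for (const t of clusterTrips) {
    if (!t.userLabels) inferred[t.id] = { ...labeled[0].userLabels };
  }
  return inferred;
}
```

If the user later re-labels one of these trips, the next clustering round sees the corrected label as a normal user label, so no extra survey is needed.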

As few opportunities for open response as possible; reduce the other category as much as possible. walking the dog: walk, recreation/exercise, no travel

drove_alone, shared_ride problem: cannot distinguish between them. Only factor is through user feedback
other examples: eBike. pilot eBike
only ones for mode, purpose is fuzzier

suggestion: both drove_alone and shared_ride in cluster. Look at the proportion of labels and assign to new trips accordingly. So over the course of a month or so, the energy impact will be accurate. Need to balance user input and accuracy.
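
A sketch of the proportional assignment suggested above, assuming we have the list of user-provided mode labels for a mixed cluster. The function names are hypothetical. The random source is injectable so that a draw can be reproduced.

```javascript
// Hypothetical sketch of assigning labels in proportion to observed labels.
/** Fraction of user-labeled trips in the cluster that carry each label. */
function labelProportions(labels) {
  const counts = {};
  for (const l of labels) counts[l] = (counts[l] || 0) + 1;
  const proportions = {};
  for (const [l, n] of Object.entries(counts)) proportions[l] = n / labels.length;
  return proportions;
}

/**
 * Draw one label from the distribution. `rand` must return a number in
 * [0, 1); it defaults to Math.random. Returns undefined for an empty
 * distribution.
 */
function sampleLabel(proportions, rand = Math.random) {
  const entries = Object.entries(proportions);
  if (entries.length === 0) return undefined;
  const r = rand();
  let cumulative = 0;
  for (const [label, p] of entries) {
    cumulative += p;
    if (r < cumulative) return label;
  }
  return entries[entries.length - 1][0]; // guard against rounding error
}
```

Any individual trip may get the wrong label this way; only the aggregate over many trips is expected to match the user's actual split.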

But would we then show these to the user for confirmation as well? Show intermittently: "if people used to carpool but got tired of their travel companions and stopped, need to know" show this maybe once a month. Tricky because assigning from a distribution means that individual trips won't be accurate, but also asking "how many times do you think you carpooled in the past month" is subject to recall bias.

Ask for a week's worth of data every few months. This might be good not just for these mixed clusters but for all trips in general. Go back to primary data collection mode every few weeks.

When we are in "secondary data collection mode" will we still ask for confirmation of automatically assigned labels? Yes, to some extent until we gain confidence in the algorithm assessment.

shankari commented 3 years ago

for analysis alone, we want to assign labels based on the clusters that @corinne-hcr has built so far and evaluate accuracy. This does not appear to be a very heavy lift. As part of system integration, however, we need to use those clusters as a model. In other words, we need to:

store the clusters (aka save the model)
update the model periodically
apply the model to label trips

Store the clusters

We need to come up with a data structure to store the clusters. This might be as simple as a list of lists e.g.

{cluster1: [tripid1, tripid2, tripid3,...],
 cluster2: [tripid4, tripid5, tripid10,...],
}

Update the clusters

When should we update this model? We will create it after an initial intensive "primary data collection period" but then we will go to "secondary data collection" for a few months in which we ask for confirmation for trips but not full trip labeling. Do we assume that people will confirm and correct the majority of their trips during the secondary data collection period?

If so, we will have to rebuild the model periodically (like maybe once a week)
If not, we can wait until the next round of primary data collection; we said "this might be a week every few months"

This might be a question to ask @andyduvall to weigh in on.

If we rebuild the model every week, we may want to consider incremental updates in which we only add new trips to existing clusters, but that might be hard to do. We may want to wait on that and treat it as a future performance enhancement.

Apply the model

As we get new unlabeled trips, we need to access the stored cluster model, and assign labels according to the existing clusters. This will likely involve a more complicated data structure because just storing the trip ids may not be enough to figure out which cluster a new trip should fall into. We may need to store some cluster level attributes like distance/duration cutoffs so we can efficiently determine how to match the trip.
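
One possible shape for such a stored model, sketched under assumptions: every field name, radius, and cutoff below is hypothetical, and real matching would probably need something more careful than fixed ranges. The point is only that a new trip can be matched against stored cluster attributes without re-running the clustering.

```javascript
// Hypothetical stored-model shape; field names and cutoffs are assumptions.
const exampleModel = [
  {
    id: "cluster1",
    tripIds: ["tripid1", "tripid2", "tripid3"],
    start: { lat: 37.39, lon: -122.08 },
    end: { lat: 37.42, lon: -122.10 },
    radiusM: 300,               // how close start and end points must be
    distanceM: [3000, 5000],    // [min, max] trip distance
    durationS: [600, 1500],     // [min, max] trip duration
    labels: { mode: "bike", purpose: "work" },
  },
];

/** Great-circle distance in meters (haversine formula). */
function haversineM(a, b) {
  const R = 6371000;
  const rad = (d) => (d * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

const within = (x, [lo, hi]) => x >= lo && x <= hi;

/** Return the first stored cluster the trip falls into, or null. */
function matchCluster(trip, model) {
  return model.find((c) =>
    haversineM(trip.start, c.start) <= c.radiusM &&
    haversineM(trip.end, c.end) <= c.radiusM &&
    within(trip.distanceM, c.distanceM) &&
    within(trip.durationS, c.durationS)) || null;
}
```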

This does need to be incremental as in we cannot recluster every time we label a new trip. We must use stored clusters. This is because we get ~ 5-10 trips a day per person and the cluster algorithm can be space and time intensive. I turned off the tour model earlier because it was too slow (need to get some numbers). Note also that @corinne-hcr cannot run two versions of the pipeline side by side while restructuring on her laptop. And rebuilding the model every trip will result in rebuilding multiple times a day for each user.

Note also that there is a latency issue with taking too long - if the user looks at the diary and we haven't run the inference yet, it will look like there were no labels and the algorithm is "not working".

shankari commented 3 years ago

System design of labeling trips:

we should have a new pipeline stage (e.g. INFER_LABELS) that works with the current set of trips and fills in the labels
to do this, it will presumably call an infer_labels module + method which will read in the stored model, match the incoming trip to it, and return the labels
- the current pipeline states work like this:
  - you call something like infer_labels.infer_labels
  - that step automatically determines the time range for trips that have not yet been processed (e.g. https://github.com/e-mission/e-mission-server/blob/gis-based-mode-detection/emission/analysis/classification/inference/mode/rule_engine.py#L60)
  - it reads those trips, and calls a sub-method (infer_labels.infer_label(trip), e.g. https://github.com/e-mission/e-mission-server/blob/gis-based-mode-detection/emission/analysis/classification/inference/mode/rule_engine.py#L109)
  - the sub-method actually creates potentially multiple data structure(s) with the inference. this allows us to have multiple inferences per trip (potentially from different clusters or from different algorithms) and use an ensemble method over them
  - the sub-method picks one from the ensemble and fills in the confirmed_trip and saves it (e.g. https://github.com/e-mission/e-mission-server/blob/gis-based-mode-detection/emission/analysis/classification/inference/mode/rule_engine.py#L78). Note that you will need to add additional fields to the confirmed_trip object (in emission/core/wrapper) to support this.
  - the UI already reads confirmed trips to fill in the "labels" screen, for example, so it should be able to take it from there.
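
The control flow described above, sketched at a high level. The real stage would be server-side Python in the pipeline; this JavaScript version only illustrates the flow, and every name in it (`runInferLabelsStage`, `inferLabelsForTrip`, `lastProcessedTs`, the algorithm objects) is hypothetical.

```javascript
// Hypothetical names throughout; this only illustrates the control flow.
/**
 * Outer method: pick the trips that have not been processed yet, run the
 * per-trip inference on each, and advance the stage's timestamp.
 */
function runInferLabelsStage(state, trips, algorithms) {
  const pending = trips
    .filter((t) => t.endTs > state.lastProcessedTs)
    .sort((a, b) => a.endTs - b.endTs);
  const results = pending.map((trip) => inferLabelsForTrip(trip, algorithms));
  const lastProcessedTs = pending.length > 0
    ? pending[pending.length - 1].endTs
    : state.lastProcessedTs;
  return { state: { lastProcessedTs }, results };
}

/** Sub-method: collect one prediction per algorithm, then pick one. */
function inferLabelsForTrip(trip, algorithms) {
  // Keep every prediction so that an ensemble can combine them later.
  const predictions = algorithms.map((algo) => ({
    algorithmId: algo.id,
    prediction: algo.predict(trip),
  }));
  // Placeholder ensemble: take the first algorithm that returned anything.
  const chosen = predictions.find((p) => p.prediction.length > 0);
  return {
    tripId: trip.id,
    predictions,
    inferredLabels: chosen ? chosen.prediction : [],
  };
}
```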

shankari commented 3 years ago

Note that we may want to use different clusterings from different folds, generate labels for each clustering and determine the confidence of non-mixed clusters (not mixed shared_ride and carpool) depending on whether all the models return the same labels (e.g. even for the "work" part of the example we have been using all along).
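
One simple way to turn agreement across folds into a confidence value, sketched under the assumption that each fold's model returns a single label for the trip. The function name and the choice of "fraction of models that agree" as the confidence measure are illustrative only.

```javascript
// Hypothetical sketch: confidence as the fraction of models that agree.
/**
 * predictions: one predicted label per model (e.g. one per fold).
 * Returns the most common label and the fraction of models that chose it.
 */
function agreementConfidence(predictions) {
  if (predictions.length === 0) return { label: null, confidence: 0 };
  const counts = new Map();
  for (const p of predictions) counts.set(p, (counts.get(p) || 0) + 1);
  let best = null;
  let bestCount = 0;
  for (const [label, n] of counts) {
    if (n > bestCount) {
      best = label;
      bestCount = n;
    }
  }
  return { label: best, confidence: bestCount / predictions.length };
}
```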

shankari commented 3 years ago

Current "label screen" javascript is here https://github.com/e-mission/e-mission-phone/blob/master/www/js/diary/infinite_scroll_list.js

In particular, note that Timeline.readAllConfirmedTrips(currEnd, ONE_WEEK).then((ctList) should return all confirmed trips for the last week, and if you print them out, or use a debugger to view ctList, you should be able to see your new fields.

Here's where we embed that data structure in the UI: https://github.com/e-mission/e-mission-phone/blob/master/www/templates/diary/infinite_scroll_list.html

notably <div ng-repeat="input in userInputDetails" class={{input.width}} style="text-align: center;" ng-attr-id="{{ 'userinput' + input.name }}"> iterates through all the user inputs and displays them in {{trip.userInput[input.name].text}}

shankari commented 3 years ago

@GabrielKS, can you put in your high level design into this issue? @PatGendre, who is deploying the platform in La Rochelle is interested. @asiripanich may be too. As deployers, as opposed to end-users, they are more likely to give feedback on the configuration options 😄

GabrielKS commented 3 years ago

@PatGendre @asiripanich @shankari I've attached the second draft of my trip label inference system UI proposal in PDF and Word form; I've pasted the executive summary below.

UI Draft 2.pdf UI Draft 2.docx

Behind the scenes, a server-side trip inference algorithm will produce a data structure that comprises, for each trip, a list of label sets with probabilities. This allows a client-side algorithm to refine the inference as the user manually confirms or corrects labels; it will also aid in analysis when the user has not confirmed or corrected labels. Below is an example of what this data structure might look like.

[Image: Inference Data Structure]
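
For readers without access to the attachment, here is a rough sketch of a structure matching the description (a list of label sets with probabilities per trip) and of one way a client could refine it once the user fills in a label. The field names (`labelsList`, `labels`, `p`, `purpose_confirm`) and the filter-and-renormalize approach are assumptions, not necessarily what the proposal specifies.

```javascript
// Hypothetical example of the per-trip inference structure.
const inferredTrip = {
  labelsList: [
    { labels: { mode_confirm: "drove_alone", purpose_confirm: "work" }, p: 0.5 },
    { labels: { mode_confirm: "shared_ride", purpose_confirm: "work" }, p: 0.3 },
    { labels: { mode_confirm: "walk", purpose_confirm: "shopping" }, p: 0.2 },
  ],
};

/**
 * Keep only the label sets consistent with what the user has entered so far
 * and rescale their probabilities so they sum to 1 again.
 * userInput: e.g. { purpose_confirm: "work" }
 */
function refineInference(labelsList, userInput) {
  const consistent = labelsList.filter((entry) =>
    Object.entries(userInput).every(([type, value]) => entry.labels[type] === value));
  const total = consistent.reduce((sum, e) => sum + e.p, 0);
  if (total === 0) return []; // the user chose something no label set predicted
  return consistent.map((e) => ({ labels: e.labels, p: e.p / total }));
}
```

With the example above, once the user confirms the purpose as "work", only the first two label sets remain and their probabilities are rescaled relative to each other.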

To integrate the inference system into the phone app user interface, we will make three main changes. First, we display unfilled labels in red, inferred labels in yellow, and manually filled-out or verified labels in green. Second, we add a To Label view on the Label screen that displays only trips that users are expected to manually provide input on, according to the criteria below. Third, we change when we notify users as detailed below. We will also make some smaller UI changes to improve usability, including adding a confirm button, a feature to label many of the same type of trip at once, and a map of trips. Below is a screenshot of progress so far, showing the red/yellow/green color scheme and the confirm button.

[Image: UI Progress So Far]

Expectations and notification behavior will be highly configurable to support many use cases. The basic idea of this configuration is that for each label category, comprising red labels and yellow labels with varying degrees of certainty, study administrators will be able to select an expectation setting and a notification setting. The user could be expected to label every trip in a given category, none of them, or some random sample in between. The user could be notified after every trip in a given category, at the end of the day, or less frequently.

Study administrators will also be able to configure a “primary mode,” in which user input expectations are high, and a “secondary mode,” in which less is demanded of the user; the app can automatically cycle between primary and secondary modes according to a configurable schedule.
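
To make the cycling idea concrete, here is a minimal sketch of a schedule and a function that decides which mode applies on a given day. The configuration keys, the start date, and the 7/83-day split are all made up for illustration; the full proposal defines the actual options.

```javascript
// Hypothetical configuration; keys and values are illustrative only.
const collectionConfig = {
  cycleStart: Date.UTC(2021, 7, 16), // assumed start date (months are 0-based)
  primaryDays: 7,                    // intensive labeling period
  secondaryDays: 83,                 // reduced-demand period
};

/**
 * Return "primary" or "secondary" for the given timestamp (ms since epoch),
 * or null if the schedule has not started yet.
 */
function collectionMode(config, nowTs) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const daysSinceStart = Math.floor((nowTs - config.cycleStart) / MS_PER_DAY);
  if (daysSinceStart < 0) return null;
  const cycleDays = config.primaryDays + config.secondaryDays;
  const dayInCycle = daysSinceStart % cycleDays;
  return dayInCycle < config.primaryDays ? "primary" : "secondary";
}
```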

Please see the full proposal for a complete configuration example and many more details. Please post or send me any feedback you may have!


shankari commented 3 years ago

@jf87 not sure if you are planning to use labels

PatGendre commented 3 years ago

@GabrielKS thanks a lot for the explanations :-)

Here are a few questions and thoughts:

I suppose the trip label inference algorithm will run as a stage in the pipeline?
The mode is actually at the section level, not at the trip level. In the end do you intend to merge the A-mission branch with your development?
We sometimes encounter "unknown" as inferred mode (e.g. when cycling too fast), do you see this unknown label value as a special case? (maybe not, I'm not sure)
If mode values include Car Driver vs Car Passenger, it seems difficult to correctly predict it: do you think the inference algorithm is robust enough to behave well for any kind of label values, i.e. return a low probability in this case? (maybe this is part of your work, to look at the inferred data and answer this question, by the way?)

GabrielKS commented 3 years ago

@PatGendre Thanks for the thoughts; I'll do my best to respond given my limited knowledge of the e-mission system outside of the areas I'm directly working with:

Yes, the label inference algorithm is a stage in the pipeline. I've prototyped this with a placeholder algorithm here. @corinne-hcr will be working on the actual algorithm.
My work is intended to reduce the user input burden, so wherever we have the user inputting labels, that's where the inferences will be — and currently that's at the trip level, even for mode. Currently the plan is to use @corinne-hcr's clustering algorithms, plus maybe the section-level mode detection, to produce a trip-level mode inference. If in the future we allow users to label trip sections, then it will make sense to do section-level mode inference. I'm in support of the change to section-level user-confirmed mode labels, but I'm not fully aware of what it would entail.
Again, I won't be using the section-level mode detection data directly. My data structure does allow for an unknown mode — you would not literally assign mode_confirm=unknown, you would just have entries for all the modes with equal probabilities, so I suppose it is a special case in a sense — and that's what I'd expect @corinne-hcr's analysis algorithm will do. I'll note that it's rare for us to truly have no idea what the mode is, if we look at all the data — if the user is moving quickly, we can at least assign walking a lower probability.
I agree that there will be some labels that will be hard to infer even if/when we combine the section-level mode detection with the clustering algorithms. In the UI proposal I give the example of "Pattern 1," an office worker who sometimes drives individually and sometimes carpools to work. In this instance, I think the best we can do is to assign probabilities based on how frequently each mode occurs in the user's manually labeled data and either ask the user for confirmation or just allocate trips according to those probabilities for analysis afterwards (if the user drives individually 30% of the time and carpools 70% of the time, we can pretend behind the scenes that 30% of the yellow mode labels are drove_alone and 70% are shared_ride). This is one area where @andyduvall's primary/secondary data collection mode idea could be useful: we ask the user to label all their trips during the primary mode to get a sense of the frequencies and then allow those labels to remain yellow during the secondary mode.

shankari commented 3 years ago

@PatGendre to summarize: the current feature works on trip-level labels. We currently have trip-level user labels, but even after we switch to section-level user input for the mode, we will still want to have trip-level purpose labels. So this is not dependent on merging the A-Mission code.

If/when we do have section-level user inputs, we can expand this functionality to the section level as well.

PatGendre commented 3 years ago

@shankari @GabrielKS thanks for your detailed answers, it is pretty clear now :-) and definitely a very promising feature!

shankari commented 3 years ago

Tracking initial staging deployment at: https://github.com/corinne-hcr/e-mission-server/pull/2

shankari commented 3 years ago

@PatGendre @jf87 @asiripanich Initial deployment complete, in beta testing.

Picking tuning parameters is being tracked at: https://github.com/e-mission/e-mission-docs/issues/656

Most recent set of UI changes, which can give you a sense of how the feature works, is being tracked at: https://github.com/e-mission/e-mission-phone/pull/772

shankari commented 3 years ago

@GabrielKS I am writing down all the potential issues for next week here so (a) I don't forget them and (b) we can prioritize them.

Please let me know if there's anything else in your notes that I am missing.

UI + expectations:

Trips not appearing in order in the UI: https://github.com/e-mission/e-mission-docs/issues/658
Partial labels and weird checkmarks: https://github.com/e-mission/e-mission-docs/issues/660
Too many expected trips: https://github.com/e-mission/e-mission-docs/issues/654
Hiding all "old" trips from "To label"

Trip matching and modeling:

Round trips (e.g. going for a run v/s going for a bike ride get lumped together). This will potentially require re-enabling the second round of clustering, or at least a simplified version thereof
Wrong labels but in "All trips" (from same user as UI issues)
- 23rd July at 3:59, 29th July at 5:09 (should be recreation/exercise, walk, replacing regular bike)
- 6th August at 6:05pm (should be car with others, home, replacing regular bike)
Trip start/end issues:
- Galaxy Edge 1:
  - ended trip home early on "deep dive day"
  - skipped trips on 2021-07-24 (have phone logs)
- Galaxy Edge 2: skipped one leg of trip on 2021-08-06 (have phone logs)
- Same user as the UI issues: started trip late on 19 Jul at 3pm

High-level end to end feature:

Detect mislabelings and mark them in the UI

GabrielKS commented 3 years ago

Additional things I have:

More urgent:

Someone (probably me) should write a clear explanation of the new UI, how it works, and how it relates to the old UI so we don't have to keep telling people e.g. that if you're not seeing yellow trips on To Label it's not necessarily because they do not exist
- @shankari seems to be worried that this is a problem with the new UI design. I think it's rather a problem with how our existing users have been taught to understand the Label screen and how we haven't explained the change very well. For users who never used the old UI (which is hopefully all users after this summer), it won't feel quite as weird that some trips don't appear in the default view.
A staging user claimed that the All Unlabeled tab was not scrolled to the bottom when they switched to it. I have been unable to reproduce this issue, but it might bear some more investigation.

Less urgent:

Since I added the yellow and red labels and did not remove the cyan and magenta-ish start and end bits, the label screen is now quite ugly. I would love to fix this before I present my creation to the world. I have some ideas for how to do so, but I wonder if there is someone among us who is a better graphic designer than me.
Request from a staging user: the map you get when you click the three dots should be zoomable
Request from a staging user: rename some things on the profile page to reduce use of jargon
Request from a staging user: use the sensed mode to help infer mode — seems it is in fact helpful if we can infer one of the labels even if we can't infer the others, because they like using the confirm button
In my own testing on the iOS emulator, from time to time one of the Label screen's tabs is styled incorrectly (black text instead of turquoise)

Fun quotes:

"Do I need to tap that again or just wait?" (unfortunately, I did not write down what that was referring to…)
"I think this is gonna make it a lot quicker and easier for people"
"I don't have to label as many things now, so that's nice"

shankari commented 3 years ago

A staging user claimed that the All Unlabeled tab was not scrolled to the bottom when they switched to it. I have been unable to reproduce this issue, but it might bear some more investigation.

I wonder if this is the same as https://github.com/e-mission/e-mission-docs/issues/658? That user says that it happens after they finish a trip; the trips for that day usually show up in the middle.

shankari commented 3 years ago

Another usability question: What happens if you click the green checkmark and not all of them are auto-labeled? Some of them are red. I don't want to click it and mess something up.

GabrielKS commented 3 years ago

Another usability question: What happens if you click the green checkmark and not all of them are auto-labeled? Some of them are red. I don't want to click it and mess something up.

The confirm button iterates through all label types and, for each, populates the user input with the client-computed inference iff

Said inference exists, which it doesn't if it wouldn't pass the confidence threshold
There is no existing user input for that label
The inferred value is not "other"

for (const inputType of ConfirmHelper.INPUTS) {
  const inferred = trip.finalInference[inputType];
  // TODO: figure out what to do with "other". For now, do not verify.
  if (inferred && !trip.userInput[inputType] && inferred != "other") $scope.store(inputType, inferred, false);
}

So the explanation I've been giving to users, "the confirm button turns yellow labels green" is exactly correct; red labels are simply ignored.

However, I've just realized that the confidence threshold the client has been using is not the one in the config file — the config file is on the server side and I never wrote the code to send it over — it's been using a placeholder value of 0.5. This explains why we've been seeing so many trips with mixed yellow and red labels! I will fix that ASAP. (This affects both what is displayed and what is confirmable as the logic only happens once, so at least it's consistent.)

// Display a label as red if its most probable inferred value has a probability of less than or equal to confidenceThreshold
// TODO: make this configurable
const confidenceThreshold = 0.5;
// [...]
// Apply threshold
if (max.p <= confidenceThreshold) max.labelValue = undefined;
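
A sketch of how the threshold could be passed in rather than hard-coded, and of how it produces mixed yellow and red labels on one trip. The per-label summing step and the names `finalInference` and `labelColor` are assumptions about the surrounding logic, which the snippet above elides; only the `max.p <= confidenceThreshold` check is taken from it.

```javascript
// Hypothetical sketch; only the threshold check mirrors the snippet above.
/**
 * Sum probabilities per value for one label type, take the most probable
 * value, and blank it out if it does not clear the threshold.
 */
function finalInference(labelsList, inputType, confidenceThreshold) {
  const totals = {};
  for (const entry of labelsList) {
    const value = entry.labels[inputType];
    if (value === undefined) continue;
    totals[value] = (totals[value] || 0) + entry.p;
  }
  let max = { labelValue: undefined, p: 0 };
  for (const [labelValue, p] of Object.entries(totals)) {
    if (p > max.p) max = { labelValue, p };
  }
  // Apply threshold
  if (max.p <= confidenceThreshold) max.labelValue = undefined;
  return max;
}

/** Green: user-provided. Yellow: confident inference. Red: neither. */
function labelColor(userValue, inference) {
  if (userValue !== undefined) return "green";
  return inference.labelValue !== undefined ? "yellow" : "red";
}
```

For a trip where the purpose is almost certainly "work" but the mode is split between drove_alone and shared_ride, a threshold of 0.5 leaves the purpose yellow and the mode red.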

shankari commented 3 years ago

The confirm button iterates through all label types and, for each, populates the user input with the client-computed inference iff

I know, and I told the user verbally. But wanted to record their initial response for you to think about UX improvements.

GabrielKS commented 3 years ago

Ah, okay. Didn't get that that was the user talking.

shankari commented 3 years ago

@GabrielKS Another issue that came up with at least two users today: they expected to see the auto-labeling in the diary and when it didn't show up, they thought that the auto-labeling didn't work. One of the users went to the diary after they saw that there were no trips in "To Label", even at the end of the week 😦

This sounds deceptively easy, but is actually going to require a fair amount of rewriting, because the diary retrieves data using a completely different API call than the label screen. I had thought about merging the diary and label screens, but was hoping to not tackle that just yet.

Maybe we can do a minimal rewrite in which we retrieve the confirmed trips in addition to the user labels as currently retrieved. Then we can read the inferred labels from there.

shankari commented 3 years ago

@GabrielKS Jeanne also brought up displaying only recent trips to users. Unfortunately, instead of only showing trips from the last n days, she wants to start with a clean slate on 16th Aug and then display all trips. Given that requirement, I think that maybe the easiest option is to just hardcode that into the client instead of putting it into the server since it doesn't seem super general. We would not cherry-pick that change into master, obviously.
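
The hardcoded client-side option could be as small as a filter like the one below. The year (2021), the UTC midnight cutoff, and the `start_ts` field (seconds since the epoch) are assumptions for the sake of the sketch.

```javascript
// Hypothetical sketch: the year, time zone, and `start_ts` field are assumed.
const CLEAN_SLATE_TS = Date.UTC(2021, 7, 16) / 1000; // 16 Aug, months are 0-based

/** Keep only trips that started on or after the clean-slate date. */
function tripsSinceCleanSlate(trips, cutoffTs = CLEAN_SLATE_TS) {
  return trips.filter((t) => t.start_ts >= cutoffTs);
}
```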

shankari commented 3 years ago

Filed https://github.com/e-mission/e-mission-docs/issues/662 to track the UI issues that came up today.

shankari commented 3 years ago

Filed https://github.com/e-mission/e-mission-docs/issues/663 to track the "confidence too high" issue from https://github.com/e-mission/e-mission-eval-private-data/pull/28#issuecomment-894704661

shankari commented 3 years ago

Closing this for now. We can track any pending problems in separate issues.
