Fix segmentation so that we can do GIS-based mode inference

shankari commented 6 years ago

There are a ton of small fixes needed to fix segmentation so that we can do GIS-based mode inference. Let's see if we can keep track of them in one issue so that we remember them all.

shankari commented 6 years ago

After the fixes, the trip segmentation is correct. Yay!

start_fmt_time	end_fmt_time
2018-02-26T09:26:55.265398-08:00	2018-02-26T12:00:51.991475-08:00
2018-02-26T14:19:16.417127-08:00	2018-02-26T14:25:54.496145-08:00
2018-02-26T16:10:21.125442-08:00	2018-02-26T18:11:43.967409-08:00

Section segmentation issues

But the section segmentation is not. For example, for the first trip, the sections look like this

Note the walk to the train station sloshing over into the train trip, and the big gap during the train ride. Let's see if we can fix that somehow.

shankari commented 6 years ago

First, consider the slosh over from WALKING to IN_VEHICLE

2018-02-26T09:37:09.319461-08:00 MotionTypes.WALKING
2018-02-26T09:37:16.419204-08:00 MotionTypes.IN_VEHICLE

Here's where we switch from WALKING to IN_VEHICLE. We assume that changes propagate forward, e.g. 9:34 -> 9:37 is WALKING because it was WALKING at 9:34. But it actually looks like it is IN_VEHICLE. So we should have broken the section at 9:34 instead of 9:37.

The root cause is that during transitions, if there are large-ish gaps, we don't know exactly when the transition happened. Right now, we fix this by:

assigning to the section on either side of the transition OR
creating a big stop with a start and end point

A better fix would be to determine where the transition is by looking at the speeds and seeing if we can see a clear shift in speeds.
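One way to look for that shift, as a minimal sketch and not the pipeline's actual code: take the speeds of the points that fall inside the ambiguous window and pick the split that maximizes the difference between the mean speed before and after it. The function name and the sample speeds are made up for illustration.

```python
from statistics import mean

def find_speed_shift(times, speeds):
    """Return the timestamp at which the mean speed changes the most.

    times and speeds are parallel lists covering the window between the last
    sample of the old activity and the first sample of the new one.
    """
    if len(times) != len(speeds) or len(times) < 2:
        raise ValueError("need at least two (time, speed) samples")
    best_idx, best_shift = 1, -1.0
    for i in range(1, len(speeds)):
        # Compare the mean speed before the candidate split with the mean after it.
        shift = abs(mean(speeds[i:]) - mean(speeds[:i]))
        if shift > best_shift:
            best_idx, best_shift = i, shift
    return times[best_idx]

# Hypothetical speeds (m/s): walking pace, then a clear jump to vehicle speeds.
times = [0, 30, 60, 90, 120, 150]
speeds = [1.2, 1.4, 1.3, 9.5, 12.0, 14.0]
transition_ts = find_speed_shift(times, speeds)
```

With real data the window can contain very few points, so a check like this would probably need a fallback to the existing behavior when no clear shift is visible.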

2018-03-27 21:56:15,910:DEBUG:140735691387712:At 2018-02-26T09:34:57.987726-08:00, retained existing activity MotionTypes.WALKING because of no change
2018-03-27 21:56:15,911:DEBUG:140735691387712:At 2018-02-26T09:37:11.204804-08:00, found new activity MotionTypes.IN_VEHICLE compared to current MotionTypes.WALKING - creating new section with start_time 2018-02-26T09:37:11.204804-08:00

Let's see if we have a similar root cause for the big gap in the transit sections.

The big gap in the transit sections is

2018-02-26T10:09:10.849257-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:37:06.732640-08:00

The corresponding change to the motion activities is

2018-03-27 22:01:16,830:DEBUG:140735691387712:At 2018-02-26T10:08:35.647760-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change
2018-03-27 22:01:16,830:DEBUG:140735691387712:At 2018-02-26T10:10:57.174305-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change
...
2018-03-27 22:01:16,835:DEBUG:140735691387712:At 2018-02-26T10:21:34.631955-08:00, found new activity MotionTypes.WALKING compared to current MotionTypes.IN_VEHICLE - creating new section with start_time 2018-02-26T10:21:34.631955-08:00
2018-03-27 22:01:16,835:DEBUG:140735691387712:At 2018-02-26T10:22:32.513430-08:00, retained existing activity MotionTypes.WALKING because of no change
2018-03-27 22:01:16,836:DEBUG:140735691387712:At 2018-02-26T10:25:19.825444-08:00, found new activity MotionTypes.IN_VEHICLE compared to current MotionTypes.WALKING - creating new section with start_time 2018-02-26T10:25:19.825444-08:00

So then when converting this to sections, there are no points between 10:09 and 10:21, so we end the section at 10:09.
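The effect is easy to reproduce in isolation. The sketch below is an illustration, not the server code: it picks the first and last location timestamps that fall inside each motion activity range, using only the four location points quoted in the logs in this thread. `section_endpoints` and `t` are hypothetical helper names.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

def section_endpoints(location_times, start, end):
    """Return the first and last location times inside [start, end], or None.

    location_times must be sorted in ascending order.
    """
    lo = bisect_left(location_times, start)
    hi = bisect_right(location_times, end)
    if lo >= hi:
        return None  # no location points in this motion activity range
    return location_times[lo], location_times[hi - 1]

def t(fmt_time):
    return datetime.fromisoformat(fmt_time)

# Only the location points quoted in the logs; the real trip has many more.
locations = [
    t("2018-02-26T09:37:16.419204-08:00"),
    t("2018-02-26T10:09:10.849257-08:00"),
    t("2018-02-26T10:37:06.732640-08:00"),
    t("2018-02-26T11:25:08.787427-08:00"),
]

vehicle_1 = section_endpoints(
    locations, t("2018-02-26T09:37:11.204804-08:00"), t("2018-02-26T10:21:34.631955-08:00"))
walk = section_endpoints(
    locations, t("2018-02-26T10:21:34.631955-08:00"), t("2018-02-26T10:25:19.825444-08:00"))
vehicle_2 = section_endpoints(
    locations, t("2018-02-26T10:25:19.825444-08:00"), t("2018-02-26T11:29:03.018860-08:00"))
```

The first vehicle range ends at the 10:09 point, the walking range has no points at all, and the second vehicle range only starts at the 10:37 point.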

2018-03-27 22:01:16,923:DEBUG:140735691387712:Considering MotionTypes.IN_VEHICLE from 2018-02-26T09:37:11.204804-08:00 -> 2018-02-26T10:21:34.631955-08:00
2018-03-27 22:01:16,925:DEBUG:140735691387712:with iloc, section start point = 
Location({'_id': ObjectId('5abb2195f6858f0f828a26a7')
 'fmt_time': '2018-02-26T09:37:16.419204-08:00'
 'loc': {'type': 'Point' 'coordinates': [-122.09809683742127 37.40364156259304]}

 section end point = Location({'_id': ObjectId('5abb2195f6858f0f828a2755')
 'fmt_time': '2018-02-26T10:09:10.849257-08:00'
 'loc': {'type': 'Point' 'coordinates': [-122.29990953446286 37.54018354522394]}

And there are no points between

2018-03-27 22:01:16,926:DEBUG:140735691387712:Considering MotionTypes.WALKING from 2018-02-26T10:21:34.631955-08:00 -> 2018-02-26T10:25:19.825444-08:00
2018-03-27 22:01:16,928:INFO:140735691387712:Found no location points between ... 'fmt_time': '2018-02-26T10:21:34.631955-08:00', 'fmt_time': '2018-02-26T10:25:19.825444-08:00',

And again, because of the big gap in locations, the section only starts at 10:37.

2018-03-27 22:01:16,928:DEBUG:140735691387712:Considering MotionTypes.IN_VEHICLE from 2018-02-26T10:25:19.825444-08:00 -> 2018-02-26T11:29:03.018860-08:00
2018-03-27 22:01:16,931:DEBUG:140735691387712:with iloc, 
section start point = Location('fmt_time': '2018-02-26T10:37:06.732640-08:00',
'loc': {'type': 'Point', 'coordinates': [-122.40303033318312, 37.615935257650165]}),
section end point = Location({'fmt_time': '2018-02-26T11:25:08.787427-08:00',
'loc': {'type': 'Point', 'coordinates': [-122.26936768884903, 37.83899754049432]})

Given the locations shown here (10:09 at Hillsdale and 10:37 just beyond Millbrae), I bet that the walking section was at Millbrae station. It is just that we don't have any location points around there.

So it looks like our original goal of using the location points only for segmentation and the motion activity only for mode detection is not sufficiently robust to the errors that we see in the field.

Instead, we need to use hybrid approaches for both. In this case, we should be able to extrapolate and determine the points related to the WALKING motion activity. Hopefully, that will be around Millbrae, and we can then use GIS matching to classify the train ride as Caltrain. Because without that, I don't see how we can even have the correct sections that we need for the GIS matching.

There are also a bunch of locations that are filtered out. While they are terrible overall, they do clump around the Millbrae station at this time, so they can serve as a second-level check on the extrapolation.

screenshot-2018-3-29 fix trip segmentation with invalid ios points

shankari commented 6 years ago

In this case, we should be able to extrapolate and determine the points related to the WALKING motion activity.

While extrapolating, I ran into the issue that we expect the end points of a section to be locations in the location database. We certainly make that assumption in the section segmentation code, and we may make the same assumption later, in the clean and resample code.

For now, fixing this by (shocker!) inserting locations for the interpolated end points since there is basically no other choice.
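For reference, a minimal sketch of what an interpolated end point looks like, assuming plain linear interpolation in time between the two nearest known locations. This is an illustration only; `interpolate_location` is a hypothetical helper, and it assumes constant speed along a straight line, so the result will be off whenever the speed varied inside the gap.

```python
from datetime import datetime

def interpolate_location(p0, p1, when):
    """Linearly interpolate [lon, lat] at time `when` between two known points.

    Each point is a (datetime, [lon, lat]) tuple.
    """
    (t0, (lon0, lat0)), (t1, (lon1, lat1)) = p0, p1
    if not t0 <= when <= t1 or t0 == t1:
        raise ValueError("when must lie between two distinct timestamps")
    frac = (when - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return [lon0 + frac * (lon1 - lon0), lat0 + frac * (lat1 - lat0)]

# The two location points on either side of the gap, taken from the logs.
before = (datetime.fromisoformat("2018-02-26T10:09:10.849257-08:00"),
          [-122.29990953446286, 37.54018354522394])
after = (datetime.fromisoformat("2018-02-26T10:37:06.732640-08:00"),
         [-122.40303033318312, 37.615935257650165])

# Start of the WALKING motion activity that has no location points.
walk_start = datetime.fromisoformat("2018-02-26T10:21:34.631955-08:00")
lon, lat = interpolate_location(before, after, walk_start)
```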

shankari commented 6 years ago

First level of fixes moves the walk segment to San Mateo. The problem is that using the interpolated values directly leads to offsets: if we walked for part of the trip, for example, the interpolated value will be off by quite a bit.

At least for the first walking section around Millbrae, we do have unfiltered locations around Millbrae.

2018-02-26T09:26:55.265398-08:00 2018-02-26T09:37:09.319461-08:00 MotionTypes.WALKING
2018-02-26T09:37:16.419204-08:00 2018-02-26T10:21:25.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:21:55.265398-08:00 2018-02-26T10:24:55.265398-08:00 MotionTypes.WALKING
2018-02-26T10:25:25.265398-08:00 2018-02-26T11:28:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:29:25.265398-08:00 2018-02-26T11:31:55.265398-08:00 MotionTypes.WALKING
2018-02-26T11:32:25.265398-08:00 2018-02-26T11:32:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:33:25.265398-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING

lo.get_map_for_geojson_unsectioned(gfc.get_feature_list_from_df(ts.get_data_df("background/location",
    time_query=estt.TimeQuery("data.ts", arrow.get("2018-02-26T10:21:34.631955-08:00").timestamp, arrow.get("2018-02-26T10:25:19.825444-08:00").timestamp))))

screenshot-2018-3-29 fix section segmentation with slosh long gaps

I wonder if this is another slosh issue. Let me fix the sloshing and then see what this looks like, and whether we can combine with non-filtered location to handle non-uniform speeds.

shankari commented 6 years ago

Working on slosh issues now...

First fix is to simply start the segment at the beginning of the transition instead of at the end. So if the transition occurred during

2018-03-29 14:17:05,780:DEBUG:140735691387712:At 2018-02-26T09:34:57.987726-08:00, retained existing activity MotionTypes.WALKING because of no change
2018-03-29 14:17:05,780:DEBUG:140735691387712:At 2018-02-26T09:37:11.204804-08:00, found new activity MotionTypes.IN_VEHICLE compared to current MotionTypes.WALKING -

just create the new section at 09:34:57.987726 instead of 09:37:11.204804

This fixes most of the slosh. We now have only two issues left.

an extra walking section from

2018-02-26T11:26:55.265398-08:00 2018-02-26T11:28:55.265398-08:00 MotionTypes.WALKING

Making the extrapolation better for large gaps

Let's look at the spurious walk section first.

shankari commented 6 years ago

Extra walking section

The extra walking section turned out to be an example of flip-flopping. Fixed it in 7678818d3d535d82e25c851e66baf1a84fd06eeb. That also fixed a bunch of flip flopping on the way back.

The sections are now:

To Berkeley:

2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:25.265398-08:00 2018-02-26T10:18:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:19:25.265398-08:00 2018-02-26T10:22:25.265398-08:00 MotionTypes.WALKING
2018-02-26T10:22:55.265398-08:00 2018-02-26T11:31:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:32:25.265398-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING

From Berkeley:

2018-02-26T16:06:31.429716-08:00 2018-02-26T16:26:03.451954-08:00 MotionTypes.WALKING
2018-02-26T16:26:33.451954-08:00 2018-02-26T16:34:06.250344-08:00 MotionTypes.IN_VEHICLE
2018-02-26T16:34:13.848574-08:00 2018-02-26T16:36:33.451954-08:00 MotionTypes.WALKING
2018-02-26T16:37:03.451954-08:00 2018-02-26T17:20:03.451954-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:20:33.451954-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:31:03.451954-08:00 2018-02-26T17:56:03.451954-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:56:33.451954-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.AIR_OR_HSR

Remaining fixes are:

spurious one-flip walk sections (e.g. 2018-02-26T16:34:13.848574-08:00 2018-02-26T16:36:33.451954-08:00 MotionTypes.WALKING)
fixing resampled data, which will also fix section start/end when there are no points

shankari commented 6 years ago

Fixing section start/end when there are no filtered points available

We first tried resampling (in 4a0a41580e6e83b6cf5dd5083f8392cda81017a4) but because of the large gaps, the resampled points were not perfect, and we ended up with the following. Note that one of them was so off that the resulting section actually got classified as AIR_OR_HSR

screenshot 1	screenshot 2

shankari commented 6 years ago

We considered a couple of techniques for fixing section start/end:

resample based on section locations alone
look at unfiltered points

I first tried looking at the unfiltered points and it was a bit of a disaster. This is the main section that had no points and looking at the unfiltered locations caused it to be classified as AIR_OR_HSR

2018-03-31 18:40:18,624:DEBUG:140735691387712:matched_point None for motion 2018-02-26T10:19:03.968804-08:00, using resampled location 2018-02-26T10:21:21.416628-08:00
2018-03-31 18:40:18,626:DEBUG:140735691387712:matched_point None for motion 2018-02-26T10:22:32.513430-08:00, using resampled location 2018-02-26T10:22:30.913821-08:00

2018-02-26T10:21:21.416628-08:00 2018-02-26T10:22:30.913821-08:00 MotionTypes.AIR_OR_HSR

This is because 10:21 is actually a bogus point and so was 10:22 (accuracy = 904.693035 to 1000)

2018-03-31 18:40:16,944:DEBUG:140735691387712:in is_huge_invalid_ts_offset: returning True
2018-03-31 18:40:16,944:DEBUG:140735691387712:About to set valid column for index = 94
2018-03-31 18:40:16,978:DEBUG:140735691387712:After dropping 94, filtered points =     
        valid                          fmt_time
89   True  2018-02-26T10:08:46.884665-08:00
90   True  2018-02-26T10:08:52.430627-08:00
91   True  2018-02-26T10:09:04.430614-08:00
92   True  2018-02-26T10:09:10.432059-08:00
93   True  2018-02-26T10:09:10.849257-08:00
94  False  2018-02-26T10:21:22.413477-08:00
95   True  2018-02-26T10:37:06.732640-08:00
96   True  2018-02-26T10:37:15.602964-08:00
97   True  2018-02-26T10:37:21.604339-08:00
98   True  2018-02-26T10:43:05.173996-08:00

And this time range is correct, because the train that leaves Mountain View at 9:35 gets to Millbrae at 10:22

Northbound Train No.	135	237	139	143
Mountain View	9:34	10:10	10:33	11:33
Millbrae	10:22	10:57	11:20	12:20

And the location points that we do have around Millbrae are all around 2018-02-26T10:23:16.074416-08:00 to 2018-02-26T10:37:06.732640-08:00. I wonder if the train was late that day and the location points are correct but the motion activity was wrong...

So we have the section and it is correct, but we just don't know that it is in Millbrae. Let's see if targeted resampling works any better.

shankari commented 6 years ago

For targeted resampling, we generally don't need to have a close transition. Looking at the values, approx 300 secs (5 mins) is probably good enough to get the start points correctly, at least on iOS. That will still miss some of the points but let's see what we can do with resampling around that.

shankari commented 6 years ago

OK, so resampling does not work at this point. It is better to take a real location that is not "fresh" than it is to resample.

section 1	section 2
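A minimal sketch of "take a real location that is not fresh", assuming timestamps in epoch seconds and the roughly 300 second tolerance mentioned earlier. `nearest_real_location` is a hypothetical helper, not the function used in the pipeline, and the sample timestamps are made up.

```python
def nearest_real_location(location_ts, motion_ts, max_gap_secs=300):
    """Return the real location timestamp closest to motion_ts.

    Returns None when even the closest location is more than max_gap_secs
    away, so the caller can ignore the section instead of resampling.
    """
    if not location_ts:
        return None
    closest = min(location_ts, key=lambda ts: abs(ts - motion_ts))
    if abs(closest - motion_ts) > max_gap_secs:
        return None
    return closest

# Hypothetical timestamps (seconds) with a big gap between 1012 and 1900.
location_ts = [1000.0, 1012.0, 1900.0]
matched = nearest_real_location(location_ts, 1100.0)
unmatched = nearest_real_location(location_ts, 1500.0)
```

Here `matched` is the slightly stale point at 1012.0, while `unmatched` is None because the nearest real point is 400 seconds away.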

shankari commented 6 years ago

so I removed all the resampling, and ignored sections with no points associated with them even if they were not technically a flip-flop, and things actually look pretty good. All the AIR_OR_HSR are gone, and almost all of the weird zoomy things are gone.

2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:49.405266-08:00 2018-02-26T11:31:33.932659-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:46:41.487192-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING

2018-02-26T16:06:24.557804-08:00 2018-02-26T16:23:13.049559-08:00 MotionTypes.WALKING
2018-02-26T16:29:13.978054-08:00 2018-02-26T16:39:59.499586-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:21:45.850317-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:32:02.441165-08:00 2018-02-26T17:55:39.864102-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:02.208424-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.WALKING

The only exception is the section from Berkeley to Millbrae which looks like this.

Before we started making the changes, it looked like this

So this argues that we should have kept the transition at the end in this case. Maybe that is the simple fix that solves everything:

non motorized -> motorized, segment at the beginning of the transition
motorized -> non-motorized, segment at the end of the transition
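The rule can be sketched as a small function. This is a sketch under assumptions: the `MOTORIZED` set and the fallback for transitions within the same category are my guesses, and timestamps are treated as opaque values.

```python
MOTORIZED = {"IN_VEHICLE", "AIR_OR_HSR"}  # assumed grouping, for illustration

def section_start_for_transition(prev_mode, prev_ts, new_mode, new_ts):
    """Pick where the new section starts, given the samples around a transition.

    prev_ts is the last motion sample that still showed prev_mode;
    new_ts is the first motion sample that showed new_mode.
    """
    if prev_mode not in MOTORIZED and new_mode in MOTORIZED:
        return prev_ts  # segment at the beginning of the transition
    if prev_mode in MOTORIZED and new_mode not in MOTORIZED:
        return new_ts   # segment at the end of the transition
    return new_ts       # same category: keep the original behavior (assumption)

# Transitions taken from the motion activity logs earlier in this issue.
walk_to_train = section_start_for_transition(
    "WALKING", "2018-02-26T09:34:57.987726-08:00",
    "IN_VEHICLE", "2018-02-26T09:37:11.204804-08:00")
train_to_walk = section_start_for_transition(
    "IN_VEHICLE", "2018-02-26T10:10:57.174305-08:00",
    "WALKING", "2018-02-26T10:21:34.631955-08:00")
```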

Let's try that now...

shankari commented 6 years ago

That actually works pretty well.

2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:49.405266-08:00 2018-02-26T11:31:33.932659-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:46:41.487192-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING

2018-02-26T16:06:24.557804-08:00 2018-02-26T16:23:13.049559-08:00 MotionTypes.WALKING
2018-02-26T16:29:13.978054-08:00 2018-02-26T17:22:09.704108-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:24:36.549367-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:32:02.441165-08:00 2018-02-26T17:55:39.864102-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:02.208424-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.WALKING

The only issue left is the segmentation of the Caltrain trip while coming back. Although this looks legit (only a 5 minute gap), it is actually very illegit, since the last cluster of points around San Mateo is the one at 17:55, and we definitely didn't make it from San Mateo to Mountain View in 5 minutes.

In some ways, this is the reverse of https://github.com/e-mission/e-mission-server/issues/577#issuecomment-376381364 in that we cover a short distance over a long time (so it looked like a trip end). In this, we cover a long distance in a short time.

But to maintain consistency, each long distance in short time should have a corresponding short distance in long time....

shankari commented 6 years ago

From the schedule, that train is at: Millbrae: 17:33 Hillsdale: 17:43 Palo Alto: 17:56 Mountain View: 18:03

So the 17:55 at San Mateo is clearly wrong and is off by more than 10 minutes. But the problem is that the distance between Millbrae and San Mateo is large enough that ~20 minutes still doesn't feel wrong or like the end of a trip or something. Conceivably you could bike or use city driving and take that long. So it is not clear how we can fix this.

But we can, while creating stops, say that if it looks like a stop is long, we have to extend it to the beginning of the next section.

Trying that now...

shankari commented 6 years ago

Design decision while making this.

Consistent with the design so far, I make the changes to the stop in the CLEAN_AND_RESAMPLE stage.
While converting stop -> filtered_stop, I can squish it by setting exit = enter or enter = exit.

But then when do I adjust the sections to match? There are two options:

in get_filtered_section -> get_filtered_points, just like we extend the section to the start or end of the trip, we do the same for the stop. The problem with this is that now we have made changes to the filtered trip, not the raw trip. So far, all the section munging has been based on raw data. Since the filtered_sections are not stored yet, we have no easy way of getting the filtered_stops short of passing in the stop_map, which seems like a bad dependency.
we can just deal with it during the linking. That is sort of what we do with trips and places, but it would mean that the bulk of the section cleanup would happen on potentially truncated sections.

It is not possible to say which is better, so we will simply use the more principled approach and see how it does.

There are two competing priorities while implementing the stop squishing.

if the stop is large, we want to extend either section to cover it
if the first or last point of a section is bad, we want to filter it and have the start or end of the stop reset to the correct start/end (e.g. in fill_stop).

If there were errors in the section start/end, we clearly don't want them to make it into the filtered stop. So arguably, the order should be
section -> stop
squish stop
stop -> section
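A minimal sketch of that order, using plain dicts instead of the real section and stop objects. All three function names are hypothetical, and the decision of when a stop counts as long enough to squish is left to the caller.

```python
def stop_from_sections(prev_section, next_section):
    """section -> stop: the stop spans the gap between two sections."""
    return {"enter_ts": prev_section["end_ts"], "exit_ts": next_section["start_ts"]}

def squish_stop(stop, keep):
    """Collapse the stop to zero duration, keeping either enter_ts or exit_ts."""
    ts = stop[keep]
    return {"enter_ts": ts, "exit_ts": ts}

def apply_stop(prev_section, next_section, stop):
    """stop -> section: snap the adjacent sections to the stop boundaries."""
    prev_fixed = dict(prev_section, end_ts=stop["enter_ts"])
    next_fixed = dict(next_section, start_ts=stop["exit_ts"])
    return prev_fixed, next_fixed

# The last two sections of the trip back, before stop squishing.
vehicle = {"start_ts": "2018-02-26T17:32:02.441165-08:00",
           "end_ts": "2018-02-26T17:55:39.864102-08:00", "mode": "IN_VEHICLE"}
walk = {"start_ts": "2018-02-26T18:02:02.208424-08:00",
        "end_ts": "2018-02-26T18:11:43.967409-08:00", "mode": "WALKING"}

stop = stop_from_sections(vehicle, walk)
squished = squish_stop(stop, keep="exit_ts")  # enter = exit
vehicle_fixed, walk_fixed = apply_stop(vehicle, walk, squished)
```

Setting enter = exit extends the vehicle section up to the start of the walking section; setting exit = enter would instead pull the following section back.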

This also resolves our earlier question about where the stop squishing code should be.

shankari commented 6 years ago

After these changes, the trips + sections on iphone2 look great. The stop squishing also fixed some other sections (like the start of the trip back), so it is all good.

We may want to add resampled points for the squished stops as well, but that is an optimization.

2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:49.405266-08:00 2018-02-26T11:31:33.932659-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:46:41.487192-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING

2018-02-26T16:06:24.557804-08:00 2018-02-26T16:23:13.049559-08:00 MotionTypes.WALKING
2018-02-26T16:23:13.049559-08:00 2018-02-26T17:24:36.549367-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:24:36.549367-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:32:02.441165-08:00 2018-02-26T18:02:02.208424-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:02.208424-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.WALKING

After trying it on the same day for iphone3, we are close, but not perfect.

2018-02-26T09:27:03-08:00 2018-02-26T09:29:24.000052-08:00 MotionTypes.BICYCLING
2018-02-26T09:29:37.000052-08:00 2018-02-26T09:30:05.000052-08:00 MotionTypes.IN_VEHICLE
2018-02-26T09:30:19.000052-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.WALKING
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING

The only real issue is the flip-flopping at the beginning. A minor issue is the VEHICLE -> VEHICLE without merging, but most other stuff will work without it...

shankari commented 6 years ago

2018-04-01 19:52:17,062:DEBUG:140735691387712:while starting flip_flop detection, changes are 
[(0, 0, 1)                              FF
 (0, 1, <MotionTypes.WALKING: 7>)       FF
 (1, 2, <MotionTypes.BICYCLING: 1>)     FF
 (2, 5, <MotionTypes.IN_VEHICLE: 0>)
 (5, 7, BICYCLING<1>)
 (7, 9, <MotionTypes.IN_VEHICLE: 0>)
 (9, 9, BICYCLING<1>)                   FF
 (9, 23, <MotionTypes.WALKING: 7>)

2018-04-01 19:52:17,183:DEBUG:140735691387712:after generating unique entries, list = [(0, 5), (5, 7), (7, 9), (9, 23), ...]

but both (0, 5) and (7, 9) are BICYCLING, so we can merge them

2018-04-01 19:52:17,184:DEBUG:140735691387712:after merging entries, changes are 
[(0, 7, 1)
 (7, 9, <MotionTypes.IN_VEHICLE: 0>)
 (9, 23, <MotionTypes.WALKING: 7>)

In order to fix this, we need to remove that IN_VEHICLE, which is not hard because it is less than 5 minutes long so clearly invalid, and then we need to merge the two non-motorized modes although they are labelled differently because they have (hopefully) the same speed profile.
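A rough sketch of those two steps on a list of (start, end, mode) tuples. This is a simplification for illustration, not the flip-flop code in the server: the `MOTORIZED` set is an assumption, and it merges every pair of adjacent non-motorized sections, which the real logic may not do.

```python
from datetime import datetime, timedelta

MOTORIZED = {"IN_VEHICLE", "AIR_OR_HSR"}  # assumed grouping
MIN_MOTORIZED = timedelta(minutes=5)

def clean_flip_flops(sections):
    """Drop too-short motorized sections, then merge adjacent non-motorized ones."""
    kept = [s for s in sections
            if s[2] not in MOTORIZED or s[1] - s[0] >= MIN_MOTORIZED]
    merged = []
    for start, end, mode in kept:
        if merged and mode not in MOTORIZED and merged[-1][2] not in MOTORIZED:
            # Extend the previous non-motorized section; it keeps its own mode.
            prev_start, _, prev_mode = merged[-1]
            merged[-1] = (prev_start, end, prev_mode)
        else:
            merged.append((start, end, mode))
    return merged

def t(fmt_time):
    return datetime.fromisoformat(fmt_time)

# First four iphone3 sections from the listing above.
sections = [
    (t("2018-02-26T09:27:03-08:00"), t("2018-02-26T09:29:24.000052-08:00"), "BICYCLING"),
    (t("2018-02-26T09:29:37.000052-08:00"), t("2018-02-26T09:30:05.000052-08:00"), "IN_VEHICLE"),
    (t("2018-02-26T09:30:19.000052-08:00"), t("2018-02-26T09:32:32.000052-08:00"), "WALKING"),
    (t("2018-02-26T09:36:59.697741-08:00"), t("2018-02-26T10:22:04.000002-08:00"), "IN_VEHICLE"),
]
cleaned = clean_flip_flops(sections)
```

In this sketch the merged section keeps the mode of whichever section came first, so the result is labelled BICYCLING even though most of it was WALKING.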

Let's see if that works...

shankari commented 6 years ago

Let's see if that works...

No.

>>> for s in cleaned_sections:
       print(s.data.start_fmt_time, s.data.end_fmt_time, s.data.sensed_mode)

2018-02-26T09:27:03-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.BICYCLING

shankari commented 6 years ago

Actually, that was due to a bug during the refactoring. It does work after all.

2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.BICYCLING
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING

Note that the initial segment is set to bicycling, let's take a quick look at why that happens

shankari commented 6 years ago

It's because the minimum duration checks invalidated all the non-flip-flopped values.

2018-04-01 22:05:52,551:DEBUG:140735691387712:comparing 2, 5 to see if there is a flipflop
2018-04-01 22:05:52,551:DEBUG:140735691387712:Sanity checking section 2018-02-26T09:27:42.753745-08:00 -> 2018-02-26T09:29:09.129657-08:00 for type MotionTypes.IN_VEHICLE = False
2018-04-01 22:05:52,551:DEBUG:140735691387712:comparing 5, 7 to see if there is a flipflop
2018-04-01 22:05:52,551:DEBUG:140735691387712:Sanity checking section 2018-02-26T09:29:09.129657-08:00 -> 2018-02-26T09:29:26.914830-08:00 for type MotionTypes.BICYCLING = False
2018-04-01 22:05:52,552:DEBUG:140735691387712:comparing 7, 9 to see if there is a flipflop
2018-04-01 22:05:52,552:DEBUG:140735691387712:Sanity checking section 2018-02-26T09:29:26.914830-08:00 -> 2018-02-26T09:30:17.725312-08:00 for type MotionTypes.IN_VEHICLE = False

So all the first parts got merged into one section, which is what we want!

2018-04-01 22:05:52,550:DEBUG:140735691387712:while starting flip_flop detection, changes are
[(0, 0, 1)                              FF      0
 (0, 1, <MotionTypes.WALKING: 7>)       FF      1
 (1, 2, <MotionTypes.BICYCLING: 1>)     FF      2
 (2, 5, <MotionTypes.IN_VEHICLE: 0>)    FF      3
 (5, 7, 1)                              FF      4
 (7, 9, <MotionTypes.IN_VEHICLE: 0>)    FF      5
 (9, 9, 1)                              FF      6
 (9, 23, <MotionTypes.WALKING: 7>)

2018-04-01 22:05:52,612:DEBUG:140735691387712:backward merged_streaks = [(0, 6)]

2018-04-01 22:05:52,613:DEBUG:140735691387712:before merging entries, changes were 
[(0, 0)
 (0, 1)
 (1, 2)
 (2, 5)
 (5, 7)
 (7, 9)
 (9, 9)
 (9, 23)

So we ended up with one merged section overall. Yay!

2018-04-01 22:05:52,613:DEBUG:140735691387712:after generating unique entries, list = 
[(0, 23)

And it is just an artifact of the fact that BICYCLING was first that makes the mode be bicycling. And it doesn't really matter because we will override it in the mode inference step anyway.

2018-04-01 22:05:52,614:DEBUG:140735691387712:After merging, list = 
[(1, <MotionTypes.IN_VEHICLE: 0>)

2018-04-01 22:05:52,614:DEBUG:140735691387712:after merging entries, changes are [(0, 23, 1),

2018-04-01 22:05:52,618:DEBUG:140735691387712:Considering MotionTypes.BICYCLING from 2018-02-26T09:27:03-08:00 -> 2018-02-26T09:34:36.836596-08:00

But it is still a bit curious that although the biggest contiguous mode was WALKING, and we merged backwards, we still ended up with mode == BICYCLING. Let's do a quick check to see why...

shankari commented 6 years ago

It's because when we merge backwards, we set the after section's start to the merged section's start

     merge
    ------------
    |          |
    t1         t2     t3

replaces t3 by t1. But when we do, we should retain the mode for t3. Is this going to break stuff?

shankari commented 6 years ago

No it did not break stuff.

2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.WALKING
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING

Going to test against a bunch of more use cases, and then move on to the mode inference.

shankari commented 6 years ago

One thing to note is that if we are checking for validity of all sections and marking them as FF if they are not valid, then WALKING sections could be marked invalid because of zig zags, for example. Or misclassified bicycling sections could be marked as invalid and merged with subsequent IN_VEHICLE sections.

To avoid that, we may want to only check for the validity of non-walking sections.

shankari commented 6 years ago

Running this on android for the same dates almost works. There are just a couple of gaps at the start of train trips, and the trip segmentation seems to segment at Millbrae every time.

2018-02-26T16:39:34-08:00 2018-02-26T17:23:49-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:24:14.577000-08:00 2018-02-26T17:25:50.682000-08:00 MotionTypes.ON_FOOT

and

2018-02-26T17:36:12-08:00 2018-02-26T18:02:07-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:37-08:00 2018-02-26T18:12:56.195000-08:00 MotionTypes.ON_FOOT

Ending	Starting

shankari commented 6 years ago

Starting gap in android trip

Looking more closely into the 2018-02-26T17:25:50.682000-08:00 -> 2018-02-26T17:36:12-08:00 segmentation...

we discovered what appeared to be a legitimate trip end at 17:28

2018-04-02 01:17:38,469:DEBUG:140735691387712:------------------------------2018-02-26T17:28:53-08:00------------------------------

2018-04-02 01:17:38,488:DEBUG:140735691387712:prev_point.ts = 1519694905.0, curr_point.ts = 1519694933.0, time gap = 28.0 (vs 300), distance_gap = 7.218737025990433 (vs 100), speed_gap = 0.25781203664251545 (vs 0.3333333333333333) continuing trip
2018-04-02 01:17:38,489:DEBUG:140735691387712:last5MinsDistances.max() = 90.4292158917, last10PointsDistance.max() = 64.2955042684

and we discovered it on the phone as well

5   2018-02-26T17:29:24.775000-08:00    2
6   2018-02-26T17:34:52.767000-08:00    1

Next point was after the geofence exit

2018-04-02 01:17:38,492:DEBUG:140735691387712:------------------------------2018-02-26T17:35:48.344000-08:00------------------------------
2018-04-02 01:17:38,492:DEBUG:140735691387712:Setting new trip start point AttrDict({'fmt_time': '2018-02-26T17:35:48.344000-08:00', 'loc': {'type': 'Point', 'coordinates': [-122.3863437, 37.6005421]}) with idx 75
2018-04-02 01:17:38,497:DEBUG:140735691387712:------------------------------2018-02-26T17:36:12-08:00------------------------------
2018-04-02 01:17:38,500:DEBUG:140735691387712:last5MinsDistances = [ 4516.26067731] with length 1
....
2018-04-02 01:17:40,058:INFO:140735691387712:Found trip end at 2018-02-26T18:12:56.195000-08:00

2018-04-02 01:17:40,126:DEBUG:140735691387712:start_loc_doc = 'fmt_time': '2018-02-26T17:35:48.344000-08:00', end_loc_doc = 'fmt_time': '2018-02-26T18:12:56.195000-08:00')

So far so good. This means that the gap will persist until the cleaning and resampling stage, when we should join it with the previous trip end. Why didn't that happen?

the transition distance is small

2018-04-02 01:17:40,140:DEBUG:140735691387712:while determining new_start_place, transition_distance = 74.93464111158376
2018-04-02 01:17:40,167:DEBUG:140735691387712:transition_distance 74.93464111158376 < 1000, returning False

but we create a new trip anyway

2018-04-02 01:17:40,168:DEBUG:140735691387712:Inserting entry Entry('start_fmt_time': '2018-02-26T17:35:48.344000-08:00', 'end_fmt_time': '2018-02-26T18:12:56.195000-08:00') into timeseries

Given that this is the start of the trip, we would need to have joined it in CLEAN_AND_RESAMPLE. Why didn't that happen?

2018-04-02 01:35:48,896:DEBUG:140735691387712:Considering trip 5ac1eb64f6858fbc8db75a36: 2018-02-26T17:35:48.344000-08:00 -> 2018-02-26T18:12:56.195000-08:00

Because we filtered out the first point...

2018-04-02 01:35:52,438:DEBUG:140735691387712:Found first section, may need to extrapolate start point
2018-04-02 01:35:52,468:DEBUG:140735691387712:First point 5ac1eb5cf6858fbc8866d0fd ([-122.3863437, 37.6005421]) was filtered, raw_start_place 5ac1eb64f6858fbc8db75a33 ([-122.3871728, 37.6003916]) may be bogus
2018-04-02 01:35:52,468:DEBUG:140735691387712:place_to_point_dist = 74.93464111158376, previous place is also bogus, skipping extrapolation

Why did we filter that out? Because the speed was super high. The quartile values are

2018-04-02 01:35:48,799:DEBUG:140735691387712:quartile values are 0.25     2.174765
0.75    33.300918

And the speed of this first point was

fmt_time	speed	distance	latitude	longitude	time gap (s)
2018-02-26T17:35:48.344000-08:00	0.000000	0.000000	37.600542	-122.386344	NaN
2018-02-26T17:36:12-08:00	190.913962	4516.260677	37.579100	-122.342812	23.656
2018-02-26T17:36:19.731000-08:00	0.000000	0.000000	37.579100	-122.342812	7.731
2018-02-26T17:36:26-08:00	76.370828	478.768727	37.577182	-122.337948	6.269
2018-02-26T17:37:24-08:00	34.715453	2013.496284	37.565230	-122.320785	58.000

The distance doesn't appear to be too large given that it is the first point after a geofence exit, but the time is short. See how at row 5, the distance is 2km but the time is almost a minute, as opposed to 4km in less than 30 secs.
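To make the arithmetic concrete, here is a small check that recomputes the speeds from the distance and time values in the rows above and compares them against a boxplot-style upper bound built from the logged quartiles. The bound formula (Q3 plus a multiple of the interquartile range) and the multiplier of 3 are assumptions for illustration; the thread only shows the quartile values, not the exact rule.

```python
def boxplot_upper_bound(q1, q3, multiplier=3.0):
    """Upper outlier bound: Q3 plus a multiple of the interquartile range."""
    return q3 + multiplier * (q3 - q1)

# Quartile values from the log line above.
q1, q3 = 2.174765, 33.300918
bound = boxplot_upper_bound(q1, q3)

# (distance in meters, time gap in seconds) for the three moving rows.
rows = [(4516.260677, 23.656), (478.768727, 6.269), (2013.496284, 58.000)]
speeds = [distance / secs for distance, secs in rows]

# True means the point would be flagged as an outlier under this assumed rule.
flagged = [speed > bound for speed in speeds]
```

Under this assumed rule only the first point after the geofence exit, at roughly 191 m/s, is flagged.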

Let us see if this is the same reason on the other trips too. If so, we may want to treat the first point after a geofence exit, at the beginning of a motorized trip, as special, at least on android...

Other transitions don't have this issue. Tabling it for now until we see how serious it is.

shankari commented 6 years ago

Loaded data from both iPhone and android for 8th March, which had led to an AIR_OR_HSR section before. Worked perfectly on both, including Caltrain + BART transitions. I think this is pretty much ready to go to GIS-based mode inference now.

shankari commented 6 years ago

Let's do one last check of the flip-flop, which was the 26th on iPhone2.

shankari commented 6 years ago

that worked too!

2018-03-26T08:08:35.692105-07:00 2018-03-26T08:38:50.965987-07:00 MotionTypes.BICYCLING

I hereby declare victory and move on to the GIS-based mode inference

shankari commented 6 years ago

Argh! But this broke Vaz's car trips. These should have been car.

2018-03-08T08:08:19.133518-08:00 -> 2018-03-08T08:15:03.796000-08:00
2018-03-08T14:33:04.610196-08:00 -> 2018-03-08T14:41:46-08:00
2018-03-08T14:56:06.780093-08:00 -> 2018-03-08T15:05:21.231000-08:00

Instead, we have

2018-03-08T08:08:09.232308-08:00 2018-03-08T08:16:23.290000-08:00 MotionTypes.ON_FOOT 
2018-03-08T14:33:04.610196-08:00 2018-03-08T14:41:46-08:00 MotionTypes.IN_VEHICLE
2018-03-08T14:54:57.783576-08:00 2018-03-08T15:06:44.436000-08:00 MotionTypes.ON_FOOT

Let's revisit their segmentation...

shankari commented 6 years ago

OK, so this is because the overall trips are really short. The raw trip is from 2018-03-08T08:09:10.145000-08:00 -> 2018-03-08T08:16:23.290000-08:00, which is around 7 minutes long. Of that, the last 30 seconds is WALKING, and unfortunately, the first valid IN_VEHICLE is 3 minutes later.

2018-04-02 17:48:52,109:DEBUG:140735691387712:At 2018-03-08T08:12:01.740000-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change

So the section is 4 minutes, just under 5 minutes. So yes, Virginia, people can take very short car trips.

Let's fix this by making the check more complex. If this is the first or last section and there is a gap between the raw section start and the raw trip start/end, extend the raw section to the trip start/end since that is what we will do anyway for the cleaned data.

An alternate check is to say that the flip-flopped section being deleted should be shorter than the section that is being merged with. Otherwise, the tail wags the dog.
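A minimal sketch of the first fix, assuming sections are simple (start, end, mode) tuples. The function name and the tuple layout are hypothetical, and the example uses a single IN_VEHICLE section for simplicity; the timestamps come from the raw trip and the log line above.

```python
from datetime import datetime

def extend_to_trip_bounds(sections, trip_start, trip_end):
    """Stretch the first section back to the trip start and the last
    section forward to the trip end. Sections are (start, end, mode)."""
    if not sections:
        return []
    extended = list(sections)
    first_start, first_end, first_mode = extended[0]
    extended[0] = (min(first_start, trip_start), first_end, first_mode)
    last_start, last_end, last_mode = extended[-1]
    extended[-1] = (last_start, max(last_end, trip_end), last_mode)
    return extended

trip_start = datetime.fromisoformat("2018-03-08T08:09:10.145000-08:00")
trip_end = datetime.fromisoformat("2018-03-08T08:16:23.290000-08:00")
in_vehicle_start = datetime.fromisoformat("2018-03-08T08:12:01.740000-08:00")

sections = [(in_vehicle_start, trip_end, "IN_VEHICLE")]
extended = extend_to_trip_bounds(sections, trip_start, trip_end)
duration_secs = (extended[0][1] - extended[0][0]).total_seconds()
print(duration_secs)
```

With the extension, the section covers the whole raw trip (a bit over 7 minutes) instead of only the part after the first valid IN_VEHICLE reading.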

shankari commented 6 years ago

If this is the first or last section and there is a gap between the raw section start and the raw trip start/end, extend the raw section to the trip start/end since that is what we will do anyway for the cleaned data.

This fixed one of them, but not the other. The other had the following profile.

2018-04-02 19:34:44,713:DEBUG:140735691387712:At 2018-03-08T15:00:12.064000-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change
2018-04-02 19:34:44,714:DEBUG:140735691387712:At idx 1, time 2018-03-08T15:06:44.436000-08:00, found new activity MotionTypes.ON_FOOT compared to current MotionTypes.IN_VEHICLE
2018-04-02 19:34:44,714:DEBUG:140735691387712:creating new section for MotionTypes.IN_VEHICLE at 0 -> 1 with start_time 2018-03-08T15:00:12.064000-08:00 -> 2018-03-08T15:06:44.436000-08:00
2018-04-02 19:34:44,714:INFO:140735691387712:Detected trip end! Ending section at 2018-03-08T15:06:44.436000-08:00

Basically, we had an IN_VEHICLE section and it was long enough, but there was only one of it and then it ended. So it looked like a flip flop

2018-04-02 19:34:44,714:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 1, 0), (1, 1, 2)]

Basically, we suck at really short motorized trips, all our heuristics are failing...

shankari commented 6 years ago

This fixed one of them, but not the other.

Fixed by adding a trip_pct and only considering a section to be a flip flop if it was less than 25% of the total trip time. This fixes all of Vaz's trips.
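A rough sketch of the trip_pct check, assuming it is combined with the existing idx_diff test. The function names and the max_idx_diff default are hypothetical; only the 25% cutoff comes from the comment above. The section times are from the log, and the trip bounds are assumed to be the ON_FOOT span listed earlier for this trip.

```python
from datetime import datetime

def section_trip_pct(section_start, section_end, trip_start, trip_end):
    """Share of the trip duration covered by the section, in percent."""
    section_secs = (section_end - section_start).total_seconds()
    trip_secs = (trip_end - trip_start).total_seconds()
    return 100.0 * section_secs / trip_secs if trip_secs > 0 else 100.0

def may_be_flip_flop(idx_diff, trip_pct, max_idx_diff=1, max_trip_pct=25.0):
    """Only short-in-points AND short-in-time sections count as flip flops."""
    return idx_diff <= max_idx_diff and trip_pct < max_trip_pct

parse = datetime.fromisoformat
pct = section_trip_pct(
    parse("2018-03-08T15:00:12.064000-08:00"),  # IN_VEHICLE section start
    parse("2018-03-08T15:06:44.436000-08:00"),  # IN_VEHICLE section end
    parse("2018-03-08T14:54:57.783576-08:00"),  # assumed trip start
    parse("2018-03-08T15:06:44.436000-08:00"),  # assumed trip end
)
is_ff = may_be_flip_flop(idx_diff=1, trip_pct=pct)
print(round(pct, 1), is_ff)
```

Here the single IN_VEHICLE entry covers more than half of the trip, so it is no longer treated as a flip flop even though idx_diff is 1.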

shankari commented 6 years ago

Penultimate set of checks - got a report from an alert tester that their bike trips were classified as car. This is very tricky because they

accelerate/decelerate at the rate of cars, and on my commute I top out over 40 km/h on flat (25mph) and average ~26km/h (16+mph). And I take roads, so GIS isn't helpful there either.

Fortunately, it turns out that there is a pattern in these trips that we may be able to embody as a rule.

2018-04-02 14:09:07,196:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 1, 1)
 (1, 3, <MotionTypes.IN_VEHICLE: 0>)
 (3, 8, 7)]

Ok, so the IN_VEHICLE is pretty short :) But the duration is long, almost 8 minutes. The actual activity points are:

So this is actually a flip-flop but doesn't seem like one because when we go from BICYCLING -> IN_VEHICLE, we merge backwards. If we had merged forwards, this would have been removed and the entire section would be marked as WALKING. Later, when the speed was calculated, the mean would have been way above walking, giving us BICYCLING.

But it would have been wrong to merge through to the WALKING because the speed profile of the first part is

count 21.000000
mean 6.085418
std 2.272487
min 0.000000
25% 5.032491
50% 6.276725
75% 7.059986
max 9.865935

count 9.000000
mean 0.253537
std 0.095076
min 0.000000
25% 0.285229
50% 0.285229
75% 0.285229
max 0.285229

I think it is still correct to split into two parts. The only question is what the first part should be labelled as, and it is hard to make the case that it should be IN_VEHICLE because if we had merged forward instead of backwards, we would have ended up with BICYCLING because the IN_VEHICLE would have been a flip-flop instead.

I think that the real issue here is that this is a toss-up - not absolutely clear in either direction. In that case, maybe we should mark it as a TOSSUP and let the speed determine which way to go.

Let's see if that works for the next one as well...

Not really. Detected as BICYCLING -> IN_VEHICLE

BICYCLING count 14.000000
mean 5.398825
std 2.517793
min 0.000000
25% 3.901205
50% 6.455850
75% 7.052127
max 8.119158
dtype: float64

IN_VEHICLE count 19.000000
mean 4.793253
std 3.874820
min 0.000000
25% 0.339064
50% 5.922970
75% 7.902541
max 10.046842
dtype: float64

2018-04-02 14:09:07,818:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 2, 1), (2, 6, <MotionTypes.IN_VEHICLE: 0>)]

Again, they both look pretty straightforward, both with decent points and decent length

2018-03-27T17:33:29.509713-07:00 -> 2018-03-27T17:39:35.486804-07:00
2018-03-27T17:39:39.999938-07:00 -> 2018-03-27T17:48:26.086079-07:00

BUT, note that the transition time from BICYCLING -> IN_VEHICLE is bogus, it should take more than 4 secs. So one of the sides must be bogus. Can again mark as TOSSUP/UNKNOWN.
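One way to sketch the "bogus transition" check: flag a BICYCLING <-> IN_VEHICLE switch as a toss-up when the gap between the two sections is implausibly short. The 60 second minimum and the function names are assumptions for illustration; the timestamps are the two section boundaries above.

```python
from datetime import datetime

MIN_TRANSITION_SECS = 60  # hypothetical: shortest believable bike <-> vehicle switch

def transition_gap_secs(prev_end, next_start):
    """Seconds between the end of one section and the start of the next."""
    return (next_start - prev_end).total_seconds()

def label_transition(prev_mode, next_mode, gap_secs, min_secs=MIN_TRANSITION_SECS):
    """Return 'TOSSUP' for a too-quick BICYCLING <-> IN_VEHICLE switch, else 'OK'."""
    if {prev_mode, next_mode} == {"BICYCLING", "IN_VEHICLE"} and gap_secs < min_secs:
        return "TOSSUP"
    return "OK"

gap = transition_gap_secs(
    datetime.fromisoformat("2018-03-27T17:39:35.486804-07:00"),
    datetime.fromisoformat("2018-03-27T17:39:39.999938-07:00"),
)
label = label_transition("BICYCLING", "IN_VEHICLE", gap)
print(round(gap, 2), label)
```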

Last one:

2018-04-02 14:09:05,838:DEBUG:140735691387712:while starting flip_flop detection, changes are 
[(0, 0, 1)                                  FF      0
 (0, 2, <MotionTypes.IN_VEHICLE: 0>)                1
 (2, 2, 7)                                  FF      2
 (2, 3, <MotionTypes.BICYCLING: 1>)         FF      3
 (3, 5, <MotionTypes.IN_VEHICLE: 0>)        FF      4
 (5, 5, 7)                                  FF      5
 (5, 7, <MotionTypes.BICYCLING: 1>)         FF      6
 (7, 9, <MotionTypes.WALKING: 7>)           FF      7
 (9, 12, <MotionTypes.BICYCLING: 1>)
 (12, 19, <MotionTypes.WALKING: 7>)]

2018-04-02 14:09:05,952:DEBUG:140735691387712:forward merged_streaks = [(2, 7)]
2018-04-02 14:09:05,952:DEBUG:140735691387712:backward merged_streaks = [(0, 0)]

2018-04-02 14:09:05,953:DEBUG:140735691387712:after generating unique entries, list = [(0, 9), (9, 12), (12, 19)]

This is again a tricky set of changes. If we had merged forward, then the bicycling would have been retained and the IN_VEHICLE would have been removed, and we would have merged all this with the bicycling for a final working solution.

So I think we should treat BICYCLING -> IN_VEHICLE transitions as special.

In particular, the pattern that I see is BICYCLING (~ 1 min) -> IN_VEHICLE (~ 5 mins) -> something else, typically WALKING

The expected behavior is that we should have BICYCLING + IN_VEHICLE merged into one BIKE_OR_CAR section, which should be merged with a subsequent BICYCLING section if it exists or not merged if it doesn't exist.

Alternatively, we can just say that the pattern above maps to BICYCLING. Let's experiment and see how that works.
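A sketch of the simpler alternative (map the pattern straight to BICYCLING), assuming each section carries its mode, idx_diff and duration. The dict layout, the function name and both thresholds are hypothetical; the example durations are just the rough "~1 min" and "~5 mins" from the pattern above.

```python
def relabel_bike_then_vehicle(sections, max_bike_secs=90, max_vehicle_idx_diff=2):
    """Relabel IN_VEHICLE as BICYCLING when it directly follows a short
    BICYCLING section and is itself backed by only a few activity points."""
    relabeled = [dict(section) for section in sections]
    for i in range(len(relabeled) - 1):
        curr, nxt = relabeled[i], relabeled[i + 1]
        if (curr["mode"] == "BICYCLING"
                and curr["duration_secs"] <= max_bike_secs
                and nxt["mode"] == "IN_VEHICLE"
                and nxt["idx_diff"] <= max_vehicle_idx_diff):
            nxt["mode"] = "BICYCLING"
    return relabeled

sections = [
    {"mode": "BICYCLING", "idx_diff": 1, "duration_secs": 60},
    {"mode": "IN_VEHICLE", "idx_diff": 2, "duration_secs": 300},
    {"mode": "WALKING", "idx_diff": 5, "duration_secs": 400},
]
modes = [s["mode"] for s in relabel_bike_then_vehicle(sections)]
print(modes)
```

The relabeled sections can then be merged with any neighbouring BICYCLING section by the normal merge step.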

shankari commented 6 years ago

So our target rule is: if you see BICYCLING (idx_diff = 1) - (< 1 minute) -> IN_VEHICLE (idx_diff = 2 but time ~ 5 minutes) -> BICYCLING. Because of this formulation, the first BICYCLING will be marked as a flip flop, so we can add this as a new check to should_merge.

shankari commented 6 years ago

OK, with this fix, there are exactly two errors left, and they are hard to fix because a substantial part of the motion activity is classified as IN_VEHICLE.

**********13 : 2018-03-27T17:33:24.832366-07:00 -> 2018-03-27T17:48:26.086079-07:00**********
2018-03-27T17:33:24.832366-07:00 2018-03-27T17:39:24.999937-07:00 MotionTypes.BICYCLING
2018-03-27T17:39:39.999938-07:00 2018-03-27T17:48:26.086079-07:00 MotionTypes.IN_VEHICLE

**********17 : 2018-03-28T16:33:43.016169-07:00 -> 2018-03-28T17:00:26.000100-07:00**********
2018-03-28T16:33:43.016169-07:00 2018-03-28T16:55:04.000063-07:00 MotionTypes.IN_VEHICLE
2018-03-28T16:55:51.000065-07:00 2018-03-28T16:56:52.000064-07:00 MotionTypes.BICYCLING
2018-03-28T16:57:01.000064-07:00 2018-03-28T17:00:26.000100-07:00 MotionTypes.WALKING

In particular, trip 17 flipped from BICYCLING -> WALKING to IN_VEHICLE -> BICYCLING -> WALKING because of the trip_pct fix.

The transitions are:

[(0, 1, 0)                                  (> 10 mins)
 (1, 1, 7)                                  FF      1
 (1, 2, <MotionTypes.BICYCLING: 1>)         FF      2
 (2, 3, <MotionTypes.WALKING: 7>)           FF      3
 (3, 5, <MotionTypes.IN_VEHICLE: 0>)        FF      4
 (5, 5, 7)                                  FF      5
 (5, 6, <MotionTypes.BICYCLING: 1>)         FF      6
 (6, 7, <MotionTypes.WALKING: 7>)           FF      7
 (7, 8, <MotionTypes.RUNNING: 8>)           FF      8
 (8, 10, <MotionTypes.WALKING: 7>)          FF      9
 (10, 12, <MotionTypes.BICYCLING: 1>)
 (12, 14, <MotionTypes.WALKING: 7>)]        FF      11

which turns into

2018-04-03 09:00:34,704:DEBUG:140735691387712:flip_flop_streaks = [(1, 9), (11, 10)]
2018-04-03 09:00:34,761:DEBUG:140735691387712:while merging, comparing curr speed 6.372634580458256 with before 7.448699573481284 and after 4.655161289677437
2018-04-03 09:00:34,762:DEBUG:140735691387712:before is closer, merge forward, returning 1
2018-04-03 09:00:34,763:DEBUG:140735691387712:after generating unique entries, list = [(0, 10), (10, 12), (12, 14)]
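The speed comparison in the log above boils down to picking the neighbour whose speed is closest to the flip-flopping streak. A minimal sketch with a hypothetical function name; it only reports which neighbour is closer and leaves the mapping to a merge direction to the caller.

```python
def closer_neighbor(curr_speed, before_speed, after_speed):
    """Return 'before' or 'after', whichever neighbouring speed is closer."""
    before_diff = abs(curr_speed - before_speed)
    after_diff = abs(curr_speed - after_speed)
    return "before" if before_diff <= after_diff else "after"

# Speeds (m/s) copied from the log line above.
choice = closer_neighbor(6.372634580458256, 7.448699573481284, 4.655161289677437)
print(choice)
```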

For the record, for trip 13, there was no flip flopping at all

2018-04-03 09:00:33,789:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 2, 1), (2, 6, <MotionTypes.IN_VEHICLE: 0>)]
2018-04-03 09:00:33,790:DEBUG:140735691387712:flip_flop_list = []
2018-04-03 09:00:33,790:DEBUG:140735691387712:flip_flop_streaks = []
2018-04-03 09:00:33,790:DEBUG:140735691387712:forward merged_streaks = []
2018-04-03 09:00:33,790:DEBUG:140735691387712:backward merged_streaks = []

Let's play around with some bus trips to see if we can come up with an overarching unified model for fast bike, short car and bus.

shankari commented 6 years ago

Checked out some bus trips too. They are all classified as IN_VEHICLE, which is good. Spot checked the GIS information - they are all pretty good, except for one bus stop which is actually in the correct location, but OSM does not have the bus stop information (at Dwight and Piedmont).

Other bus characteristics:

buses are pretty similar to fast bikes (median speed is 6.9, 2.4, 6.0, compared to 13 and 17 for train) so we really can't use speed to distinguish between them
on android, bus trips have no BICYCLING. If this holds, then mixed or close proximity IN_VEHICLE + BICYCLING modes can be merged to BICYCLING

We should get some iOS bus data to confirm that, though. For now, let's do the GIS integration!

shankari commented 6 years ago

segmentation looks pretty good. Created pull request that was merged to the tripaware server. https://github.com/e-mission/e-mission-server/pull/578

shankari commented 6 years ago

One more issue reported by one of the URAP students was mixed walk and bike for a walking trip. While testing that, also found that Tom's trip to school with Willow was marked as all WALKING. Fixing both of those.

shankari commented 6 years ago

For Tom's trip, using the same merge forward and merge back rules for WALKING <-> BICYCLING as we do with WALKING <-> IN_VEHICLE solved the problem (044eafbe70b8f4c0cdaaf885f4ecae9c42cac640)

shankari commented 6 years ago

So for URAP student's trip, we see this set of sections along with their original and predicted modes.

**********1 : 2018-04-06T12:02:55.370441-07:00 -> 2018-04-06T12:18:52.967424-07:00**********
2018-04-06T12:02:55.370441-07:00 2018-04-06T12:06:48.002566-07:00 MotionTypes.WALKING 1.7419816444435243
2018-04-06T12:02:55.370441-07:00 2018-04-06T12:06:48.002566-07:00 PredictedModeTypes.BICYCLING 1.7419816444435243

2018-04-06T12:07:21.999432-07:00 2018-04-06T12:07:43.998962-07:00 MotionTypes.RUNNING 1.1558113323084085
2018-04-06T12:07:21.999432-07:00 2018-04-06T12:07:43.998962-07:00 PredictedModeTypes.WALKING 1.1558113323084085

2018-04-06T12:08:46.998215-07:00 2018-04-06T12:18:52.967424-07:00 MotionTypes.WALKING 0.08121076478030438
2018-04-06T12:08:46.998215-07:00 2018-04-06T12:18:52.967424-07:00 PredictedModeTypes.WALKING 0.08121076478030438

Basically, there are two main issues:

The small RUNNING section splits up the two walking sections so that they are not merged, and are evaluated separately.
The resulting first section has only three actual points but is reconstructed to start from the previous end location. The resulting speed of this section is above the walking speed limit, so we convert it to bicycling.

This suggests at least two obvious fixes.

merge RUNNING sections with WALKING. This alone should fix this particular issue since the overall speed will get below the threshold.
bump up the tolerance of the walking speed a bit if the underlying mode was walking
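The first fix could look roughly like this, assuming sections are (start, end, mode) tuples in time order. The function name and tuple layout are hypothetical, and the times are shortened from the section list above.

```python
def merge_on_foot(sections):
    """Treat RUNNING as WALKING and merge adjacent sections with the same mode."""
    merged = []
    for start, end, mode in sections:
        mode = "WALKING" if mode == "RUNNING" else mode
        if merged and merged[-1][2] == mode:
            prev_start = merged[-1][0]
            merged[-1] = (prev_start, end, mode)  # extend the previous section
        else:
            merged.append((start, end, mode))
    return merged

sections = [
    ("12:02:55", "12:06:48", "WALKING"),
    ("12:07:21", "12:07:43", "RUNNING"),
    ("12:08:46", "12:18:52", "WALKING"),
]
merged = merge_on_foot(sections)
print(merged)
```

With the three sections collapsed into one, the speed is computed over the whole trip rather than over the short first section alone.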

We should also look to see how the extrapolation happened that resulted in that high a speed. And maybe if the underlying section is WALKING and the computed speed is close to the max walking speed, we use the max walking speed for the interpolation instead of the computed speed.

Quick check on the extrapolation.

Yes, we find 3 points.

2018-04-07 11:17:25,772:DEBUG:140735495942976:deleting 0 points from section points
2018-04-07 11:17:25,774:DEBUG:140735495942976:Found 3 results

and we extrapolated based on the speed of 1.74 m/s.

2018-04-07 11:17:25,778:DEBUG:140735495942976:Found first section, may need to extrapolate start point
2018-04-07 11:17:25,789:DEBUG:140735495942976:Adding distance 365.9192626980049 to original 66.70459984566997 to extend section start from [-122.26297302190915, 37.87055511505241] to [-122.25908662369412, 37.87174562558791]
2018-04-07 11:17:25,791:DEBUG:140735495942976:After subtracting time 210.05919510662173 from original 22.572930336 to cover additional distance 365.9192626980049 at speed 1.7419816471841274, new_start_ts = 1523041375.37

But that 1.74 m/s is from a very small set of points, at least one of which could be zig-zag. Let's add in a heuristic that says that if the extrapolated distance is >>> measured distance (in this case, 366 versus 67) and the underlying mode is walk or bike, and the measured speed is close to the cap (diff < 25%), set the speed to the max for the mode.

This is important to handle proper mode detection of short walking trips due to the large geofence radius on iOS.

We add this at the clean and resample stage instead of the mode inference stage because at the mode inference stage, we don't know how much we are extrapolating.

Well, technically, we can compare the first point in the resampled and raw data to figure it out. But then the start time is going to be off too. Let's put it in the cleaning + resampling stage for now.
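A sketch of the capping heuristic under stated assumptions: the 1.4 m/s walking cap, the 3x distance ratio used for ">>>" and the function name are all hypothetical; only the 25% tolerance comes from the description above. The distances and speed are from the extrapolation log.

```python
MAX_SPEEDS = {"WALKING": 1.4}  # m/s, hypothetical cap; a BICYCLING cap would go here too

def extrapolation_speed(mode, measured_speed, measured_dist, extrapolated_dist,
                        max_speeds=MAX_SPEEDS, dist_ratio=3.0, speed_tolerance=0.25):
    """Use the mode's max speed when we extrapolate far beyond the measured
    distance and the measured speed is within tolerance of the cap."""
    cap = max_speeds.get(mode)
    if cap is None:
        return measured_speed  # no cap known for this mode
    much_longer = extrapolated_dist > dist_ratio * measured_dist
    near_cap = abs(measured_speed - cap) / cap < speed_tolerance
    return cap if (much_longer and near_cap) else measured_speed

speed = extrapolation_speed("WALKING", 1.7419816471841274,
                            66.70459984566997, 365.9192626980049)
print(speed)
```

Using the capped speed means the section start is pushed further back in time, so the reconstructed section stays under the walking limit.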

shankari commented 6 years ago

Looking at trips on the test phones, iPhone3 looks pretty bad.

**********4 : 2018-04-06T16:20:42.818245-07:00 -> 2018-04-06T17:36:00.999960-07:00**********
2018-04-06T16:20:42.818245-07:00 2018-04-06T16:38:58.278862-07:00 MotionTypes.WALKING 1.2293202448810991
2018-04-06T16:20:42.818245-07:00 2018-04-06T16:38:58.278862-07:00 PredictedModeTypes.WALKING 1.2293202448810991

2018-04-06T16:38:58.278862-07:00 2018-04-06T17:14:20.373167-07:00 MotionTypes.IN_VEHICLE 5.0034408107066355
2018-04-06T16:38:58.278862-07:00 2018-04-06T17:14:20.373167-07:00 PredictedModeTypes.CAR 5.0034408107066355

2018-04-06T17:14:20.373167-07:00 2018-04-06T17:19:37.000098-07:00 MotionTypes.WALKING 0.9145166170592021
2018-04-06T17:14:20.373167-07:00 2018-04-06T17:19:37.000098-07:00 PredictedModeTypes.WALKING 0.9145166170592021

2018-04-06T17:20:40.000099-07:00 2018-04-06T17:34:26.999960-07:00 MotionTypes.IN_VEHICLE 3.7672128139726015
2018-04-06T17:20:40.000099-07:00 2018-04-06T17:34:26.999960-07:00 PredictedModeTypes.TRAIN 3.7672128139726015

2018-04-06T17:35:18.999961-07:00 2018-04-06T17:36:00.999960-07:00 MotionTypes.WALKING 1.2453066309970888
2018-04-06T17:35:18.999961-07:00 2018-04-06T17:36:00.999960-07:00 PredictedModeTypes.WALKING 1.2453066309970888

The first CAR trip should be train and the second TRAIN trip should be bike. Investigating further....

shankari commented 6 years ago

The issue with the first case is that the segmentation happens midway through the trip.

Investigating further, even after detecting flip flops and merging them, we have

0   [(0, 3, 7
1   (3, 8, <MotionTypes.BICYCLING: 1>
2   (8, 10, 7
3   (10, 71, <MotionTypes.IN_VEHICLE: 0>
7   (71, 85, 7>
16  (85, 106, 0
26  (106, 108, 7)]

However, both the BICYCLING and WALKING are skipped.

2018-04-07 20:10:19,783:INFO:140735495942976:Found 0 filtered points and 0 unfiltered points between 2018-04-06T16:29:54.785455-07:00 and 2018-04-06T16:35:02.563035-07:00 for type MotionTypes.BICYCLING, skipping...
2018-04-07 20:10:19,786:INFO:140735495942976:Found 0 filtered points and 0 unfiltered points between 2018-04-06T16:35:02.563035-07:00 and 2018-04-06T16:35:12.738195-07:00 for type MotionTypes.WALKING, skipping...

So then we have a big stop which we try to squish.

2018-04-07 20:10:22,219:DEBUG:140735495942976:stop distance = 2513 > 150, squishing it between 2018-04-06T16:20:42.818245-07:00 -> 2018-04-06T16:29:14.000026-07:00 and 2018-04-06T16:38:58.278862-07:00 -> 2018-04-06T16:58:57.001378-07:00

but because there are so few walking points, the next section is more dense, and we merge forwards. We can fix this by looking at the modes while squishing and merging towards the non-motorized section.

2018-04-07 20:10:22,259:DEBUG:140735495942976:next_section 2018-04-06T16:29:14.000026-07:00 is more dense than prev_section 2018-04-06T16:38:58.278862-07:00, merging forwards
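A minimal sketch of the mode-aware squish, assuming the decision is made before the density comparison. The function name, the return values and the NON_MOTORIZED set are hypothetical; returning None means "modes don't settle it, fall back to the existing density check".

```python
NON_MOTORIZED = {"WALKING", "RUNNING", "ON_FOOT", "BICYCLING"}  # assumed grouping

def squish_target(prev_mode, next_mode):
    """Pick the section the stop should be squished towards based on modes.
    Returns 'prev', 'next', or None when both sides are the same kind."""
    prev_non_motorized = prev_mode in NON_MOTORIZED
    next_non_motorized = next_mode in NON_MOTORIZED
    if prev_non_motorized and not next_non_motorized:
        return "prev"
    if next_non_motorized and not prev_non_motorized:
        return "next"
    return None

target = squish_target("WALKING", "IN_VEHICLE")
print(target)
```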

shankari commented 6 years ago

The issue with the second case is that the sensed mode from the phone, after filtering out flip flops, was IN_VEHICLE. So with the current algorithm, there was no way that this would have ended up as a bicycling trip - it would have either been CAR or TRAIN.

Looking at the sensed modes around that section, it seems hard to argue that we shouldn't pick the IN_VEHICLE. The only really consistent mode (8 minutes) was IN_VEHICLE; everything else was a clear idx_diff = 1 kind of flip flop. Why should I then think that this is bicycling? The speed is low, true, but on Embarcadero at 5pm, cars are going to be slow too. If we did trajectory matching, we would see that the route didn't completely match, but then we would just fall back to CAR.

We need to experiment with bicycling detection on iOS some more.

15  (84, 85, <MotionTypes.BICYCLING: 1>                 FF      idx_diff
16  (85, 90, <MotionTypes.IN_VEHICLE: 0>                        17:20 -> 17:28
17  (90, 90, 7                                          FF      idx_diff
18  (90, 93, <MotionTypes.IN_VEHICLE: 0>                FF      Sanity checking False
19  (93, 94, 1                                          FF      idx_diff
20  (94, 96, <MotionTypes.IN_VEHICLE: 0>                FF      Sanity checking False
21  (96, 97, 1                                          FF      idx_diff
22  (97, 98, 7                                          FF      idx_diff
23  (98, 99, <MotionTypes.BICYCLING: 1>                 FF      idx_diff
24  (99, 105, <MotionTypes.IN_VEHICLE: 0>               FF      False
25  (105, 106, 1                                        FF      idx_diff

shankari commented 6 years ago

There are some hints, like this

which hint that I am on a trail and not the street, but it is small and could easily be an error as well.

Note also that in this case, the test phone was in my backpack. Maybe we will get different results if it is in a pocket?

At any rate, I am going to skip the second issue for now.

shankari commented 6 years ago

After all these changes, iPhone3 has some additional segmentation. This is not terrible, because it all goes to WALKING correctly anyway.

2018-02-26T11:30:42.848401-08:00 2018-02-26T11:47:57.000168-08:00 MotionTypes.WALKING 0.20537946495623852
2018-02-26T11:30:42.848401-08:00 2018-02-26T11:47:57.000168-08:00 PredictedModeTypes.WALKING 0.20537946495623852
2018-02-26T11:49:03.000168-08:00 2018-02-26T11:56:58.000060-08:00 MotionTypes.BICYCLING 1.14644567023953
2018-02-26T11:49:03.000168-08:00 2018-02-26T11:56:58.000060-08:00 PredictedModeTypes.WALKING 1.14644567023953
2018-02-26T11:58:25.000060-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING 1.1535741105440154
2018-02-26T11:58:25.000060-08:00 2018-02-26T12:02:46.000041-08:00 PredictedModeTypes.WALKING 1.1535741105440154

shankari commented 6 years ago

Ah it's because there was a single BICYCLING entry in the middle, but because we merge backward at the beginning and forward at the end, we end up with a difference of 2

2018-04-07 23:00:30,357:DEBUG:140735495942976:At idx 203, time 2018-02-26T11:57:46.374228-08:00, found new activity MotionTypes.BICYCLING compared to current MotionTypes.WALKING
2018-04-07 23:00:30,358:DEBUG:140735495942976:At idx 204, time 2018-02-26T11:57:48.917064-08:00, found new activity MotionTypes.WALKING compared to current MotionTypes.BICYCLING
2018-04-07 23:00:30,358:DEBUG:140735495942976:creating new section for MotionTypes.BICYCLING at 202 -> 204 with start_time 2018-02-26T11:48:39.702499-08:00 -> 2018-02-26T11:57:48.917064-08:00

Let's use idx = 2 for BICYCLING to handle this case.

shankari commented 6 years ago

With that fix (0942774bd8802f20c4a5f84a131436b837f986fd) the segmentation is correct again.

**********0 : 2018-02-26T09:27:03-08:00 -> 2018-02-26T12:02:46.000041-08:00**********
2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.WALKING 2.493409531761379
2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 PredictedModeTypes.BICYCLING 2.493409531761379
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE 6.302443355537658
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 PredictedModeTypes.TRAIN 6.302443355537658
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE 13.092253711686567
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 PredictedModeTypes.TRAIN 13.092253711686567
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING 0.31497006975417674
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 PredictedModeTypes.WALKING 0.31497006975417674

shankari commented 6 years ago

Some segmentation regressions on iphone2.
On investigating them, the ground truth is

2018-04-08 01:13:36,114:DEBUG:140735495942976:creating new section for MotionTypes.IN_VEHICLE at 45 -> 120 with start_time 2018-02-26T10:22:32.513430-08:00 -> 2018-02-26T11:29:03.018860-08:00
2018-04-08 01:13:36,114:DEBUG:140735495942976:creating new section for MotionTypes.WALKING at 120 -> 120 with start_time 2018-02-26T11:29:03.018860-08:00 -> 2018-02-26T11:29:03.018860-08:00
2018-04-08 01:13:36,115:DEBUG:140735495942976:creating new section for MotionTypes.IN_VEHICLE at 120 -> 122 with start_time 2018-02-26T11:29:03.018860-08:00 -> 2018-02-26T11:33:22.436681-08:00

Pretty clear flip flop. Why doesn't the last section start at 11:33?

Both the WALKING and IN_VEHICLE are flip flops

2018-04-08 01:13:36,143:DEBUG:140735495942976:comparing 120, 120 to see if there is a flipflop
2018-04-08 01:13:36,143:DEBUG:140735495942976:in is_flip_flop: idx_diff = 0
2018-04-08 01:13:36,143:DEBUG:140735495942976:comparing 120, 122 to see if there is a flipflop
2018-04-08 01:13:36,143:DEBUG:140735495942976:in non-walking is_flip_flop: idx_diff = 2

And they are merged correctly.

2018-04-08 01:13:36,192:DEBUG:140735495942976:after merging entries, changes are [(0, 6, 7), (6, 122, <MotionTypes.IN_VEHICLE: 0>), (122, 136, 7)]

Ah, it is the stop squishing. It turns out that the end point of the in_vehicle section is pretty close to the end time, but the start point of the walking section is pretty far from the start time. And we now always merge from the IN_VEHICLE to the WALKING.

2018-04-08 01:13:36,198:DEBUG:140735495942976:Considering MotionTypes.IN_VEHICLE from 2018-02-26T09:34:57.987726-08:00 -> 2018-02-26T11:33:22.436681-08:00

section end point = Location({'_id': ObjectId('5ac1cb0ef6858fba82b73a36')
 'accuracy': 65.0
 'fmt_time': '2018-02-26T11:31:33.932659-08:00'
 'loc': {'type': 'Point' 'coordinates': [-122.2684165534885 37.86940521405621]}
 'user_id': UUID('49cbc158-1d84-45bf-bcdb-84c13550db17')
 'vaccuracy': 67.019607543945312})

2018-04-08 01:13:36,202:DEBUG:140735495942976:Considering MotionTypes.WALKING from 2018-02-26T11:33:22.436681-08:00 -> 2018-02-26T12:00:51.991475-08:00

section start point = Location({'_id': ObjectId('5ac1cb0ef6858fba82b73a6c')
'fmt_time': '2018-02-26T11:51:08.964357-08:00'
 'loc': {'type': 'Point' 'coordinates': [-122.26480638619722 37.871652147767946]}
 'vaccuracy': 12.0})

This is a regression caused by https://github.com/e-mission/e-mission-server/issues/577#issuecomment-379520895

We can fix it by merging towards the section that is closest to the segmentation on the motion activity.

In this case, the motion activity segmentation was at 11:33. The two end points of the stop are 11:31 and 11:51. So we should set the stop end to 11:33. Similarly, in the prior case, the activity segmentation was at 16:29.

2018-04-07 20:10:19,783:INFO:140735495942976:Found 0 filtered points and 0 unfiltered points between 2018-04-06T16:29:54.785455-07:00 and 2018-04-06T16:35:02.563035-07:00 for type MotionTypes.BICYCLING, skipping...

15  2018-04-06T16:29:54.785455-07:00    7
16  2018-04-06T16:30:29.436247-07:00    9
17  2018-04-06T16:30:30.390100-07:00    1
18  2018-04-06T16:30:48.194715-07:00    9

and the stop end points were

5 2018-04-06T16:29:14.000026-07:00 {'type': 'Point', 'coordinates': [-122.2681376...
6 2018-04-06T16:38:58.278862-07:00 {'type': 'Point', 'coordinates': [-122.2714379

So we should clearly merge to that.
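The "merge towards whichever side is closest to the motion activity segmentation" rule can be sketched like this. The function name and return values are hypothetical; the timestamps are the stop end points and activity transitions from the two cases above.

```python
from datetime import datetime

def side_closest_to_transition(stop_enter, stop_exit, activity_transition):
    """Return 'prev_section' if the stop's enter point is nearer in time to the
    motion activity transition, else 'next_section'."""
    enter_gap = abs((activity_transition - stop_enter).total_seconds())
    exit_gap = abs((stop_exit - activity_transition).total_seconds())
    return "prev_section" if enter_gap <= exit_gap else "next_section"

parse = datetime.fromisoformat
iphone2_case = side_closest_to_transition(
    parse("2018-02-26T11:31:33.932659-08:00"),
    parse("2018-02-26T11:51:08.964357-08:00"),
    parse("2018-02-26T11:33:22.436681-08:00"),
)
iphone3_case = side_closest_to_transition(
    parse("2018-04-06T16:29:14.000026-07:00"),
    parse("2018-04-06T16:38:58.278862-07:00"),
    parse("2018-04-06T16:29:54.785455-07:00"),
)
print(iphone2_case, iphone3_case)
```

In both cases the transition is within a couple of minutes of the earlier end point and many minutes away from the later one.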

But wait a minute - if this was a transition from IN_VEHICLE to WALKING, why didn't we merge the stop backwards? - i.e. set enter and exit to enter

Because our model has been that when transitioning from IN_VEHICLE to WALKING, we will merge forward since the last big gap is anticipated to be at the end of the motorized gap.

The problem is that here, the big gap was at the beginning of the WALKING section.

shankari commented 6 years ago

At this point, almost everything works, except that the trip to the consulate is now classified as CAR. This is because the bike section is glommed onto the train section.

This is because almost the entire bike section is flip-flopping.

2018-04-08 23:44:12,060:DEBUG:140735495942976:flip_flop_list = [2, 3, 4, 5, 6, 7, 8, 9]
2018-04-08 23:44:12,061:DEBUG:140735495942976:flip_flop_streaks = [(2, 8)]

However, when deciding where to merge it, we compare WALKING and IN_VEHICLE and because it is too fast for WALKING, we pick IN_VEHICLE. But the solution is to really not merge at all but to retain it as its own section. Then in mode inference, we will classify it as bike.

2018-04-08 23:44:12,143:DEBUG:140735495942976:Median calculation from speeds = 5.070589411485555
2018-04-08 23:44:12,144:DEBUG:140735495942976:after is walking, but speed is 5, merge forward, returning 1
