Open shankari opened 6 years ago
After the fixes, the trip segmentation is correct. Yay!
start_fmt_time | end_fmt_time |
---|---|
2018-02-26T09:26:55.265398-08:00 | 2018-02-26T12:00:51.991475-08:00 |
2018-02-26T14:19:16.417127-08:00 | 2018-02-26T14:25:54.496145-08:00 |
2018-02-26T16:10:21.125442-08:00 | 2018-02-26T18:11:43.967409-08:00 |
But the section segmentation is not. For example, for the first trip, the sections look like this
Note the walk to the train station sloshing over into the train trip, and the big gap during the train ride. Let's see if we can fix that somehow.
First, consider the slosh over from WALKING to IN_VEHICLE
2018-02-26T09:37:09.319461-08:00 MotionTypes.WALKING
2018-02-26T09:37:16.419204-08:00 MotionTypes.IN_VEHICLE
Here's where we switch from WALKING to IN_VEHICLE.
We assume that changes propagate forward - e.g. 9:34 -> 9:37 is WALKING because it was WALKING at 9:34. But it actually looks like it is IN_VEHICLE. So we should have broken the section at 9:34 instead of 9:37.
The root cause is that during transitions, if there are large-ish gaps, we don't know exactly when the transition happened. Right now, we fix this by:
A better fix would be to determine where the transition is by looking at the speeds and seeing if we can see a clear shift in speeds.
2018-03-27 21:56:15,910:DEBUG:140735691387712:At 2018-02-26T09:34:57.987726-08:00, retained existing activity MotionTypes.WALKING because of no change
2018-03-27 21:56:15,911:DEBUG:140735691387712:At 2018-02-26T09:37:11.204804-08:00, found new activity MotionTypes.IN_VEHICLE compared to current MotionTypes.WALKING - creating new section with start_time 2018-02-26T09:37:11.204804-08:00
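One way to look for a clear shift in speeds is a two-segment split: try every candidate boundary and keep the one that minimizes the squared deviation from each side's mean. This is only a sketch of that idea; `find_speed_shift` and the sample speeds are hypothetical and are not taken from the e-mission-server code.

```python
from statistics import mean

def find_speed_shift(speeds):
    """Return the index i such that splitting into speeds[:i] and speeds[i:]
    minimizes the total squared deviation from each side's mean."""
    if len(speeds) < 2:
        raise ValueError("need at least two speed samples")
    best_idx, best_cost = None, float("inf")
    for i in range(1, len(speeds)):
        left, right = speeds[:i], speeds[i:]
        left_mean, right_mean = mean(left), mean(right)
        cost = (sum((s - left_mean) ** 2 for s in left)
                + sum((s - right_mean) ** 2 for s in right))
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

# Made-up speeds in m/s: walking pace, then a vehicle pulling away
sample_speeds = [1.2, 1.4, 1.1, 1.3, 12.0, 18.5, 22.0]
shift_idx = find_speed_shift(sample_speeds)  # index of the first "fast" sample
```

With large gaps between points there may be too few samples for the split to be meaningful, so this would need a fallback to the existing behaviour.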
Let's see if we have a similar root cause for the big gap in the transit sections.
The big gap in the transit sections is
2018-02-26T10:09:10.849257-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:37:06.732640-08:00
The corresponding change to the motion activities is
2018-03-27 22:01:16,830:DEBUG:140735691387712:At 2018-02-26T10:08:35.647760-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change
2018-03-27 22:01:16,830:DEBUG:140735691387712:At 2018-02-26T10:10:57.174305-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change
...
2018-03-27 22:01:16,835:DEBUG:140735691387712:At 2018-02-26T10:21:34.631955-08:00, found new activity MotionTypes.WALKING compared to current MotionTypes.IN_VEHICLE - creating new section with start_time 2018-02-26T10:21:34.631955-08:00
2018-03-27 22:01:16,835:DEBUG:140735691387712:At 2018-02-26T10:22:32.513430-08:00, retained existing activity MotionTypes.WALKING because of no change
2018-03-27 22:01:16,836:DEBUG:140735691387712:At 2018-02-26T10:25:19.825444-08:00, found new activity MotionTypes.IN_VEHICLE compared to current MotionTypes.WALKING - creating new section with start_time 2018-02-26T10:25:19.825444-08:00
So then when converting this to sections, there are no points between 10:09 and 10:21, so we end the section at 10:09.
2018-03-27 22:01:16,923:DEBUG:140735691387712:Considering MotionTypes.IN_VEHICLE from 2018-02-26T09:37:11.204804-08:00 -> 2018-02-26T10:21:34.631955-08:00
2018-03-27 22:01:16,925:DEBUG:140735691387712:with iloc, section start point =
Location({'_id': ObjectId('5abb2195f6858f0f828a26a7')
'fmt_time': '2018-02-26T09:37:16.419204-08:00'
'loc': {'type': 'Point' 'coordinates': [-122.09809683742127 37.40364156259304]}
section end point = Location({'_id': ObjectId('5abb2195f6858f0f828a2755')
'fmt_time': '2018-02-26T10:09:10.849257-08:00'
'loc': {'type': 'Point' 'coordinates': [-122.29990953446286 37.54018354522394]}
And there are no points between
2018-03-27 22:01:16,926:DEBUG:140735691387712:Considering MotionTypes.WALKING from 2018-02-26T10:21:34.631955-08:00 -> 2018-02-26T10:25:19.825444-08:00
2018-03-27 22:01:16,928:INFO:140735691387712:Found no location points between ... 'fmt_time': '2018-02-26T10:21:34.631955-08:00', 'fmt_time': '2018-02-26T10:25:19.825444-08:00',
And again, because of the big gap in locations, the section only starts at 10:37.
2018-03-27 22:01:16,928:DEBUG:140735691387712:Considering MotionTypes.IN_VEHICLE from 2018-02-26T10:25:19.825444-08:00 -> 2018-02-26T11:29:03.018860-08:00
2018-03-27 22:01:16,931:DEBUG:140735691387712:with iloc,
section start point = Location('fmt_time': '2018-02-26T10:37:06.732640-08:00',
'loc': {'type': 'Point', 'coordinates': [-122.40303033318312, 37.615935257650165]}),
section end point = Location({'fmt_time': '2018-02-26T11:25:08.787427-08:00',
'loc': {'type': 'Point', 'coordinates': [-122.26936768884903, 37.83899754049432]})
Given the locations shown here (10:09 at Hillsdale and 10:37 just beyond Millbrae), I bet that the walking section was at Millbrae station. It is just that we don't have any location points around there.
So it looks like our original goal of using the location points only for segmentation and then the motion activity only for mode detection, is not sufficiently robust to the errors that we see in the field.
Instead, we need to use hybrid approaches for both. In this case, we should be able to extrapolate and determine the points related to the WALKING motion activity. Hopefully, that will be around Millbrae, and we can then use GIS matching to classify the train ride as Caltrain. Because without that, I don't see how we can even have the correct sections that we need for the GIS matching.
There are also a bunch of locations that are filtered out. While they are terrible overall, they do clump around the Millbrae station at this time, so can serve as a second level check to the extrapolation.
In this case, we should be able to extrapolate and determine the points related to the WALKING motion activity.
While extrapolating, I ran into an issue that we expect that the end points of the section are locations in the location database. We certainly make that assumption in the section segmentation code, and we may make the same assumption later, in the clean and resample code.
For now, fixing this by (shocker!) inserting locations for the interpolated end points since there is basically no other choice.
First level of fixes move the walk segment to San Mateo. The problem is that just using the interpolated values directly leads to offsets because if we walked for part of the trip, for example, the interpolated value will be off by quite a bit.
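To make the offset problem concrete, here is a minimal sketch of plain linear interpolation between the two fixes on either side of the gap (coordinates and timestamps copied from the logs above). `interpolate_location` is a hypothetical helper, not the function used in the pipeline, and it assumes constant speed across the whole gap.

```python
from datetime import datetime

def interpolate_location(t, t0, loc0, t1, loc1):
    """Linearly interpolate a (lon, lat) pair at time t between two fixes.
    Assumes constant speed between the fixes, which is exactly the
    assumption that breaks when part of the gap was spent walking."""
    if not t0 <= t <= t1:
        raise ValueError("t must lie between t0 and t1")
    frac = (t - t0).total_seconds() / (t1 - t0).total_seconds()
    return tuple(a + frac * (b - a) for a, b in zip(loc0, loc1))

# Last fix before the gap and first fix after it, as (lon, lat)
t0 = datetime.fromisoformat("2018-02-26T10:09:10.849257-08:00")
t1 = datetime.fromisoformat("2018-02-26T10:37:06.732640-08:00")
loc0 = (-122.29990953446286, 37.54018354522394)
loc1 = (-122.40303033318312, 37.615935257650165)

# Start of the WALKING motion activity that has no location points
walk_start = datetime.fromisoformat("2018-02-26T10:21:34.631955-08:00")
lon, lat = interpolate_location(walk_start, t0, loc0, t1, loc1)
```

Because the walk and any waiting time are spread evenly over the gap, the interpolated point lands partway along the line between the two fixes rather than at the station where the walk actually happened.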
At least for the first walking section around Millbrae, we do have unfiltered locations nearby.
2018-02-26T09:26:55.265398-08:00 2018-02-26T09:37:09.319461-08:00 MotionTypes.WALKING
2018-02-26T09:37:16.419204-08:00 2018-02-26T10:21:25.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:21:55.265398-08:00 2018-02-26T10:24:55.265398-08:00 MotionTypes.WALKING
2018-02-26T10:25:25.265398-08:00 2018-02-26T11:28:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:29:25.265398-08:00 2018-02-26T11:31:55.265398-08:00 MotionTypes.WALKING
2018-02-26T11:32:25.265398-08:00 2018-02-26T11:32:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:33:25.265398-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING
lo.get_map_for_geojson_unsectioned(gfc.get_feature_list_from_df(ts.get_data_df("background/location",
time_query=estt.TimeQuery("data.ts", arrow.get("2018-02-26T10:21:34.631955-08:00").timestamp, arrow.get("2018-02-26T10:25:19.825444-08:00").timestamp))))
I wonder if this is another slosh issue. Let me fix the sloshing and then see what this looks like, and whether we can combine with non-filtered location to handle non-uniform speeds.
Working on slosh issues now...
First fix is to simply start the segment at the beginning of the transition instead of at the end. So if the transition occurred during
2018-03-29 14:17:05,780:DEBUG:140735691387712:At 2018-02-26T09:34:57.987726-08:00, retained existing activity MotionTypes.WALKING because of no change
2018-03-29 14:17:05,780:DEBUG:140735691387712:At 2018-02-26T09:37:11.204804-08:00, found new activity MotionTypes.IN_VEHICLE compared to current MotionTypes.WALKING -
just create the new section at 09:34:57.987726 instead of 09:37:11.204804
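A rough sketch of that change, assuming the motion activities arrive as a sorted list of `(timestamp, mode)` pairs. The function name, the flag, and the shortened timestamps are illustrative assumptions, not the real segmentation code.

```python
def section_starts(activities, break_at_transition_start=True):
    """activities: list of (timestamp, mode) sorted by timestamp.
    Returns a list of (start_timestamp, mode), one per detected section."""
    sections = []
    prev_ts, curr_mode = None, None
    for ts, mode in activities:
        if mode != curr_mode:
            # The change happened somewhere in (prev_ts, ts]; pick one end.
            if break_at_transition_start and prev_ts is not None:
                start = prev_ts
            else:
                start = ts
            sections.append((start, mode))
            curr_mode = mode
        prev_ts = ts
    return sections

# Readings abbreviated to seconds; the first entry is a made-up trip start
readings = [
    ("2018-02-26T09:26:55", "WALKING"),
    ("2018-02-26T09:34:57", "WALKING"),
    ("2018-02-26T09:37:11", "IN_VEHICLE"),
]
new_behaviour = section_starts(readings)
old_behaviour = section_starts(readings, break_at_transition_start=False)
```

With the flag on, the IN_VEHICLE section starts at the last WALKING reading (09:34:57) instead of the first IN_VEHICLE reading (09:37:11).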
This fixes most of the slosh. We now have only two issues left.
an extra walking section from
2018-02-26T11:26:55.265398-08:00 2018-02-26T11:28:55.265398-08:00 MotionTypes.WALKING
Making the extrapolation better for large gaps
Let's look at the spurious walk section first.
The extra walking section turned out to be an example of flip-flopping. Fixed it in 7678818d3d535d82e25c851e66baf1a84fd06eeb. That also fixed a bunch of flip flopping on the way back.
The sections are now:
To Berkeley:
2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:25.265398-08:00 2018-02-26T10:18:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:19:25.265398-08:00 2018-02-26T10:22:25.265398-08:00 MotionTypes.WALKING
2018-02-26T10:22:55.265398-08:00 2018-02-26T11:31:55.265398-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:32:25.265398-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING
From Berkeley:
2018-02-26T16:06:31.429716-08:00 2018-02-26T16:26:03.451954-08:00 MotionTypes.WALKING
2018-02-26T16:26:33.451954-08:00 2018-02-26T16:34:06.250344-08:00 MotionTypes.IN_VEHICLE
2018-02-26T16:34:13.848574-08:00 2018-02-26T16:36:33.451954-08:00 MotionTypes.WALKING
2018-02-26T16:37:03.451954-08:00 2018-02-26T17:20:03.451954-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:20:33.451954-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:31:03.451954-08:00 2018-02-26T17:56:03.451954-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:56:33.451954-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.AIR_OR_HSR
Remaining fixes are:
2018-02-26T16:34:13.848574-08:00 2018-02-26T16:36:33.451954-08:00 MotionTypes.WALKING
)
We first tried resampling (in 4a0a41580e6e83b6cf5dd5083f8392cda81017a4) but because of the large gaps, the resampled points were not perfect, and we ended up with the following. Note that one of them was so off that the resulting section actually got classified as AIR_OR_HSR
screenshot 1 | screenshot 2 |
---|---|
we considered a couple of techniques for fixing section start/end.
I first tried looking at the unfiltered points and it was a bit of a disaster.
This is the main section that had no points and looking at the unfiltered locations caused it to be classified as AIR_OR_HSR
2018-03-31 18:40:18,624:DEBUG:140735691387712:matched_point None for motion 2018-02-26T10:19:03.968804-08:00, using resampled location 2018-02-26T10:21:21.416628-08:00
2018-03-31 18:40:18,626:DEBUG:140735691387712:matched_point None for motion 2018-02-26T10:22:32.513430-08:00, using resampled location 2018-02-26T10:22:30.913821-08:00
2018-02-26T10:21:21.416628-08:00 2018-02-26T10:22:30.913821-08:00 MotionTypes.AIR_OR_HSR
This is because 10:21 is actually a bogus point and so was 10:22 (accuracy = 904.693035 to 1000)
2018-03-31 18:40:16,944:DEBUG:140735691387712:in is_huge_invalid_ts_offset: returning True
2018-03-31 18:40:16,944:DEBUG:140735691387712:About to set valid column for index = 94
2018-03-31 18:40:16,978:DEBUG:140735691387712:After dropping 94, filtered points =
valid fmt_time
89 True 2018-02-26T10:08:46.884665-08:00
90 True 2018-02-26T10:08:52.430627-08:00
91 True 2018-02-26T10:09:04.430614-08:00
92 True 2018-02-26T10:09:10.432059-08:00
93 True 2018-02-26T10:09:10.849257-08:00
94 False 2018-02-26T10:21:22.413477-08:00
95 True 2018-02-26T10:37:06.732640-08:00
96 True 2018-02-26T10:37:15.602964-08:00
97 True 2018-02-26T10:37:21.604339-08:00
98 True 2018-02-26T10:43:05.173996-08:00
And this time range is correct, because the train that leaves Mountain View at 9:35 gets to Millbrae at 10:22
135 | 237 | 139 | 143 | Northbound Train No. |
---|---|---|---|---|
9:34 | 10:10 | 10:33 | 11:33 | Mountain View |
10:22 | 10:57 | 11:20 | 12:20 | Millbrae |
And the location points that we do have around Millbrae are all around 2018-02-26T10:23:16.074416-08:00 to 2018-02-26T10:37:06.732640-08:00. I wonder if the train was late that day and the location points are correct but the motion activity was wrong...
So we have the section and it is correct, but we just don't know that it is in Millbrae. Let's see if targeted resampling works any better.
For targeted resampling, we generally don't need to have a close transition. Looking at the values, approx 300 secs (5 mins) is probably good enough to get the start points correctly, at least on iOS. That will still miss some of the points but let's see what we can do with resampling around that.
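The matching step could look something like the sketch below: find the real location closest in time to a motion transition, and give up if it is more than 300 seconds away. `match_location` is a hypothetical name and the epoch-seconds inputs are an assumption; the only value taken from the discussion is the 300 second threshold.

```python
from bisect import bisect_left

def match_location(motion_ts, location_ts, max_gap_secs=300):
    """Return the index of the location closest in time to motion_ts,
    or None if even the closest one is more than max_gap_secs away.
    location_ts must be sorted ascending (epoch seconds)."""
    if not location_ts:
        return None
    pos = bisect_left(location_ts, motion_ts)
    # Only the neighbours on either side of the insertion point can be closest
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(location_ts)]
    best = min(candidates, key=lambda i: abs(location_ts[i] - motion_ts))
    if abs(location_ts[best] - motion_ts) <= max_gap_secs:
        return best
    return None
```

A `None` result corresponds to the `matched_point None` case in the logs, where there is no real point close enough to the transition.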
ok, so resampling does not work at this point. It is better to take a real location that is not "fresh" than it is to re-sample
section 1 | section 2 |
---|---|
so I removed all the resampling, and ignored sections with no points associated with them even if they were not technically a flip-flop, and things actually look pretty good. All the AIR_OR_HSR are gone, and almost all of the weird zoomy things are gone.
2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:49.405266-08:00 2018-02-26T11:31:33.932659-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:46:41.487192-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING
2018-02-26T16:06:24.557804-08:00 2018-02-26T16:23:13.049559-08:00 MotionTypes.WALKING
2018-02-26T16:29:13.978054-08:00 2018-02-26T16:39:59.499586-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:21:45.850317-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:32:02.441165-08:00 2018-02-26T17:55:39.864102-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:02.208424-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.WALKING
The only exception is the section from Berkeley to Millbrae which looks like this.
Before we started making the changes, it looked like this
So this argues that we should have kept the transition at the end in this case. Maybe that is the simple fix that solves everything:
Let's try that now...
That actually works pretty well.
2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:49.405266-08:00 2018-02-26T11:31:33.932659-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:46:41.487192-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING
2018-02-26T16:06:24.557804-08:00 2018-02-26T16:23:13.049559-08:00 MotionTypes.WALKING
2018-02-26T16:29:13.978054-08:00 2018-02-26T17:22:09.704108-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:24:36.549367-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:32:02.441165-08:00 2018-02-26T17:55:39.864102-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:02.208424-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.WALKING
The only issue left is the segmentation of the Caltrain trip while coming back. Although this looks legit (only 5 minute gap), it is actually very illegit, since the last cluster of points around San Mateo is the one at 17:55, and we definitely didn't make it from San Mateo to Mountain View in 5 minutes.
In some ways, this is the reverse of https://github.com/e-mission/e-mission-server/issues/577#issuecomment-376381364 in that we cover a short distance over a long time (so it looked like a trip end). In this, we cover a long distance in a short time.
But to maintain consistency, each long distance in short time should have a corresponding short distance in long time....
From the schedule, that train is at: Millbrae: 17:33 Hillsdale: 17:43 Palo Alto: 17:56 Mountain View: 18:03
So the 17:55 at San Mateo is clearly wrong and is off by more than 10 minutes. But the problem is that the distance between Millbrae and San Mateo is large enough that ~ 20 minutes still doesn't feel wrong or like the end of a trip or something. Conceivably you could bike or use city driving and take that long. So it is not clear how we can fix this.
But we can, while creating stops, say that if it looks like a stop is long, we have to extend it to the beginning of the next section.
Trying that now...
Design decision while making this.
CLEAN_AND_RESAMPLE stage. But then when do I adjust the sections to match? Option 1: get_filtered_section -> get_filtered_points, just like we extend the section to the start or end of the trip, we do the same for the stop. The problem with this is that now we have made changes to the _filtered_ trip, not the raw trip. So far, all the section munging has been based on raw data. Since the filtered_sections are not stored yet, we have no easy way of getting the filtered_stops short of passing in the stop_map, which seems like a bad dependency.
It is not possible to say which is better, so we will simply use the more principled approach and see how it does.
There are two competing priorities while implementing the stop squishing.
If the first or last point of a section is bad, we want to filter it and have the start or end of the stop reset to the correct start/end (e.g. in fill_stop).
If there were errors in the section start/end, we clearly don't want them to make it into the filtered stop. So arguably, the order should be
This also resolves our earlier question about where the stop squishing code should be.
After these changes, the trips + sections on iphone2 look great. The stop squishing also fixed some other sections (like the start of the trip back), so it is all good.
We may want to add resampled points for the squished stops as well, but that is an optimization.
2018-02-26T09:26:55.265398-08:00 2018-02-26T09:34:29.706641-08:00 MotionTypes.WALKING
2018-02-26T09:35:49.405266-08:00 2018-02-26T11:31:33.932659-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:46:41.487192-08:00 2018-02-26T12:00:51.991475-08:00 MotionTypes.WALKING
2018-02-26T16:06:24.557804-08:00 2018-02-26T16:23:13.049559-08:00 MotionTypes.WALKING
2018-02-26T16:23:13.049559-08:00 2018-02-26T17:24:36.549367-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:24:36.549367-08:00 2018-02-26T17:30:39.993083-08:00 MotionTypes.WALKING
2018-02-26T17:32:02.441165-08:00 2018-02-26T18:02:02.208424-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:02.208424-08:00 2018-02-26T18:11:43.967409-08:00 MotionTypes.WALKING
After trying it on the same day for iphone3, we are close, but not perfect.
2018-02-26T09:27:03-08:00 2018-02-26T09:29:24.000052-08:00 MotionTypes.BICYCLING
2018-02-26T09:29:37.000052-08:00 2018-02-26T09:30:05.000052-08:00 MotionTypes.IN_VEHICLE
2018-02-26T09:30:19.000052-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.WALKING
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING
The only real issue is the flip-flopping at the beginning. A minor issue is the VEHICLE -> VEHICLE without merging, but most other stuff will work without it...
2018-04-01 19:52:17,062:DEBUG:140735691387712:while starting flip_flop detection, changes are
[(0, 0, 1) FF
(0, 1, <MotionTypes.WALKING: 7>) FF
(1, 2, <MotionTypes.BICYCLING: 1>) FF
(2, 5, <MotionTypes.IN_VEHICLE: 0>)
(5, 7, BICYCLING<1>)
(7, 9, <MotionTypes.IN_VEHICLE: 0>)
(9, 9, BICYCLING<1>) FF
(9, 23, <MotionTypes.WALKING: 7>)
2018-04-01 19:52:17,183:DEBUG:140735691387712:after generating unique entries, list = [(0, 5), (5, 7), (7, 9), (9, 23), ...]
but both (0, 5) and (7, 9) are BICYCLING, so we can merge them
2018-04-01 19:52:17,184:DEBUG:140735691387712:after merging entries, changes are
[(0, 7, 1)
(7, 9, <MotionTypes.IN_VEHICLE: 0>)
(9, 23, <MotionTypes.WALKING: 7>)
In order to fix this, we need to remove that IN_VEHICLE, which is not hard because it is less than 5 minutes long so clearly invalid, and then we need to merge the two non-motorized modes although they are labelled differently because they have (hopefully) the same speed profile.
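As a sketch of those two steps, assuming sections are simple `(start_secs, end_secs, mode)` tuples: first drop any IN_VEHICLE section shorter than 5 minutes, then merge neighbours that are both non-motorized. The function, the `NON_MOTORIZED` set, and the sample durations are hypothetical; the real code works on the flip-flop change list shown in the logs.

```python
NON_MOTORIZED = {"WALKING", "BICYCLING", "ON_FOOT"}
MIN_VEHICLE_SECS = 5 * 60  # "less than 5 minutes long so clearly invalid"

def clean_sections(sections):
    """sections: list of (start_secs, end_secs, mode) sorted by start.
    Drops IN_VEHICLE sections shorter than MIN_VEHICLE_SECS, then merges
    neighbours that are both non-motorized, keeping the first one's mode."""
    kept = [s for s in sections
            if not (s[2] == "IN_VEHICLE" and s[1] - s[0] < MIN_VEHICLE_SECS)]
    merged = []
    for start, end, mode in kept:
        if merged and merged[-1][2] in NON_MOTORIZED and mode in NON_MOTORIZED:
            prev_start, _, prev_mode = merged[-1]
            merged[-1] = (prev_start, end, prev_mode)
        else:
            merged.append((start, end, mode))
    return merged

# Made-up durations in seconds, loosely shaped like the start of the iphone3 trip
sample = [
    (0, 140, "BICYCLING"),
    (140, 170, "IN_VEHICLE"),
    (170, 330, "WALKING"),
    (600, 3300, "IN_VEHICLE"),
]
cleaned = clean_sections(sample)
```

Keeping the first section's mode is an arbitrary choice in this sketch; which label survives matters less if the mode is re-inferred in a later step.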
Let's see if that works...
No.
>>> for s in cleaned_sections:
print(s.data.start_fmt_time, s.data.end_fmt_time, s.data.sensed_mode)
2018-02-26T09:27:03-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.BICYCLING
Actually, that was due to a bug during the refactoring. It does work after all.
2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.BICYCLING
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING
Note that the initial segment is set to bicycling, let's take a quick look at why that happens
It's because the minimum duration checks invalidated all the non-flip-flopped values.
2018-04-01 22:05:52,551:DEBUG:140735691387712:comparing 2, 5 to see if there is a flipflop
2018-04-01 22:05:52,551:DEBUG:140735691387712:Sanity checking section 2018-02-26T09:27:42.753745-08:00 -> 2018-02-26T09:29:09.129657-08:00 for type MotionTypes.IN_VEHICLE = False
2018-04-01 22:05:52,551:DEBUG:140735691387712:comparing 5, 7 to see if there is a flipflop
2018-04-01 22:05:52,551:DEBUG:140735691387712:Sanity checking section 2018-02-26T09:29:09.129657-08:00 -> 2018-02-26T09:29:26.914830-08:00 for type MotionTypes.BICYCLING = False
2018-04-01 22:05:52,552:DEBUG:140735691387712:comparing 7, 9 to see if there is a flipflop
2018-04-01 22:05:52,552:DEBUG:140735691387712:Sanity checking section 2018-02-26T09:29:26.914830-08:00 -> 2018-02-26T09:30:17.725312-08:00 for type MotionTypes.IN_VEHICLE = False
So all the first parts got merged into one section, which is what we want!
2018-04-01 22:05:52,550:DEBUG:140735691387712:while starting flip_flop detection, changes are
[(0, 0, 1) FF 0
(0, 1, <MotionTypes.WALKING: 7>) FF 1
(1, 2, <MotionTypes.BICYCLING: 1>) FF 2
(2, 5, <MotionTypes.IN_VEHICLE: 0>) FF 3
(5, 7, 1) FF 4
(7, 9, <MotionTypes.IN_VEHICLE: 0>) FF 5
(9, 9, 1) FF 6
(9, 23, <MotionTypes.WALKING: 7>)
2018-04-01 22:05:52,612:DEBUG:140735691387712:backward merged_streaks = [(0, 6)]
2018-04-01 22:05:52,613:DEBUG:140735691387712:before merging entries, changes were
[(0, 0)
(0, 1)
(1, 2)
(2, 5)
(5, 7)
(7, 9)
(9, 9)
(9, 23)
So we ended up with one merged section overall. Yay!
2018-04-01 22:05:52,613:DEBUG:140735691387712:after generating unique entries, list =
[(0, 23)
And it is just an artifact of the fact that BICYCLING was first that makes the mode be bicycling. And it doesn't really matter because we will override it in the mode inference step anyway.
2018-04-01 22:05:52,614:DEBUG:140735691387712:After merging, list =
[(1, <MotionTypes.IN_VEHICLE: 0>)
2018-04-01 22:05:52,614:DEBUG:140735691387712:after merging entries, changes are [(0, 23, 1),
2018-04-01 22:05:52,618:DEBUG:140735691387712:Considering MotionTypes.BICYCLING from 2018-02-26T09:27:03-08:00 -> 2018-02-26T09:34:36.836596-08:00
But it is still a bit curious that although the biggest contiguous mode was WALKING, and we merged backwards, we still ended up with mode == BICYCLING. Let's do a quick check to see why...
It's because when we merge backwards, we set the after section's start to the merged section's start
merge
------------
| |
t1 t2 t3
replaces t3 by t1. But when we do, we should retain the mode for t3. Is this going to break stuff?
No it did not break stuff.
2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.WALKING
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING
Going to test against a bunch of more use cases, and then move on to the mode inference.
One thing to note is that if we are checking for validity of all sections and marking them as FF if they are not valid, then WALKING sections could be marked invalid because of zig zags, for example. Or misclassified bicycling sections could be marked as invalid and merged with subsequent IN_VEHICLE sections.
To avoid that, we may want to only check for the validity of non-walking sections.
Running this on android for the same dates almost works. There are just a couple of gaps at the start of train trips, and the trip segmentation seems to segment at Millbrae every time.
2018-02-26T16:39:34-08:00 2018-02-26T17:23:49-08:00 MotionTypes.IN_VEHICLE
2018-02-26T17:24:14.577000-08:00 2018-02-26T17:25:50.682000-08:00 MotionTypes.ON_FOOT
and
2018-02-26T17:36:12-08:00 2018-02-26T18:02:07-08:00 MotionTypes.IN_VEHICLE
2018-02-26T18:02:37-08:00 2018-02-26T18:12:56.195000-08:00 MotionTypes.ON_FOOT
Ending | Starting |
---|---|
Looking more closely into the 2018-02-26T17:25:50.682000-08:00 -> 2018-02-26T17:36:12-08:00 segmentation...
we discovered what appeared to be a legitimate trip end at 17:28
2018-04-02 01:17:38,469:DEBUG:140735691387712:------------------------------2018-02-26T17:28:53-08:00------------------------------
2018-04-02 01:17:38,488:DEBUG:140735691387712:prev_point.ts = 1519694905.0, curr_point.ts = 1519694933.0, time gap = 28.0 (vs 300), distance_gap = 7.218737025990433 (vs 100), speed_gap = 0.25781203664251545 (vs 0.3333333333333333) continuing trip
2018-04-02 01:17:38,489:DEBUG:140735691387712:last5MinsDistances.max() = 90.4292158917, last10PointsDistance.max() = 64.2955042684
and we discovered it on the phone as well
5 2018-02-26T17:29:24.775000-08:00 2
6 2018-02-26T17:34:52.767000-08:00 1
Next point was after the geofence exit
2018-04-02 01:17:38,492:DEBUG:140735691387712:------------------------------2018-02-26T17:35:48.344000-08:00------------------------------
2018-04-02 01:17:38,492:DEBUG:140735691387712:Setting new trip start point AttrDict({'fmt_time': '2018-02-26T17:35:48.344000-08:00', 'loc': {'type': 'Point', 'coordinates': [-122.3863437, 37.6005421]}) with idx 75
2018-04-02 01:17:38,497:DEBUG:140735691387712:------------------------------2018-02-26T17:36:12-08:00------------------------------
2018-04-02 01:17:38,500:DEBUG:140735691387712:last5MinsDistances = [ 4516.26067731] with length 1
....
2018-04-02 01:17:40,058:INFO:140735691387712:Found trip end at 2018-02-26T18:12:56.195000-08:00
2018-04-02 01:17:40,126:DEBUG:140735691387712:start_loc_doc = 'fmt_time': '2018-02-26T17:35:48.344000-08:00', end_loc_doc = 'fmt_time': '2018-02-26T18:12:56.195000-08:00')
So far so good. This means that the gap will persist until the cleaning and resampling stage, when we should join it with the previous trip end. Why didn't that happen?
the transition distance is small
2018-04-02 01:17:40,140:DEBUG:140735691387712:while determining new_start_place, transition_distance = 74.93464111158376
2018-04-02 01:17:40,167:DEBUG:140735691387712:transition_distance 74.93464111158376 < 1000, returning False
but we create a new trip anyway
2018-04-02 01:17:40,168:DEBUG:140735691387712:Inserting entry Entry('start_fmt_time': '2018-02-26T17:35:48.344000-08:00', 'end_fmt_time': '2018-02-26T18:12:56.195000-08:00') into timeseries
Given that this is the start of the trip, we would need to have joined it in CLEAN_AND_RESAMPLE. Why didn't that happen?
2018-04-02 01:35:48,896:DEBUG:140735691387712:Considering trip 5ac1eb64f6858fbc8db75a36: 2018-02-26T17:35:48.344000-08:00 -> 2018-02-26T18:12:56.195000-08:00
Because we filtered out the first point...
2018-04-02 01:35:52,438:DEBUG:140735691387712:Found first section, may need to extrapolate start point
2018-04-02 01:35:52,468:DEBUG:140735691387712:First point 5ac1eb5cf6858fbc8866d0fd ([-122.3863437, 37.6005421]) was filtered, raw_start_place 5ac1eb64f6858fbc8db75a33 ([-122.3871728, 37.6003916]) may be bogus
2018-04-02 01:35:52,468:DEBUG:140735691387712:place_to_point_dist = 74.93464111158376, previous place is also bogus, skipping extrapolation
Why did we filter that out? Because the speed was super high. The quartile values are
2018-04-02 01:35:48,799:DEBUG:140735691387712:quartile values are 0.25 2.174765
0.75 33.300918
And the speed of this first point was
fmt_time | speed | distance | latitude | longitude | ts_diff |
---|---|---|---|---|---|
2018-02-26T17:35:48.344000-08:00 | 0.000000 | 0.000000 | 37.600542 | -122.386344 | NaN |
2018-02-26T17:36:12-08:00 | 190.913962 | 4516.260677 | 37.579100 | -122.342812 | 23.656 |
2018-02-26T17:36:19.731000-08:00 | 0.000000 | 0.000000 | 37.579100 | -122.342812 | 7.731 |
2018-02-26T17:36:26-08:00 | 76.370828 | 478.768727 | 37.577182 | -122.337948 | 6.269 |
2018-02-26T17:37:24-08:00 | 34.715453 | 2013.496284 | 37.565230 | -122.320785 | 58.000 |
The distance doesn't appear to be too large given that it is the first point after a geofence exit, but the time is short. See how at row 5, the distance is 2km but the time is almost a minute, as opposed to 4km in less than 30 secs.
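The speeds in the table are just distance divided by ts_diff, so the arithmetic is easy to check. The outlier rule below is an assumption: a Tukey-style upper fence built from the two quartile values in the log, with a hypothetical multiplier `k`. The thread does not show the actual rule the filter applies.

```python
def speed_mps(distance_m, ts_diff_s):
    """Speed in metres per second from one row of the table."""
    return distance_m / ts_diff_s

def upper_fence(q1, q3, k=3.0):
    """Tukey-style upper fence; k=3.0 is a hypothetical choice."""
    return q3 + k * (q3 - q1)

# (distance, ts_diff) for the rows with non-zero distance
rows = [(4516.260677, 23.656), (478.768727, 6.269), (2013.496284, 58.000)]
speeds = [speed_mps(d, t) for d, t in rows]

# Quartile values copied from the log line above
fence = upper_fence(2.174765, 33.300918)
flagged = [s > fence for s in speeds]
```

Under these assumptions only the 4.5 km jump in under 24 seconds (about 191 m/s) exceeds the fence; the 76 m/s and 35 m/s rows do not.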
Let us see if this is the same reason on the other trips too. If so, we may want to treat the first point after a geofence exit, at the beginning of a motorized trip, as special, at least on android...
Other transitions don't have this issue. Tabling it for now until we see how serious it is.
Loaded data from both iPhone and android for 8th March, which had led to an AIR_OR_HSR section before. Worked perfectly on both, including Caltrain + BART transitions. I think this is pretty much ready to go to GIS-based mode inference now.
Let's do one last check of the flip-flop, which was the 26th on iPhone2.
that worked too!
2018-03-26T08:08:35.692105-07:00 2018-03-26T08:38:50.965987-07:00 MotionTypes.BICYCLING
I hereby declare victory and move on to the GIS-based mode inference
Argh! But this broke Vaz's car trips. These should have been car.
2018-03-08T08:08:19.133518-08:00 -> 2018-03-08T08:15:03.796000-08:00
2018-03-08T14:33:04.610196-08:00 -> 2018-03-08T14:41:46-08:00
2018-03-08T14:56:06.780093-08:00 -> 2018-03-08T15:05:21.231000-08:00
Instead, we have
2018-03-08T08:08:09.232308-08:00 2018-03-08T08:16:23.290000-08:00 MotionTypes.ON_FOOT
2018-03-08T14:33:04.610196-08:00 2018-03-08T14:41:46-08:00 MotionTypes.IN_VEHICLE
2018-03-08T14:54:57.783576-08:00 2018-03-08T15:06:44.436000-08:00 MotionTypes.ON_FOOT
Let's revisit their segmentation...
OK, so this is because the overall trips are really short. The raw trip is from 2018-03-08T08:09:10.145000-08:00 -> 2018-03-08T08:16:23.290000-08:00, which is around 7 minutes long. Of that, the last 30 seconds is WALKING, and unfortunately, the first valid IN_VEHICLE is 3 minutes later.
2018-04-02 17:48:52,109:DEBUG:140735691387712:At 2018-03-08T08:12:01.740000-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change
So the section is 4 minutes, just under 5 minutes. So yes, Virginia, people can take very short car trips.
Let's fix this by making the check more complex. If this is the first or last section and there is a gap between the raw section start and the raw trip start/end, extend the raw section to the trip start/end since that is what we will do anyway for the cleaned data.
An alternate check is to say that the flip-flopped section being deleted should be shorter than the section that is being merged with. Otherwise, the tail wags the dog.
If this is the first or last section and there is a gap between the raw section start and the raw trip start/end, extend the raw section to the trip start/end since that is what we will do anyway for the cleaned data.
This fixed one of them, but not the other. The other had the following profile.
2018-04-02 19:34:44,713:DEBUG:140735691387712:At 2018-03-08T15:00:12.064000-08:00, retained existing activity MotionTypes.IN_VEHICLE because of no change
2018-04-02 19:34:44,714:DEBUG:140735691387712:At idx 1, time 2018-03-08T15:06:44.436000-08:00, found new activity MotionTypes.ON_FOOT compared to current MotionTypes.IN_VEHICLE
2018-04-02 19:34:44,714:DEBUG:140735691387712:creating new section for MotionTypes.IN_VEHICLE at 0 -> 1 with start_time 2018-03-08T15:00:12.064000-08:00 -> 2018-03-08T15:06:44.436000-08:00
2018-04-02 19:34:44,714:INFO:140735691387712:Detected trip end! Ending section at 2018-03-08T15:06:44.436000-08:00
Basically, we had an IN_VEHICLE section and it was long enough, but there was only one of it and then it ended. So it looked like a flip flop.
2018-04-02 19:34:44,714:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 1, 0), (1, 1, 2)]
Basically, we suck at really short motorized trips, all our heuristics are failing...
Changed by adding a trip_pct and only considering a flip flop if the section was less than 25% of the total trip time. This fixes all the vaz trips.
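A minimal sketch of the trip_pct check, assuming timestamps in seconds. The function names and the `max_idx_diff` parameter are hypothetical and not taken from the e-mission-server code; only the 25% threshold comes from the note above.

```python
def section_trip_pct(section_start_ts, section_end_ts, trip_start_ts, trip_end_ts):
    """Return the share of the trip duration (0-100) covered by the section."""
    trip_duration = trip_end_ts - trip_start_ts
    if trip_duration <= 0:
        # Degenerate trip: treat the section as covering all of it
        return 100.0
    return 100.0 * (section_end_ts - section_start_ts) / trip_duration


def is_flip_flop_candidate(idx_diff, trip_pct, max_idx_diff=1, max_trip_pct=25.0):
    """A section only counts as a flip flop if it has few activity points
    AND covers less than max_trip_pct percent of the whole trip."""
    return idx_diff <= max_idx_diff and trip_pct < max_trip_pct


# A single-entry IN_VEHICLE section that spans most of a short trip is kept
long_pct = section_trip_pct(0, 392, 0, 400)
print(long_pct, is_flip_flop_candidate(1, long_pct))
```

With this shape, a short motorized trip whose only IN_VEHICLE section has idx_diff = 1 is no longer discarded, because the section covers most of the trip.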
Penultimate set of checks - got a report from an alert tester that their bike trips were classified as car. This is very tricky because they
accelerate/decelerate at the rate of cars, and on my commute I top out over 40 km/h on flat (25mph) and average ~26km/h (16+mph). And I take roads, so GIS isn't helpful there either.
Fortunately, it turns out that there is a pattern in these trips that we may be able to embody as a rule.
2018-04-02 14:09:07,196:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 1, 1)
(1, 3, <MotionTypes.IN_VEHICLE: 0>)
(3, 8, 7)]
Ok, so the IN_VEHICLE is pretty short :) But the duration is long, almost 8 minutes. The actual activity points are:
So this is actually a flip-flop but doesn't seem like one because when we go from BICYCLING -> IN_VEHICLE, we merge backwards. If we had merged forwards, this would have been removed and the entire section would be marked as WALKING. Later, when the speed was calculated, the mean would have been way above walking, giving us BICYCLING.
But it would have been wrong to merge through to the WALKING because the speed profile of the first part is
count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|
21.000000 | 6.085418 | 2.272487 | 0.000000 | 5.032491 | 6.276725 | 7.059986 | 9.865935 |
9.000000 | 0.253537 | 0.095076 | 0.000000 | 0.285229 | 0.285229 | 0.285229 | 0.285229 |
I think it is still correct to split into two parts. The only question is what the first part should be labelled as, and it is hard to make the case that it should be IN_VEHICLE because if we had merged forward instead of backwards, we would have ended up with BICYCLING because the IN_VEHICLE would have been a flip-flop instead.
I think that the real issue here is that this is a toss-up - not absolutely clear in either direction. In that case, maybe we should mark it as a TOSSUP and let the speed determine which way to go.
Let's see if that works for the next one as well...
Not really. Detected as BICYCLING -> IN_VEHICLE
mode | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
BICYCLING | 14.000000 | 5.398825 | 2.517793 | 0.000000 | 3.901205 | 6.455850 | 7.052127 | 8.119158 |
IN_VEHICLE | 19.000000 | 4.793253 | 3.874820 | 0.000000 | 0.339064 | 5.922970 | 7.902541 | 10.046842 |
2018-04-02 14:09:07,818:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 2, 1), (2, 6, <MotionTypes.IN_VEHICLE: 0>)]
Again, they both look pretty straightforward, both with decent points and decent length
2018-03-27T17:33:29.509713-07:00 -> 2018-03-27T17:39:35.486804-07:00
2018-03-27T17:39:39.999938-07:00 -> 2018-03-27T17:48:26.086079-07:00
BUT, note that the transition time from BICYCLING -> IN_VEHICLE is bogus, it should take more than 4 secs. So one of the sides must be bogus. Can again mark as TOSSUP/UNKNOWN.
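The gap between the two sections can be computed directly from the formatted timestamps listed above. This is a minimal sketch; the helper names and the 30-second minimum are hypothetical placeholders, not values from the e-mission-server code.

```python
from datetime import datetime


def transition_gap_secs(prev_end_fmt_time, next_start_fmt_time):
    """Seconds between the end of one section and the start of the next."""
    prev_end = datetime.fromisoformat(prev_end_fmt_time)
    next_start = datetime.fromisoformat(next_start_fmt_time)
    return (next_start - prev_end).total_seconds()


def is_suspicious_transition(gap_secs, min_transition_secs=30.0):
    # min_transition_secs is an assumed placeholder: a real switch from a
    # bicycle to a vehicle should take noticeably longer than a few seconds
    return gap_secs < min_transition_secs


gap = transition_gap_secs("2018-03-27T17:39:35.486804-07:00",
                          "2018-03-27T17:39:39.999938-07:00")
print(round(gap, 2), is_suspicious_transition(gap))
```

A section pair flagged this way could then be marked as TOSSUP/UNKNOWN and left for the speed-based inference to resolve.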
Last one:
2018-04-02 14:09:05,838:DEBUG:140735691387712:while starting flip_flop detection, changes are
[(0, 0, 1) FF 0
(0, 2, <MotionTypes.IN_VEHICLE: 0>) 1
(2, 2, 7) FF 2
(2, 3, <MotionTypes.BICYCLING: 1>) FF 3
(3, 5, <MotionTypes.IN_VEHICLE: 0>) FF 4
(5, 5, 7) FF 5
(5, 7, <MotionTypes.BICYCLING: 1>) FF 6
(7, 9, <MotionTypes.WALKING: 7>) FF 7
(9, 12, <MotionTypes.BICYCLING: 1>)
(12, 19, <MotionTypes.WALKING: 7>)]
2018-04-02 14:09:05,952:DEBUG:140735691387712:forward merged_streaks = [(2, 7)]
2018-04-02 14:09:05,952:DEBUG:140735691387712:backward merged_streaks = [(0, 0)]
2018-04-02 14:09:05,953:DEBUG:140735691387712:after generating unique entries, list = [(0, 9), (9, 12), (12, 19)]
This is again a tricky set of changes. If we had merged forward, then the bicycling would have been retained and the IN_VEHICLE would have been removed, and we would have merged all this with the bicycling for a final working solution.
So I think we should treat BICYCLING -> IN_VEHICLE transitions as special.
In particular, the pattern that I see is BICYCLING (~ 1 min) -> IN_VEHICLE (~ 5 mins) -> something else, typically WALKING
The expected behavior is that we should have BICYCLING + IN_VEHICLE merged into one BIKE_OR_CAR section, which should be merged with a subsequent BICYCLING section if it exists or not merged if it doesn't exist.
Alternatively, we can just say that the pattern above maps to BICYCLING. Let's experiment and see how that works.
So our target rule is:
If you see BICYCLING (idx_diff = 1, < 1 minute) -> IN_VEHICLE (idx_diff = 2, but time ~ 5 minutes) -> BICYCLING, treat the whole pattern as BICYCLING.
Because of this formulation, the first BICYCLING will be marked as a flip flop, so we can add it as a new check to should_merge.
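The target rule can be sketched as a predicate over three consecutive changes. This is an illustration only: the `Change` class, the function name, and the 180-second lower bound for the vehicle section are hypothetical, and the real should_merge check may look quite different. The 60-second bound for the first BICYCLING section comes from the "< 1 minute" in the rule.

```python
from dataclasses import dataclass

BICYCLING = "BICYCLING"
IN_VEHICLE = "IN_VEHICLE"
WALKING = "WALKING"


@dataclass
class Change:
    start_idx: int
    end_idx: int
    mode: str
    duration_secs: float

    @property
    def idx_diff(self):
        return self.end_idx - self.start_idx


def matches_bike_vehicle_bike(prev, curr, nxt,
                              max_bike_secs=60, min_vehicle_secs=180):
    """True if (prev, curr, nxt) looks like a short BICYCLING blip, followed by
    a sparse but long IN_VEHICLE section, followed by BICYCLING again."""
    return (prev.mode == BICYCLING and prev.idx_diff <= 1
            and prev.duration_secs < max_bike_secs
            and curr.mode == IN_VEHICLE and curr.idx_diff <= 2
            and curr.duration_secs >= min_vehicle_secs
            and nxt.mode == BICYCLING)


pattern = [Change(2, 3, BICYCLING, 40),
           Change(3, 5, IN_VEHICLE, 300),
           Change(5, 7, BICYCLING, 90)]
print(matches_bike_vehicle_bike(*pattern))
```

When the predicate holds, the IN_VEHICLE section in the middle is the one to relabel or merge into the surrounding BICYCLING.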
OK, with this fix, there are exactly two errors left, and they are hard to fix because the motion activity classifies a substantial part of each trip as IN_VEHICLE.
**********13 : 2018-03-27T17:33:24.832366-07:00 -> 2018-03-27T17:48:26.086079-07:00**********
2018-03-27T17:33:24.832366-07:00 2018-03-27T17:39:24.999937-07:00 MotionTypes.BICYCLING
2018-03-27T17:39:39.999938-07:00 2018-03-27T17:48:26.086079-07:00 MotionTypes.IN_VEHICLE
**********17 : 2018-03-28T16:33:43.016169-07:00 -> 2018-03-28T17:00:26.000100-07:00**********
2018-03-28T16:33:43.016169-07:00 2018-03-28T16:55:04.000063-07:00 MotionTypes.IN_VEHICLE
2018-03-28T16:55:51.000065-07:00 2018-03-28T16:56:52.000064-07:00 MotionTypes.BICYCLING
2018-03-28T16:57:01.000064-07:00 2018-03-28T17:00:26.000100-07:00 MotionTypes.WALKING
In particular, trip 17 flipped from BICYCLING -> WALKING to IN_VEHICLE -> BICYCLING -> WALKING because of the trip_pct fix.
The transitions are:
[(0, 1, 0) (> 10 mins)
(1, 1, 7) FF 1
(1, 2, <MotionTypes.BICYCLING: 1>) FF 2
(2, 3, <MotionTypes.WALKING: 7>) FF 3
(3, 5, <MotionTypes.IN_VEHICLE: 0>) FF 4
(5, 5, 7) FF 5
(5, 6, <MotionTypes.BICYCLING: 1>) FF 6
(6, 7, <MotionTypes.WALKING: 7>) FF 7
(7, 8, <MotionTypes.RUNNING: 8>) FF 8
(8, 10, <MotionTypes.WALKING: 7>) FF 9
(10, 12, <MotionTypes.BICYCLING: 1>)
(12, 14, <MotionTypes.WALKING: 7>)] FF 11
which turns into
2018-04-03 09:00:34,704:DEBUG:140735691387712:flip_flop_streaks = [(1, 9), (11, 10)]
2018-04-03 09:00:34,761:DEBUG:140735691387712:while merging, comparing curr speed 6.372634580458256 with before 7.448699573481284 and after 4.655161289677437
2018-04-03 09:00:34,762:DEBUG:140735691387712:before is closer, merge forward, returning 1
2018-04-03 09:00:34,763:DEBUG:140735691387712:after generating unique entries, list = [(0, 10), (10, 12), (12, 14)]
For the record, for trip 13, there was no flip flopping at all
2018-04-03 09:00:33,789:DEBUG:140735691387712:while starting flip_flop detection, changes are [(0, 2, 1), (2, 6, <MotionTypes.IN_VEHICLE: 0>)]
2018-04-03 09:00:33,790:DEBUG:140735691387712:flip_flop_list = []
2018-04-03 09:00:33,790:DEBUG:140735691387712:flip_flop_streaks = []
2018-04-03 09:00:33,790:DEBUG:140735691387712:forward merged_streaks = []
2018-04-03 09:00:33,790:DEBUG:140735691387712:backward merged_streaks = []
Let's play around with some bus trips to see if we can come up with an overarching unified model for fast bike, short car and bus.
Checked out some bus trips too. They are all classified as IN_VEHICLE, which is good.
Spot checked the GIS information - they are all pretty good, except for one bus stop which is actually in the correct location, but OSM does not have the bus stop information (at Dwight and Piedmont).
Other bus characteristics:
We should get some iOS bus data to confirm that, though. For now, let's do the GIS integration!
Segmentation looks pretty good. Created a pull request that was merged to the tripaware server. https://github.com/e-mission/e-mission-server/pull/578
One more issue reported by one of the URAP students was mixed walk and bike for a walking trip. While testing that, also found that Tom's trip to school with Willow was marked as all WALKING. Fixing both of those.
For Tom's trip, using the same merge forward and merge back rules for WALKING <-> BICYCLING as we do with WALKING <-> IN_VEHICLE solved the problem (044eafbe70b8f4c0cdaaf885f4ecae9c42cac640)
So for URAP student's trip, we see this set of sections along with their original and predicted modes.
**********1 : 2018-04-06T12:02:55.370441-07:00 -> 2018-04-06T12:18:52.967424-07:00**********
2018-04-06T12:02:55.370441-07:00 2018-04-06T12:06:48.002566-07:00 MotionTypes.WALKING 1.7419816444435243
2018-04-06T12:02:55.370441-07:00 2018-04-06T12:06:48.002566-07:00 PredictedModeTypes.BICYCLING 1.7419816444435243
2018-04-06T12:07:21.999432-07:00 2018-04-06T12:07:43.998962-07:00 MotionTypes.RUNNING 1.1558113323084085
2018-04-06T12:07:21.999432-07:00 2018-04-06T12:07:43.998962-07:00 PredictedModeTypes.WALKING 1.1558113323084085
2018-04-06T12:08:46.998215-07:00 2018-04-06T12:18:52.967424-07:00 MotionTypes.WALKING 0.08121076478030438
2018-04-06T12:08:46.998215-07:00 2018-04-06T12:18:52.967424-07:00 PredictedModeTypes.WALKING 0.08121076478030438
Basically, there are two main issues:
RUNNING section splits up the two walking sections so that they are not merged, and are evaluated separately.
This suggests at least two obvious fixes.
We should also look to see how the extrapolation happened that resulted in that high a speed. And maybe if the underlying section is WALKING and the computed speed is close to the max walking speed, we use the max walking speed for the interpolation instead of the computed speed.
Quick check on the extrapolation.
Yes, we find 3 points.
2018-04-07 11:17:25,772:DEBUG:140735495942976:deleting 0 points from section points
2018-04-07 11:17:25,774:DEBUG:140735495942976:Found 3 results
and we extrapolated based on the speed of 1.74 m/s.
2018-04-07 11:17:25,778:DEBUG:140735495942976:Found first section, may need to extrapolate start point
2018-04-07 11:17:25,789:DEBUG:140735495942976:Adding distance 365.9192626980049 to original 66.70459984566997 to extend section start from [-122.26297302190915, 37.87055511505241] to [-122.25908662369412, 37.87174562558791]
2018-04-07 11:17:25,791:DEBUG:140735495942976:After subtracting time 210.05919510662173 from original 22.572930336 to cover additional distance 365.9192626980049 at speed 1.7419816471841274, new_start_ts = 1523041375.37
But that 1.74 m/s is from a very small set of points, at least one of which could be zig-zag. Let's add in a heuristic that says that if the extrapolated distance is >>> measured distance (in this case, 366 versus 67) and the underlying mode is walk or bike, and the measured speed is close to the cap (diff < 25%), set the speed to the max for the mode.
This is important to handle proper mode detection of short walking trips due to the large geofence radius on iOS.
We add this at the clean and resample stage instead of the mode inference stage because at the mode inference stage, we don't know how much we are extrapolating.
Well, technically, we can compare the first point in the resampled and raw data to figure it out. But then the start time is going to be off too. Let's put it in the cleaning + resampling stage for now.
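A minimal sketch of the speed-capping heuristic, under stated assumptions: the per-mode caps, the 3x distance ratio that stands in for ">>>", and the function name are all hypothetical. Only the 25% closeness threshold and the restriction to walk or bike come from the description above.

```python
# Assumed per-mode speed caps in m/s; illustrative values only
MAX_SPEED_MPS = {"WALKING": 1.4, "BICYCLING": 7.0}


def speed_for_extrapolation(mode, measured_speed, measured_dist, extrapolated_dist,
                            max_speed_by_mode=None, dist_ratio=3.0, max_rel_diff=0.25):
    """Pick the speed used to extrapolate the start of a section."""
    caps = MAX_SPEED_MPS if max_speed_by_mode is None else max_speed_by_mode
    cap = caps.get(mode)
    if cap is None:
        # Motorized or unknown mode: leave the measured speed alone
        return measured_speed
    # The extrapolated stretch dwarfs the stretch we actually measured
    much_longer = extrapolated_dist > dist_ratio * measured_dist
    # The measured speed is within max_rel_diff of the cap for this mode
    near_cap = abs(measured_speed - cap) / cap < max_rel_diff
    if much_longer and near_cap:
        return cap
    return measured_speed


# Values from the log above: 66.7 m measured, 365.9 m extrapolated, 1.74 m/s
print(speed_for_extrapolation("WALKING", 1.7419816471841274,
                              66.70459984566997, 365.9192626980049))
```

Using the cap lengthens the extrapolated duration, which moves the computed section start earlier and keeps the resulting mean speed within the walking range.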
Looking at trips on the test phones, iPhone3 looks pretty bad.
**********4 : 2018-04-06T16:20:42.818245-07:00 -> 2018-04-06T17:36:00.999960-07:00**********
2018-04-06T16:20:42.818245-07:00 2018-04-06T16:38:58.278862-07:00 MotionTypes.WALKING 1.2293202448810991
2018-04-06T16:20:42.818245-07:00 2018-04-06T16:38:58.278862-07:00 PredictedModeTypes.WALKING 1.2293202448810991
2018-04-06T16:38:58.278862-07:00 2018-04-06T17:14:20.373167-07:00 MotionTypes.IN_VEHICLE 5.0034408107066355
2018-04-06T16:38:58.278862-07:00 2018-04-06T17:14:20.373167-07:00 PredictedModeTypes.CAR 5.0034408107066355
2018-04-06T17:14:20.373167-07:00 2018-04-06T17:19:37.000098-07:00 MotionTypes.WALKING 0.9145166170592021
2018-04-06T17:14:20.373167-07:00 2018-04-06T17:19:37.000098-07:00 PredictedModeTypes.WALKING 0.9145166170592021
2018-04-06T17:20:40.000099-07:00 2018-04-06T17:34:26.999960-07:00 MotionTypes.IN_VEHICLE 3.7672128139726015
2018-04-06T17:20:40.000099-07:00 2018-04-06T17:34:26.999960-07:00 PredictedModeTypes.TRAIN 3.7672128139726015
2018-04-06T17:35:18.999961-07:00 2018-04-06T17:36:00.999960-07:00 MotionTypes.WALKING 1.2453066309970888
2018-04-06T17:35:18.999961-07:00 2018-04-06T17:36:00.999960-07:00 PredictedModeTypes.WALKING 1.2453066309970888
The first CAR trip should be train and the second TRAIN trip should be bike. Investigating further....
The issue with the first case is that the segmentation happens midway through the trip.
Investigating further, even after detecting flip flops and merging them, we have
0 [(0, 3, 7
1 (3, 8, <MotionTypes.BICYCLING: 1>
2 (8, 10, 7
3 (10, 71, <MotionTypes.IN_VEHICLE: 0>
7 (71, 85, 7>
16 (85, 106, 0
26 (106, 108, 7)]
However, both the BICYCLING and WALKING are skipped.
2018-04-07 20:10:19,783:INFO:140735495942976:Found 0 filtered points and 0 unfiltered points between 2018-04-06T16:29:54.785455-07:00 and 2018-04-06T16:35:02.563035-07:00 for type MotionTypes.BICYCLING, skipping...
2018-04-07 20:10:19,786:INFO:140735495942976:Found 0 filtered points and 0 unfiltered points between 2018-04-06T16:35:02.563035-07:00 and 2018-04-06T16:35:12.738195-07:00 for type MotionTypes.WALKING, skipping...
So then we have a big stop which we try to squish.
2018-04-07 20:10:22,219:DEBUG:140735495942976:stop distance = 2513 > 150, squishing it between 2018-04-06T16:20:42.818245-07:00 -> 2018-04-06T16:29:14.000026-07:00 and 2018-04-06T16:38:58.278862-07:00 -> 2018-04-06T16:58:57.001378-07:00
but because there are so few walking points, the next section is more dense, and we merge forwards. We can fix this by looking at the modes while squishing and merging towards the non-motorized section.
2018-04-07 20:10:22,259:DEBUG:140735495942976:next_section 2018-04-06T16:29:14.000026-07:00 is more dense than prev_section 2018-04-06T16:38:58.278862-07:00, merging forwards
The issue with the second case is that the sensed mode from the phone, after filtering out flip flops, was IN_VEHICLE. So with the current algorithm, there was no way that this would have ended up as a bicycling trip - it would have either been CAR or TRAIN.
Looking at the sensed modes around that section, it seems hard to argue that we shouldn't pick the IN_VEHICLE. The only really consistent mode (8 minutes) was IN_VEHICLE; everything else was a clear idx_diff = 1 kind of flip flop. Why should I then think that this is bicycling? The speed is low, true, but on Embarcadero at 5pm, cars are going to be slow too. If we did trajectory matching, we would see that the route didn't completely match, but then we would just fall back to CAR.
We need to experiment with bicycling detection on iOS some more.
15 (84, 85, <MotionTypes.BICYCLING: 1> FF idx_diff
16 (85, 90, <MotionTypes.IN_VEHICLE: 0> 17:20 -> 17:28
17 (90, 90, 7 FF idx_diff
18 (90, 93, <MotionTypes.IN_VEHICLE: 0> FF Sanity checking False
19 (93, 94, 1 FF idx_diff
20 (94, 96, <MotionTypes.IN_VEHICLE: 0> FF Sanity checking False
21 (96, 97, 1 FF idx_diff
22 (97, 98, 7 FF idx_diff
23 (98, 99, <MotionTypes.BICYCLING: 1> FF idx_diff
24 (99, 105, <MotionTypes.IN_VEHICLE: 0> FF False
25 (105, 106, 1 FF idx_diff
There are some hints, like this
which hint that I am on a trail and not the street, but it is small and could easily be an error as well.
Note also that in this case, the test phone was in my backpack. Maybe we will get different results if it is in a pocket?
At any rate, I am going to skip the second issue for now.
After all these changes, iphone3 has some additional segmentation. This is not terrible, because it all goes to WALKING correctly anyway.
2018-02-26T11:30:42.848401-08:00 2018-02-26T11:47:57.000168-08:00 MotionTypes.WALKING 0.20537946495623852
2018-02-26T11:30:42.848401-08:00 2018-02-26T11:47:57.000168-08:00 PredictedModeTypes.WALKING 0.20537946495623852
2018-02-26T11:49:03.000168-08:00 2018-02-26T11:56:58.000060-08:00 MotionTypes.BICYCLING 1.14644567023953
2018-02-26T11:49:03.000168-08:00 2018-02-26T11:56:58.000060-08:00 PredictedModeTypes.WALKING 1.14644567023953
2018-02-26T11:58:25.000060-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING 1.1535741105440154
2018-02-26T11:58:25.000060-08:00 2018-02-26T12:02:46.000041-08:00 PredictedModeTypes.WALKING 1.1535741105440154
Ah, it's because there was a single BICYCLING entry in the middle, but because we merge backward at the beginning and forward at the end, we end up with a difference of 2.
2018-04-07 23:00:30,357:DEBUG:140735495942976:At idx 203, time 2018-02-26T11:57:46.374228-08:00, found new activity MotionTypes.BICYCLING compared to current MotionTypes.WALKING
2018-04-07 23:00:30,358:DEBUG:140735495942976:At idx 204, time 2018-02-26T11:57:48.917064-08:00, found new activity MotionTypes.WALKING compared to current MotionTypes.BICYCLING
2018-04-07 23:00:30,358:DEBUG:140735495942976:creating new section for MotionTypes.BICYCLING at 202 -> 204 with start_time 2018-02-26T11:48:39.702499-08:00 -> 2018-02-26T11:57:48.917064-08:00
Let's use idx = 2 for BICYCLING to handle this case.
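One way to express a per-mode threshold is a small lookup table. This is a sketch under assumptions: it reads "idx = 2" as the idx_diff threshold for BICYCLING, the default of 1 for the other modes is a placeholder, and the real is_flip_flop in e-mission-server takes different arguments and may use different values.

```python
# Threshold of 2 for BICYCLING follows the note above; the default of 1
# for every other mode is an assumed placeholder
MAX_FLIP_FLOP_IDX_DIFF = {"BICYCLING": 2}
DEFAULT_MAX_IDX_DIFF = 1


def is_flip_flop(start_idx, end_idx, mode):
    """True if the change spans few enough activity entries to be noise."""
    idx_diff = end_idx - start_idx
    return idx_diff <= MAX_FLIP_FLOP_IDX_DIFF.get(mode, DEFAULT_MAX_IDX_DIFF)


# The BICYCLING section created at 202 -> 204 in the log above
print(is_flip_flop(202, 204, "BICYCLING"))
```

This only covers the index-based part of the check; any duration-based condition would be applied in addition to it.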
With that fix (0942774bd8802f20c4a5f84a131436b837f986fd) the segmentation is correct again.
**********0 : 2018-02-26T09:27:03-08:00 -> 2018-02-26T12:02:46.000041-08:00**********
2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 MotionTypes.WALKING 2.493409531761379
2018-02-26T09:27:03-08:00 2018-02-26T09:32:32.000052-08:00 PredictedModeTypes.BICYCLING 2.493409531761379
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 MotionTypes.IN_VEHICLE 6.302443355537658
2018-02-26T09:36:59.697741-08:00 2018-02-26T10:22:04.000002-08:00 PredictedModeTypes.TRAIN 6.302443355537658
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 MotionTypes.IN_VEHICLE 13.092253711686567
2018-02-26T10:25:03.052444-08:00 2018-02-26T11:30:42.848401-08:00 PredictedModeTypes.TRAIN 13.092253711686567
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 MotionTypes.WALKING 0.31497006975417674
2018-02-26T11:30:42.848401-08:00 2018-02-26T12:02:46.000041-08:00 PredictedModeTypes.WALKING 0.31497006975417674
Some segmentation regressions on iphone2.
On investigating them, the ground truth is
2018-04-08 01:13:36,114:DEBUG:140735495942976:creating new section for MotionTypes.IN_VEHICLE at 45 -> 120 with start_time 2018-02-26T10:22:32.513430-08:00 -> 2018-02-26T11:29:03.018860-08:00
2018-04-08 01:13:36,114:DEBUG:140735495942976:creating new section for MotionTypes.WALKING at 120 -> 120 with start_time 2018-02-26T11:29:03.018860-08:00 -> 2018-02-26T11:29:03.018860-08:00
2018-04-08 01:13:36,115:DEBUG:140735495942976:creating new section for MotionTypes.IN_VEHICLE at 120 -> 122 with start_time 2018-02-26T11:29:03.018860-08:00 -> 2018-02-26T11:33:22.436681-08:00
Pretty clear flip flop. Why doesn't the last section start at 11:33?
Both the WALKING and IN_VEHICLE are flip flops
2018-04-08 01:13:36,143:DEBUG:140735495942976:comparing 120, 120 to see if there is a flipflop
2018-04-08 01:13:36,143:DEBUG:140735495942976:in is_flip_flop: idx_diff = 0
2018-04-08 01:13:36,143:DEBUG:140735495942976:comparing 120, 122 to see if there is a flipflop
2018-04-08 01:13:36,143:DEBUG:140735495942976:in non-walking is_flip_flop: idx_diff = 2
And they are merged correctly.
2018-04-08 01:13:36,192:DEBUG:140735495942976:after merging entries, changes are [(0, 6, 7), (6, 122, <MotionTypes.IN_VEHICLE: 0>), (122, 136, 7)]
Ah, it is the stop squishing. It turns out that the end point of the in_vehicle section is pretty close to the end time, but the start point of the walking section is pretty far from the start time. And we now always merge from the IN_VEHICLE to the WALKING.
2018-04-08 01:13:36,198:DEBUG:140735495942976:Considering MotionTypes.IN_VEHICLE from 2018-02-26T09:34:57.987726-08:00 -> 2018-02-26T11:33:22.436681-08:00
section end point = Location({'_id': ObjectId('5ac1cb0ef6858fba82b73a36')
'accuracy': 65.0
'fmt_time': '2018-02-26T11:31:33.932659-08:00'
'loc': {'type': 'Point' 'coordinates': [-122.2684165534885 37.86940521405621]}
'user_id': UUID('49cbc158-1d84-45bf-bcdb-84c13550db17')
'vaccuracy': 67.019607543945312})
2018-04-08 01:13:36,202:DEBUG:140735495942976:Considering MotionTypes.WALKING from 2018-02-26T11:33:22.436681-08:00 -> 2018-02-26T12:00:51.991475-08:00
section start point = Location({'_id': ObjectId('5ac1cb0ef6858fba82b73a6c')
'fmt_time': '2018-02-26T11:51:08.964357-08:00'
'loc': {'type': 'Point' 'coordinates': [-122.26480638619722 37.871652147767946]}
'vaccuracy': 12.0})
This is a regression caused by https://github.com/e-mission/e-mission-server/issues/577#issuecomment-379520895
We can fix it by merging towards the section that is closest to the segmentation on the motion activity.
In this case, the motion activity segmentation was at 11:33. The two end points of the stop are 11:31 and 11:51. So we should set the stop end to 11:33. Similarly, in the prior case, the activity segmentation was at 16:29.
2018-04-07 20:10:19,783:INFO:140735495942976:Found 0 filtered points and 0 unfiltered points between 2018-04-06T16:29:54.785455-07:00 and 2018-04-06T16:35:02.563035-07:00 for type MotionTypes.BICYCLING, skipping...
15 2018-04-06T16:29:54.785455-07:00 7
16 2018-04-06T16:30:29.436247-07:00 9
17 2018-04-06T16:30:30.390100-07:00 1
18 2018-04-06T16:30:48.194715-07:00 9
and the stop end points were
5 2018-04-06T16:29:14.000026-07:00 {'type': 'Point', 'coordinates': [-122.2681376...
6 2018-04-06T16:38:58.278862-07:00 {'type': 'Point', 'coordinates': [-122.2714379
So we should clearly merge to that.
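Picking the stop end point closest to the motion activity change can be sketched as below. The function name and the 'enter'/'exit' return values are hypothetical; the timestamps in the example are the ones quoted above for the two cases.

```python
from datetime import datetime


def closest_stop_endpoint(activity_change_fmt_time, stop_enter_fmt_time,
                          stop_exit_fmt_time):
    """Return 'enter' or 'exit', whichever stop end point is closer in time
    to the motion activity change."""
    change = datetime.fromisoformat(activity_change_fmt_time)
    enter = datetime.fromisoformat(stop_enter_fmt_time)
    exit_ = datetime.fromisoformat(stop_exit_fmt_time)
    enter_gap = abs((change - enter).total_seconds())
    exit_gap = abs((exit_ - change).total_seconds())
    return "enter" if enter_gap <= exit_gap else "exit"


# iphone2 train trip: activity change at 11:33, stop end points 11:31 and 11:51
case_train = closest_stop_endpoint("2018-02-26T11:33:22.436681-08:00",
                                   "2018-02-26T11:31:33.932659-08:00",
                                   "2018-02-26T11:51:08.964357-08:00")
# iPhone3 trip: activity change at 16:29, stop end points 16:29 and 16:38
case_walk = closest_stop_endpoint("2018-04-06T16:29:54.785455-07:00",
                                  "2018-04-06T16:29:14.000026-07:00",
                                  "2018-04-06T16:38:58.278862-07:00")
print(case_train, case_walk)
```

In both cases the point where the stop is entered is the closer one, so a single rule covers the original fix and the regression it caused.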
But wait a minute - if this was a transition from IN_VEHICLE to WALKING, why didn't we merge the stop backwards? - i.e. set enter and exit to enter
Because our model has been that when transitioning from IN_VEHICLE to WALKING, we will merge forward since the last big gap is anticipated to be at the end of the motorized gap.
The problem is that here, the big gap was at the beginning of the WALKING section.
At this point, almost everything works, except that the trip to the consulate is now classified as CAR. This is because the bike section is glommed onto the train section.
This is because almost the entire bike section is flip-flopping.
2018-04-08 23:44:12,060:DEBUG:140735495942976:flip_flop_list = [2, 3, 4, 5, 6, 7, 8, 9]
2018-04-08 23:44:12,061:DEBUG:140735495942976:flip_flop_streaks = [(2, 8)]
However, when deciding where to merge it, we compare WALKING and IN_VEHICLE and because it is too fast for WALKING, we pick IN_VEHICLE. But the solution is to really not merge at all but to retain it as its own section. Then in mode inference, we will classify it as bike.
2018-04-08 23:44:12,143:DEBUG:140735495942976:Median calculation from speeds = 5.070589411485555
2018-04-08 23:44:12,144:DEBUG:140735495942976:after is walking, but speed is 5, merge forward, returning 1
There are a ton of small fixes needed to fix segmentation so that we can do GIS-based mode inference. Let's see if we can keep track of them in one issue so that we remember them all.