Closed shankari closed 5 years ago
Let's compare the behavior of the last trip on the 22nd (works) with the first trip on the 25th (doesn't work)
2016-09-04 21:25:08,764:DEBUG:------------------------------2016-07-22T17:13:50.187000-07:00------------------------------
...
2016-09-04 21:25:09,570:DEBUG:Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('57ccf3a5f6858fc9f48d5679'),
'start_ts': 1469232830.187,
'start_place': ObjectId('57ccedabf6858fbfb2338210'),
'duration': 465.8129999637604,
'distance': 587.4060548346308,
'start_loc': {"coordinates": [-122.2615781, 37.8700137], "type": "Point"},
'end_ts': 1469233296,
'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00',
'end_loc': {"coordinates": [-122.26827, 37.8700362], "type": "Point"},
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a5f6858fc9f48d5678'),
'key': 'segmentation/raw_trip'})}) into timeseries
2016-09-04 21:25:09,763:INFO:++++++++++++++++++++Processing trip 57ccf3a5f6858fc9f48d5678 for user 0763de67-f61e-3f5d-90e7-518e69793954++++++++++++++++++++
2016-09-04 21:25:09,821:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469233296, '$gte': 1469232830.187}}, sort_key = data.ts
2016-09-04 21:25:09,829:DEBUG:curr activity = Motionactivity({'fmt_time': u'2016-07-22T17:14:03.083000-07:00', '_id': ObjectId('5793f338abbf384cadf90b4d'), u'type': 2})
...
2016-09-04 21:25:09,848:DEBUG:motion_df = _id confidence fmt_time \
0 5793f338abbf384cadf90b4d 85 2016-07-22T17:14:03.083000-07:00
...
2016-09-04 21:25:09,849:DEBUG:At 2016-07-22T17:14:03.083000-07:00, retained existing activity MotionTypes.ON_FOOT because of no change
2016-09-04 21:25:09,872:DEBUG:Considering MotionTypes.ON_FOOT from 2016-07-22T17:14:03.083000-07:00 -> 2016-07-22T17:20:45.172000-07:00
...
2016-09-04 21:25:09,935:DEBUG:Inserting entry Entry({'data': Section({'sensed_mode': 2,
'start_loc': {"coordinates": [-122.2615781, 37.8700137], "type": "Point"},
'end_ts': 1469233296,
'start_ts': 1469232830.187,
'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00',
'end_loc': {"coordinates": [-122.26827, 37.8700362], "type": "Point"},
'source': 'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a5f6858fc9f48d5680'),
'key': 'segmentation/raw_section'})}) into timeseries
2016-09-04 21:25:10,187:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57ccf3a5f6858fc9f48d5680 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-04 21:25:10,229:DEBUG:Found 18 results
2016-09-04 21:25:10,266:DEBUG:For cluster 0 - 18, distance = 587.406054835, is_cluster = False
2016-09-04 21:25:10,266:DEBUG:smoothing: segment_list = [Segment(0, 18, 587.406054835)]
2016-09-04 21:25:10,266:DEBUG:After splitting, segment list is [Segment(0, 18, 587.406054835)] with size 1
2016-09-04 21:25:10,266:INFO:No jumps, nothing to filter
2016-09-04 21:25:10,268:DEBUG:deleted 0 points
2016-09-04 21:25:10,446:INFO:For stage PipelineStages.CLEAN_RESAMPLING, start_ts = 2016-07-23T00:05:57.818000
2016-09-04 21:25:10,543:DEBUG:Filtering trip Entry({u'data': {u'start_loc': {u'type': u'Point', u'coordinates': [-122.2615781, 37.8700137]},
u'start_ts': 1469232830.187,
u'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00',
u'start_place': ObjectId('57ccedabf6858fbfb2338210'),
u'end_fmt_time': u'2016-07-22T17:21:36-07:00',
u'key': u'segmentation/raw_trip'}})
2016-09-04 21:25:10,586:DEBUG:Getting filtered points for section Entry({u'data': {'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00', '_id': ObjectId('57ccf3a5f6858fc9f48d5680'), 'key': u'segmentation/raw_section'}})
2016-09-04 21:25:10,627:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469233296, '$gte': 1469232830.187}}, sort_key = data.ts
2016-09-04 21:25:10,631:DEBUG:deleting 0 points from section points
2016-09-04 21:25:10,635:DEBUG:Found 18 results
2016-09-04 21:25:10,639:DEBUG:After filtering, rows went from 18 -> 18, cols from 22 -> 21
2016-09-04 21:25:10,640:DEBUG:Found first section, need to extrapolate start point
2016-09-04 21:25:10,641:DEBUG:Start place is 57ccedabf6858fbfb2338210 and corresponding location is 5792bb92abbf384cadf84cf8
2016-09-04 21:25:10,654:DEBUG:Adding distance 134.814056604 to original 810.720768052 to extend section start from [-122.2615781, 37.8700137] to [-122.2602491, 37.8706214]
2016-09-04 21:25:10,656:DEBUG:After subtracting time 91.1670378874 from original 465.812999964 to cover additional distance 134.814056604 at speed 1.47875876773, new_start_ts = 1469232739.02
2016-09-04 21:25:10,672:DEBUG:Setting section start values of Cleanedsection({u'source': u'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('57ccf3a6f6858fc9f48d5686')}) to first point
fmt_time 2016-07-22T17:12:19.019962-07:00
latitude 37.8706
longitude -122.26
ts 1.46923e+09
2016-09-04 21:25:10,678:DEBUG:Resampling entry list fmt_time ts longitude latitude
0 2016-07-22T17:12:19.019962-07:00 1.469233e+09 -122.260249 37.870621
1 2016-07-22T17:13:50.187000-07:00 1.469233e+09 -122.261578 37.870014
2 2016-07-22T17:14:08-07:00 1.469233e+09 -122.261491 37.870212
3 2016-07-22T17:14:26.195000-07:00 1.469233e+09 -122.261797 37.869955
4 2016-07-22T17:14:50-07:00 1.469233e+09 -122.261927 37.870569 of size 19
2016-09-04 21:25:10,682:DEBUG:After resampling, using 20 points from 1469232739.02 -> 1469233296.0
2016-09-04 21:25:10,733:DEBUG:with_speeds_df = idx ts fmt_time longitude latitude
0 0 1.469233e+09 2016-07-22T17:12:19.019962-07:00 -122.260249 37.870621
1 1 1.469233e+09 2016-07-22T17:12:49.019962-07:00 -122.260686 37.870421
2 2 1.469233e+09 2016-07-22T17:13:19.019962-07:00 -122.261124 37.870221
3 3 1.469233e+09 2016-07-22T17:13:49.019962-07:00 -122.261561 37.870021
4 4 1.469233e+09 2016-07-22T17:14:19.019962-07:00 -122.261676 37.870056
2016-09-04 21:25:10,735:INFO:removed 0 entries containing n/a
2016-09-04 21:25:10,759:DEBUG:Inserting entry Entry({'data': Recreatedlocation({
'loc': {"coordinates": [-122.2602491, 37.870621399999997], "type": "Point"},
'idx': 0,
'fmt_time': '2016-07-22T17:12:19.019962-07:00',
'ts': 1469232739.019962,
'_id': ObjectId('57ccf3a6f6858fc9f48d5689'),
'key': 'analysis/recreated_location'})}) into timeseries
2016-09-04 21:25:10,834:DEBUG:Copying start_ts 1469232739.02, start_fmt_time 2016-07-22T17:12:19.019962-07:00, start_local_dt from 57ccf3a6f6858fc9f48d5688 to Cleanedtrip
2016-09-04 21:25:10,835:DEBUG:Starting with element of type trip,
id 57ccf3a6f6858fc9f48d5688,
details Entry({'data': Cleanedsection(
{'start_loc': {u'type': u'Point', u'coordinates': [-122.2602491, 37.8706214]},
'end_ts': 1469233296.0,
'start_ts': 1469232739.0199621,
'start_fmt_time': '2016-07-22T17:12:19.019962-07:00',
'end_loc': {u'type': u'Point', u'coordinates': [-122.26827, 37.8700362]},
'sensed_mode': 2,
u'source': u'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a6f6858fc9f48d5688'),
'key': 'analysis/cleaned_section'})})
2016-09-04 21:25:11,308:DEBUG:for trip Entry({'data': Cleanedtrip({
'start_loc': {u'type': u'Point', u'coordinates': [-122.2602491, 37.8706214]},
'end_ts': 1469233296.0,
'start_ts': 1469232739.0199621,
'start_fmt_time': '2016-07-22T17:12:19.019962-07:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-122.26827, 37.8700362]},
u'source': u'DwellSegmentationTimeFilter',
u'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a6f6858fc9f48d5686'),
'key': 'analysis/cleaned_trip'})}) start,
2016-09-04 21:25:09,023:DEBUG:------------------------------2016-07-25T17:12:27.853000-07:00------------------------------
2016-09-04 21:25:09,029:DEBUG:prev_point.ts = 1469233379.04, curr_point.ts = 1469491947.85, threshold = 300, large gap = 258568.811, ending trip
2016-09-04 21:25:09,030:INFO:Found trip end at 2016-07-22T17:21:36-07:00
2016-09-04 21:25:09,030:DEBUG:Setting new trip start point AttrDict({
u'loc': {u'type': u'Point', u'coordinates': [-122.0908901, 37.392124]},
u'ts': 1469491947.853,
u'latitude': 37.392124,
u'fmt_time': u'2016-07-25T17:12:27.853000-07:00',
'idx': 22,
u'longitude': -122.0908901,
'_id': ObjectId('5796b24fabbf384cadfa37e2')) with idx 22
2016-09-04 21:25:09,635:DEBUG:On android, found restart events _id curr_state fmt_time \
0 5793f338abbf384cadf90ba9 3 2016-07-22T17:33:54.634000-07:00
1 5796b24fabbf384cadfa37d9 3 2016-07-25T15:35:33.679000-07:00
2016-09-04 21:25:09,635:DEBUG:tracking has been restarted, returning True
2016-09-04 21:25:09,670:DEBUG:Inserting entry Entry({'data': Rawtrip({
'end_place': ObjectId('57ccf3a5f6858fc9f48d567b'),
'start_ts': 1469491947.853,
'start_place': ObjectId('57ccf3a5f6858fc9f48d567d'),
'start_loc': {"coordinates": [-122.0908901, 37.392124], "type": "Point"},
'end_ts': 1469492294.841,
'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
'end_loc': {"coordinates": [-122.09519, 37.3915317], "type": "Point"},
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
'_id': ObjectId('57ccf3a5f6858fc9f48d567a'),
'key': 'segmentation/raw_trip'})}) into timeseries
2016-09-04 21:25:09,936:INFO:++++++++++++++++++++Processing trip 57ccf3a5f6858fc9f48d567a for user 0763de67-f61e-3f5d-90e7-518e69793954++++++++++++++++++++
2016-09-04 21:25:10,009:DEBUG:motion_df = _id confidence fmt_time \
0 5796b250abbf384cadfa37e5 80 2016-07-25T17:13:06.444000-07:00
2016-09-04 21:25:10,010:DEBUG:At 2016-07-25T17:13:06.444000-07:00, retained existing activity MotionTypes.BICYCLING because of no change
2016-09-04 21:25:10,050:DEBUG:Inserting entry Entry({'data': Section({'sensed_mode': 1,
'start_loc': {"coordinates": [-122.0908901, 37.392124], "type": "Point"},
'end_ts': 1469492294.841,
'start_ts': 1469491947.853,
'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
'end_loc': {"coordinates": [-122.09519, 37.3915317], "type": "Point"},
'source': 'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
'_id': ObjectId('57ccf3a6f6858fc9f48d5681'),
'key': 'segmentation/raw_section'})}) into timeseries
2016-09-04 21:25:10,271:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57ccf3a6f6858fc9f48d5681 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-04 21:25:10,271:DEBUG:filter_jumps(0763de67-f61e-3f5d-90e7-518e69793954, 57ccf3a6f6858fc9f48d5681) called
2016-09-04 21:25:10,314:DEBUG:Found 15 results
2016-09-04 21:25:10,352:DEBUG:smoothing: segment_list = [Segment(0, 15, 385.539610671)]
2016-09-04 21:25:10,353:INFO:No jumps, nothing to filter
2016-09-04 21:25:10,354:DEBUG:deleted 0 points
2016-09-04 21:25:10,838:DEBUG:Filtering trip Entry({u'data': {
u'start_loc': {u'type': u'Point', u'coordinates': [-122.0908901, 37.392124]},
u'end_ts': 1469492294.841,
u'start_ts': 1469491947.853,
u'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-122.09519, 37.3915317]},
u'source': u'DwellSegmentationTimeFilter',
u'start_place': ObjectId('57ccf3a5f6858fc9f48d567d'),
u'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
u'_id': ObjectId('57ccf3a5f6858fc9f48d567a'),
u'key': u'segmentation/raw_trip'}})
2016-09-04 21:25:10,872:DEBUG:Getting filtered points for section Entry({u'data': {
'sensed_mode': 1,
u'start_loc': {u'type': u'Point', u'coordinates': [-122.0908901, 37.392124]},
u'end_ts': 1469492294.841,
u'start_ts': 1469491947.853,
u'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-122.09519, 37.3915317]},
u'source': u'SmoothedHighConfidenceMotion',
u'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
u'_id': ObjectId('57ccf3a6f6858fc9f48d5681'),
u'key': u'segmentation/raw_section'}})
2016-09-04 21:25:10,907:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469492294.841, '$gte': 1469491947.853}}, sort_key = data.ts
2016-09-04 21:25:10,915:DEBUG:Found 15 results
2016-09-04 21:25:10,919:DEBUG:Found first section, need to extrapolate start point
2016-09-04 21:25:10,923:DEBUG:Start place is 57ccf3a5f6858fc9f48d567d and corresponding location is 5796b24fabbf384cadfa37e2
2016-09-04 21:25:10,936:DEBUG:Adding distance 0.0 to original 1030.89182392 to extend section start from [-122.0908901, 37.392124] to [-122.0908901, 37.392124]
2016-09-04 21:25:10,938:DEBUG:After subtracting time 0.0 from original 346.988000154 to cover additional distance 0.0 at speed 3.27101683125, new_start_ts = 1469491947.85
2016-09-04 21:25:10,952:DEBUG:Setting section start values of Cleanedsection({u'source': u'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('57ccf3a6f6858fc9f48d569e')}) to first point
fmt_time 2016-07-25T17:12:27.853000-07:00
latitude 37.3921
loc {u'type': u'Point', u'coordinates': [-122.0908...
longitude -122.091
ts 1.46949e+09
2016-09-04 21:25:10,953:DEBUG:Resampling data of size 16
2016-09-04 21:25:10,957:DEBUG:Resampling entry list fmt_time ts longitude latitude
0 2016-07-25T17:12:27.853000-07:00 1.469492e+09 -122.090890 37.392124
1 2016-07-25T17:12:27.853000-07:00 1.469492e+09 -122.090890 37.392124
2 2016-07-25T17:12:52-07:00 1.469492e+09 -122.092018 37.392394
3 2016-07-25T17:13:08.026000-07:00 1.469492e+09 -122.092310 37.392744
4 2016-07-25T17:13:36-07:00 1.469492e+09 -122.091964 37.392901 of size 16
2016-09-04 21:25:10,962:DEBUG:After resampling, using 13 points from 1469491947.85 -> 1469492294.84
2016-09-04 21:25:11,000:DEBUG:with_speeds_df = idx ts fmt_time longitude latitude
0 0 1.469492e+09 2016-07-25T17:12:27.853000-07:00 NaN NaN
1 1 1.469492e+09 2016-07-25T17:12:57.853000-07:00 -122.092124 37.392522
2 2 1.469492e+09 2016-07-25T17:13:27.853000-07:00 -122.092065 37.392856
3 3 1.469492e+09 2016-07-25T17:13:57.853000-07:00 -122.091765 37.392278
4 4 1.469492e+09 2016-07-25T17:14:27.853000-07:00 -122.091654 37.391356
2016-09-04 21:25:11,002:INFO:removed -2 entries containing n/a
Aha! From https://github.com/e-mission/e-mission-server/issues/385#issuecomment-244668071, we can see the problem. Basically, if this is not a restart, then we join the section to the previous place, which means that we add an extrapolated start point (2016-07-25T17:12:27.853000-07:00 -> 2016-07-22T17:12:19.020000-07:00
).
But in the case of a restart, we don't have the previous end place. So the start place and the section start are the same and interpolating doesn't change the start point ([-122.0908901, 37.392124] to [-122.0908901, 37.392124]
).
So when we add the new start point, it is a duplicate of the existing point
2016-09-04 21:25:10,957:DEBUG:Resampling entry list fmt_time ts longitude latitude
0 2016-07-25T17:12:27.853000-07:00 1.469492e+09 -122.090890 37.392124
1 2016-07-25T17:12:27.853000-07:00 1.469492e+09 -122.090890 37.392124
and when we resample, we end up with NaN
values for the first point
2016-09-04 21:25:11,000:DEBUG:with_speeds_df = idx ts fmt_time longitude latitude
0 0 1.469492e+09 2016-07-25T17:12:27.853000-07:00 NaN NaN
1 1 1.469492e+09 2016-07-25T17:12:57.853000-07:00 -122.092124 37.392522
And when we drop_na, we drop that value and, in effect, the first point.
2016-09-04 21:25:11,002:INFO:removed -2 entries containing n/a
Fix is simple - just don't add the extrapolated start point if it is the same.
This doesn't seem to have fixed it. I see a similar issue in the trips at Hawaii, and also see an issue at the trip end.
Example 1 | Example 2 |
---|---|
By adding a breakpoint to the client, we can get the ids of the affected trips
Looking at the first raw trip (57cda0fff6858f4c2989a89a). Exact start time is 2016-07-27T14:43:45.576116-10:00.
The resampled points don't have any n/a and nothing is stripped out as well.
2016-09-05 09:45:06,003:DEBUG:with_speeds_df = idx ts fmt_time longitude latitude
0 0 1.469667e+09 2016-07-27T14:43:45.576116-10:00 -157.820470 21.276773
1 1 1.469667e+09 2016-07-27T14:44:15.576116-10:00 -157.820711 21.276697
2 2 1.469667e+09 2016-07-27T14:44:45.576116-10:00 -157.820951 21.276621
3 3 1.469667e+09 2016-07-27T14:45:15.576116-10:00 -157.821191 21.276545
4 4 1.469667e+09 2016-07-27T14:45:45.576116-10:00 -157.821431 21.276468
2016-09-05 09:45:06,005:INFO:removed 0 entries containing n/a
We did insert a recreated location matching the first point, and the recreated location does match the first point [-157.8204702, 21.276773]
and the start_loc
matches it as well.
2016-09-05 09:45:06,040:DEBUG:Inserting entry Entry({'data': Recreatedlocation(
{'loc': {"coordinates": [-157.82047019999999, 21.276772999999999], "type": "Point"},
'idx': 0,
'fmt_time': '2016-07-27T14:43:45.576116-10:00',
'altitude': 34.9000244140625,
'ts': 1469666625.5761163,
'longitude': -157.8204702,
'mode': 2,
'section': ObjectId('57cda112f6858f4c2989adab'),
'latitude': 21.276773,
'_id': ObjectId('57cda112f6858f4c2989adac'),
'key': 'analysis/recreated_location'})}) into timeseries
But the start_place location is [-157.8223594, 21.2755391]
. Is this an extrapolation boo-boo? Too little extrapolation? The opposite of (82b03b7470fe6ca6ad8de8a5b20ad280c4d98dd7)?
Hmmm, before resampling, we had this dataframe. so the lat/lng for the first point is unchanged.
2016-09-05 09:45:05,954:DEBUG:Resampling data of size 10
2016-09-05 09:45:05,959:DEBUG:Resampling entry list fmt_time ts longitude latitude
0 2016-07-27T14:43:45.576116-10:00 1.469667e+09 -157.820470 21.276773
1 2016-07-27T14:46:01-10:00 1.469667e+09 -157.821555 21.276429
2 2016-07-27T14:46:05.949000-10:00 1.469667e+09 -157.821555 21.276429
3 2016-07-27T14:46:43-10:00 1.469667e+09 -157.821662 21.275970
4 2016-07-27T14:46:56.367000-10:00 1.469667e+09 -157.821769 21.275986 of size 10
Hm, when we extrapolated, we extended to [-157.8204702, 21.276773]
. But the start_place is at [-157.8223594, 21.2755391]
. Why didn't we try to extrapolate all the way there?
2016-09-05 09:45:05,920:DEBUG:Start place is 57cda0fff6858f4c2989a899 and corresponding location is 5799450cf311e474bfc6c893
2016-09-05 09:45:05,933:DEBUG:Adding distance 118.721953624 to original 177.498418819 to extend section start from [-157.821555, 21.2764293] to [-157.8204702, 21.276773]
2016-09-05 09:45:05,937:DEBUG:After subtracting time 135.423883626 from original 175.39199996 to cover additional distance 118.721953624 at speed 0.876669243602, new_start_ts = 1469666625.58
Here's the raw start place. Aaaand, it is at [-157.8204702, 21.276773]
. So why did we change the cleaned place to have a different location?
2016-09-05 09:44:48,030:DEBUG:Saving entry Entry({'data': Rawplace(
{'enter_fmt_time': u'2016-07-27T13:23:27-10:00',
'ending_trip': ObjectId('57cda0fff6858f4c2989a898'),
'starting_trip': ObjectId('57cda0fff6858f4c2989a89a'),
'duration': 4928.934000015259,
'exit_fmt_time': u'2016-07-27T14:45:35.934000-10:00',
'source': 'DwellSegmentationTimeFilter',
'location': {"coordinates": [-157.8204702, 21.276773], "type": "Point"},
'enter_ts': 1469661807,
'exit_ts': 1469666735.934}),
'_id': ObjectId('57cda0fff6858f4c2989a899'),
'key': 'segmentation/raw_place'})}) into timeseries
Aha! It's because we link the raw place to a cleaned place. And the location of the cleaned place is [-157.8208665, 21.2770023]
.
It must be within 100m or so, but the difference is noticeable.
2016-09-05 09:45:14,032:DEBUG:for trip Entry({'data': Cleanedtrip(
{'distance': 278.01309740105614,
'raw_trip': ObjectId('57cda0fff6858f4c2989a89a'),
'start_loc': {u'type': u'Point', u'coordinates': [-157.8204702, 21.276773]},
u'end_ts': 1469666936.392,
'start_ts': 1469666625.5761163,
'start_fmt_time': '2016-07-27T14:43:45.576116-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8223594, 21.2755391]},
u'source': u'DwellSegmentationTimeFilter',
u'end_fmt_time': u'2016-07-27T14:48:56.392000-10:00',
'_id': ObjectId('57cda111f6858f4c2989ada9'),
'key': 'analysis/cleaned_trip'})}) start,
converting Entry({'data': Cleanedplace({u'enter_fmt_time': u'2016-07-27T13:09:51.585000-10:00',
'display_name': u'Ala Wai Boulevard, Honolulu',
'ending_trip': ObjectId('57cda111f6858f4c2989ad8c'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8208665, 21.2770023]},
u'enter_ts': 1469660991.585,
'raw_places': [ObjectId('57cda0fff6858f4c2989a897'), ObjectId('57cda0fff6858f4c2989a897')]}),
'_id': ObjectId('57cda11af6858f4c2989af54'),
'key': 'analysis/cleaned_place'})}) to
Entry({u'data': {u'enter_fmt_time': u'2016-07-27T13:23:27-10:00',
u'exit_fmt_time': u'2016-07-27T14:45:35.934000-10:00',
u'ending_trip': ObjectId('57cda0fff6858f4c2989a898'),
u'starting_trip': ObjectId('57cda0fff6858f4c2989a89a'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8204702, 21.276773]},
u'enter_ts': 1469661807,
u'duration': 4928.934000015259,
u'exit_ts': 1469666735.934},
u'_id': ObjectId('57cda0fff6858f4c2989a899'),
u'key': u'segmentation/raw_place'}})
We can fix this by:
If we go with (1), we will end up with a gap at the end of the previous trip, so that's out.
If we go with (2), we need to fix a bunch of other things as well: a) trip start time b) section start time and location c) add a new recreated location at the new section start time
Once we do (2) and (a), (b) is not that hard. But (a) is hard because we have to essentially extrapolate (again) and then we have to handle over extrapolation and so on, which seems like a huge hassle.
I don't want to have to keep extrapolating all the time. And it is not even clear that extrapolation is correct. These are two places that were supposed to be the same, and it only noise that one of them was picked over the other.
So to make life easier, we will do something simple for (a). We will change the trip start time by 30 secs. This allows us to add a new recreated location 30 secs away and be consistent with everything else. And a fuzz factor of 30 secs is unlikely to be noticeable or make a huge difference wrt trip start times.
Looking at the trip with the gap at the end (57cda114f6858f4c2989ae8c)
We filter a bunch of points. Should check if that is legit, seems like a lot.
2016-09-05 09:45:08,848:DEBUG:deleting 13 points from section points
2016-09-05 09:45:08,858:DEBUG:Found 42 results
Extrapolated, but don't care.
2016-09-05 09:45:08,866:DEBUG:Found first section, need to extrapolate start point
2016-09-05 09:45:08,868:DEBUG:Start place is 57cda100f6858f4c2989a8ab and corresponding location is 57997fadf311e474bfc74660
2016-09-05 09:45:08,901:DEBUG:Adding distance 157.281199244 to original 813.870168546 to extend section start from [-157.8045337, 21.2631457] to [-157.8059752, 21.2635885]
2016-09-05 09:45:08,905:DEBUG:After subtracting time 241.597257482 from original 801.0 to cover additional distance 157.281199244 at speed 0.651005731123, new_start_ts = 1469675860.4
Last reconstructed location is [-157.7998833, 21.2663029]
which matches the last point. No n/a entries were removed.
2016-09-05 09:45:09,057:INFO:removed 0 entries containing n/a
....
2016-09-05 09:45:09,260:DEBUG:Inserting entry Entry({'data':
Recreatedlocation({'loc': {"coordinates": [-157.7998833, 21.266302899999999], "type": "Point"},
'idx': 35,
'fmt_time': '2016-07-27T17:35:03-10:00',
'altitude': 90.79998779296875,
'ts': 1469676903.0,
'longitude': -157.7998833,
'mode': 2,
'section': ObjectId('57cda115f6858f4c2989ae8e'),
'latitude': 21.2663029,
'speed': 1.641492966737403,
'heading': -32.37982516407363,
'distance': 37.093239049794626}),
'_id': ObjectId('57cda115f6858f4c2989aeb2'),
'key': 'analysis/recreated_location'})}) into timeseries
Raw section, trip and end place.
2016-09-05 09:45:08,780:DEBUG:Starting with element of type trip, id 57cda104f6858f4c2989a8d5, details Entry({u'data': {u'sensed_mode': 2,
u'start_loc': {u'type': u'Point', u'coordinates': [-157.8061887, 21.2624899]},
u'end_ts': 1469677096.611,
u'start_ts': 1469676085.625,
u'start_fmt_time': u'2016-07-27T17:21:25.625000-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'source': u'SmoothedHighConfidenceMotion',
u'end_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
u'duration': 1010.9860000610352,
u'trip_id': ObjectId('57cda100f6858f4c2989a8ac'),
u'_id': ObjectId('57cda104f6858f4c2989a8d5'),
u'key': u'segmentation/raw_section'}})
2016-09-05 09:45:08,729:DEBUG:Filtering trip Entry({u'data': {u'distance': 867.9314413785075,
u'end_place': ObjectId('57cda100f6858f4c2989a8ad'),
u'start_loc': {u'type': u'Point', u'coordinates': [-157.8061887, 21.2624899]},
u'end_ts': 1469677096.611,
u'start_ts': 1469676085.625,
u'start_fmt_time': u'2016-07-27T17:21:25.625000-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'source': u'DwellSegmentationTimeFilter',
u'start_place': ObjectId('57cda100f6858f4c2989a8ab'),
u'end_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
u'_id': ObjectId('57cda100f6858f4c2989a8ac'),
u'key': u'segmentation/raw_trip'}})
2016-09-05 09:44:48,510:DEBUG:Saving entry Entry({'data': Rawplace({'enter_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
'ending_trip': ObjectId('57cda100f6858f4c2989a8ac'),
'starting_trip': ObjectId('57cda100f6858f4c2989a8ae'),
'duration': 486.38899993896484,
'exit_fmt_time': u'2016-07-27T17:46:23-10:00',
'source': 'DwellSegmentationTimeFilter',
'location': {"coordinates": [-157.8001479, 21.2678968], "type": "Point"},
'enter_ts': 1469677096.611,
'exit_ts': 1469677583}),
'_id': ObjectId('57cda100f6858f4c2989a8ad'),
'key': 'segmentation/raw_place'})}) into timeseries
Ok so it looks like this is an outcome of the smoothing. The original trip and section ended at 17:38. But the cleaned trip ends at 17:35, probably because we smoothed the trips at the end. Note that we smoothed 13/42 points. The enter time of the raw place is also 17:38 and I am not sure we fix the enter time of the cleaned place
2016-09-05 09:45:16,899:DEBUG:Found squished trip, linking raw start place 57cda100f6858f4c2989a8ad to new cleaned place 57cda11cf6858f4c2989af5c
2016-09-05 09:45:18,393:DEBUG:Inserting entry Entry({'data': Cleanedplace(
{u'enter_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
'display_name': u'Diamond Head Road, Honolulu',
'exit_fmt_time': '2016-07-27T17:53:26.725442-10:00',
'ending_trip': ObjectId('57cda114f6858f4c2989ae8c'),
'starting_trip': ObjectId('57cda115f6858f4c2989aeb3'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'enter_ts': 1469677096.611,
'duration': 910.11444211006165,
'raw_places': [ObjectId('57cda100f6858f4c2989a8ad'),
ObjectId('57cda100f6858f4c2989a8ad')],
'exit_ts': 1469678006.7254422}),
'_id': ObjectId('57cda11cf6858f4c2989af5c'),
'key': 'analysis/cleaned_place'})}) into timeseries
Yup, that's it!
So now, how to fix it.
That's a bit more complicated. It depends on whether the next place is valid or not. If it is valid, we should extrapolate points to it If it is not valid, we should move it to the end of the section.
How do we know whether it is valid or not?
I think that we can make the assumption that a place is unlikely to be invalid. This is because for trip segmentation, we are not as blase as section segmentation. We either segment when there have been a bunch of points at the same location or if we have a large distance change at a slow speed.
Neither of these matches the error models that we have seen so far wrt jumps. While it is theoretically possible to have several points in the bad cluster, we have typically seen 3-4 (not 10) points in a bad/jumped cluster. And while jumps have large distance changes, they are typically at fast speeds (that's why they are called jumps).
So I claim that, given the error model for trip segmentation, it is unlikely that a place is incorrect. So if the last section has the last points filtered, we should just extrapolate them just like we do the start.
Note that in this case, although the last few points are stripped.
2016-09-05 09:44:55,709:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57cda104f6858f4c2989a8d5 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-05 09:44:55,709:DEBUG:filter_jumps(0763de67-f61e-3f5d-90e7-518e69793954, 57cda104f6858f4c2989a8d5) called
2016-09-05 09:44:55,746:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469677096.611, '$gte': 1469676085.625}}, sort_key = data.ts
2016-09-05 09:44:55,746:DEBUG:orig_ts_db_keys = ['background/filtered_location'], analysis_ts_db_keys = []
2016-09-05 09:44:55,746:DEBUG:finished querying values for ['background/filtered_location']
2016-09-05 09:44:55,746:DEBUG:finished querying values for []
2016-09-05 09:44:55,762:DEBUG:Found 42 results
2016-09-05 09:44:55,764:DEBUG:After de-duping, converted 42 points to 42
2016-09-05 09:44:55,764:DEBUG:len(section_points_df) = 42
2016-09-05 09:44:55,782:DEBUG:section_points_df.shape = (42, 22), with_speeds_df.shape = (42, 25)
2016-09-05 09:44:55,802:DEBUG:quartile values are 0.25 0.00000
0.75 2.28429
Name: speed, dtype: float64
2016-09-05 09:44:55,802:DEBUG:iqr 2.2842895374
2016-09-05 09:44:55,802:DEBUG:maxSpeed = 9.13715814959
2016-09-05 09:44:55,805:DEBUG:smoothing: last_point index 42 not in found points Int64Index([0, 1, 23, 24, 31], dtype='int64')
2016-09-05 09:44:55,805:DEBUG:smoothing: added new entry 42
2016-09-05 09:44:55,806:DEBUG:For cluster 0 - 1, distance = 0.0, is_cluster = True
2016-09-05 09:44:55,807:DEBUG:For cluster 1 - 23, distance = 558.664905843, is_cluster = False
2016-09-05 09:44:55,811:DEBUG:For cluster 23 - 24, distance = 0.0, is_cluster = True
2016-09-05 09:44:55,812:DEBUG:For cluster 24 - 31, distance = 101.43061836, is_cluster = False
2016-09-05 09:44:55,813:DEBUG:For cluster 31 - 42, distance = 16.6414659154, is_cluster = True
2016-09-05 09:44:55,813:DEBUG:smoothing: segment_list = [Segment(0, 1, 0.0), Segment(1, 23, 558.664905843), Segment(23, 24, 0.0), Segment(24, 31, 101.43061836), Segment(31, 42, 16.6414659154)]
2016-09-05 09:44:55,819:DEBUG:bad_segments = [Segment(0, 1, 0.0), Segment(23, 24, 0.0), Segment(31, 42, 16.6414659154)]
They are actually a cluster because of the end of the trip, and the place is actually valid.
Looking at the last raw trip (57cda100f6858f4c2989a8b0). Recall that mismatch is
[-157.8001479, 21.2678968]
versus [-157.8016326, 21.2695113]
Raw objects. Note that the raw place location is the same as the first point
[-157.8016326, 21.2695113]
.
2016-09-05 09:44:48,528:DEBUG:Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('57cda100f6858f4c2989a8b1'),
'start_ts': 1469678014,
'start_place': ObjectId('57cda100f6858f4c2989a8af'),
'duration': 952,
'distance': 2315.9019817564267,
'start_loc': {"coordinates": [-157.8019085, 21.2695518], "type": "Point"},
'end_ts': 1469678966,
'start_fmt_time': u'2016-07-27T17:53:34-10:00',
'end_loc': {"coordinates": [-157.8232634, 21.275698], "type": "Point"},
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-27T18:09:26-10:00',
'_id': ObjectId('57cda100f6858f4c2989a8b0'),
'key': 'segmentation/raw_trip'})}) into timeseries
2016-09-05 09:44:48,549:DEBUG:Saving entry Entry({'data': Rawplace({'enter_fmt_time': u'2016-07-27T17:47:59-10:00',
'ending_trip': ObjectId('57cda100f6858f4c2989a8ae'),
'starting_trip': ObjectId('57cda100f6858f4c2989a8b0'),
'duration': 335,
'exit_fmt_time': u'2016-07-27T17:53:34-10:00',
'source': 'DwellSegmentationTimeFilter',
'location': {"coordinates": [-157.8016326, 21.2695113], "type": "Point"},
'enter_ts': 1469677679,
'exit_ts': 1469678014}),
'_id': ObjectId('57cda100f6858f4c2989a8af'),
'key': 'segmentation/raw_place'})}) into timeseries
2016-09-05 09:44:52,666:DEBUG:Inserting entry Entry({'data': Section({'sensed_mode': 0,
'start_loc': {"coordinates": [-157.8019085, 21.2695518], "type": "Point"},
'end_ts': 1469678966,
'start_ts': 1469678014,
'start_fmt_time': u'2016-07-27T17:53:34-10:00',
'end_loc': {"coordinates": [-157.8232634, 21.275698], "type": "Point"},
'source': 'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-27T18:09:26-10:00',
'duration': 952,
'trip_id': ObjectId('57cda100f6858f4c2989a8b0'),
'_id': ObjectId('57cda104f6858f4c2989a8d6'),
'key': 'segmentation/raw_section'})}) into timeseries
2016-09-05 09:44:55,876:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57cda104f6858f4c2989a8d6 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-05 09:44:55,949:DEBUG:Found 38 results
2016-09-05 09:44:56,014:DEBUG:smoothing: segment_list = [Segment(0, 38, 2315.90198176)]
2016-09-05 09:44:56,014:INFO:No jumps, nothing to filter
2016-09-05 09:44:56,015:DEBUG:deleted 0 points
First recreated location is consistent with the trip start loc and the start place location
2016-09-05 09:45:09,621:DEBUG:Inserting entry Entry({'data':
Recreatedlocation({'loc': {"coordinates": [-157.8016326, 21.269511300000001], "type": "Point"},
'idx': 0,
'fmt_time': '2016-07-27T17:53:26.725442-10:00',
'ts': 1469678006.7254422,
'longitude': -157.8016326,
'mode': 0,
'section': ObjectId('57cda115f6858f4c2989aeb5'),
'latitude': 21.2695113,
'_id': ObjectId('57cda115f6858f4c2989aeb6'),
'key': 'analysis/recreated_location'})}) into timeseries
2016-09-05 09:45:16,899:DEBUG:for trip Entry({'data': Cleanedtrip({'distance': 3794.0904784587578,
'raw_trip': ObjectId('57cda100f6858f4c2989a8b0'),
'start_loc': {u'type': u'Point', u'coordinates': [-157.8016326, 21.2695113]},
u'end_ts': 1469678966.0,
'start_ts': 1469678006.7254422,
'start_fmt_time': '2016-07-27T17:53:26.725442-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8232634, 21.275698]},
u'source': u'DwellSegmentationTimeFilter',
u'end_fmt_time': u'2016-07-27T18:09:26-10:00',
'duration': 959.2745578289032,
'_id': ObjectId('57cda115f6858f4c2989aeb3'),
'key': 'analysis/cleaned_trip'})}) start,
converting Entry({'data': Cleanedplace(
{u'enter_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
'display_name': u'Diamond Head Road, Honolulu',
'ending_trip': ObjectId('57cda114f6858f4c2989ae8c'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'enter_ts': 1469677096.611,
'raw_places': [ObjectId('57cda100f6858f4c2989a8ad'),
ObjectId('57cda100f6858f4c2989a8ad')]}),
'_id': ObjectId('57cda11cf6858f4c2989af5c'),
'key': 'analysis/cleaned_place'})})
to Entry({u'data': {u'enter_fmt_time': u'2016-07-27T17:47:59-10:00',
u'exit_fmt_time': u'2016-07-27T17:53:34-10:00',
u'ending_trip': ObjectId('57cda100f6858f4c2989a8ae'),
u'starting_trip': ObjectId('57cda100f6858f4c2989a8b0'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8016326, 21.2695113]},
u'enter_ts': 1469677679,
u'duration': 335,
u'exit_ts': 1469678014},
u'_id': ObjectId('57cda100f6858f4c2989a8af'),
u'key': u'segmentation/raw_place'}})
So this is the same as the first case (https://github.com/e-mission/e-mission-server/issues/385#issuecomment-244793203), except that in that case, the distance between the points was under the threshold, but here is was above. Let's take a quick check to see why the previous trip was stripped out.
>>> ecc.calDistance([-157.8208665, 21.2770023], [-157.8204702, 21.276773])
48.33483462934436
>>> ecc.calDistance([-157.8001479, 21.2678968], [-157.8016326, 21.2695113])
236.42702797373335
Because it was a zero section trip
2016-09-05 09:45:16,899:DEBUG:Found squished trip, linking raw start place 57cda100f6858f4c2989a8ad to new cleaned place 57cda11cf6858f4c2989af5c
2016-09-05 09:45:09,315:DEBUG:keys = (segmentation/raw_stop, segmentation/raw_section), len(places) = 0, len(trips) = 0
2016-09-05 09:45:09,315:DEBUG:places_or_stops = []
2016-09-05 09:45:09,315:DEBUG:trips_or_sections = []
2016-09-05 09:45:09,315:DEBUG:Starting with element of type none, id None, details None
2016-09-05 09:45:09,316:INFO:Found zero section trip Entry({u'data':
{u'distance': 31.60031422794774,
u'end_place': ObjectId('57cda100f6858f4c2989a8af'),
u'start_loc': {u'type': u'Point', u'coordinates': [-157.8013582, 21.2696353]},
u'end_ts': 1469677679,
u'start_ts': 1469677583,
u'start_fmt_time': u'2016-07-27T17:46:23-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8016326, 21.2695113]},
u'source': u'DwellSegmentationTimeFilter',
u'start_place': ObjectId('57cda100f6858f4c2989a8ad'),
u'end_fmt_time': u'2016-07-27T17:47:59-10:00',
u'duration': 96,
u'_id': ObjectId('57cda100f6858f4c2989a8ae'),
u'key': u'segmentation/raw_trip'}})
Because there are only three motion activites in that stretch, and two of them are invalid (TILTING = 5
and UNKNOWN = 4
). The only valid one (BICYCLING = 1
) is low confidence. So we are in the case in which we have zero high confidence motions, which we claimed that we would not handle (https://github.com/e-mission/e-mission-server/issues/356#issuecomment-240630757) because we had not seen an example of a real trip with no valid motion activity points. Maybe it is time to revisit that decision...
2016-09-05 09:44:52,395:INFO:++++++++++++++++++++Processing trip 57cda100f6858f4c2989a8ae for user 0763de67-f61e-3f5d-90e7-518e69793954++++++++++++++++++++
2016-09-05 09:44:52,468:DEBUG:After de-duping, converted 3 points to 3
2016-09-05 09:44:52,482:DEBUG:motion_df = _id confidence fmt_time \
0 57998db3f311e474bfc7a207 42 2016-07-27T17:46:29.941000-10:00
1 57998db3f311e474bfc7a208 100 2016-07-27T17:46:30.088000-10:00
2 57998db3f311e474bfc7a20e 60 2016-07-27T17:47:30.090000-10:00
type user_id
1 0763de67-f61e-3f5d-90e7-518e69793954
5 0763de67-f61e-3f5d-90e7-518e69793954
4 0763de67-f61e-3f5d-90e7-518e69793954
One of the side effects of #378 was that we have a short gap between the start of the trip and the first point right after the reboot. This does not happen for trips that are not associated with a reboot. It happens even for trips that consist of a single section.