e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
15 stars 34 forks source link

First trip after reboot has a gap between the start and the first point #209

Closed shankari closed 5 years ago

shankari commented 8 years ago

One of the side effects of #378 was that we have a short gap between the start of the trip and the first point right after the reboot. This does not happen for trips that are not associated with a reboot. It happens even for trips that consist of a single section.

Gap section Non-gap section
simulator screen shot sep 4 2016 9 42 25 pm simulator screen shot sep 4 2016 9 43 24 pm
shankari commented 8 years ago

Let's compare the behavior of the last trip on the 22nd (works) with the first trip on the 25th (doesn't work)

Last trip on the 22nd

TRIP_SEGMENTATION

2016-09-04 21:25:08,764:DEBUG:------------------------------2016-07-22T17:13:50.187000-07:00------------------------------
...
2016-09-04 21:25:09,570:DEBUG:Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('57ccf3a5f6858fc9f48d5679'),
'start_ts': 1469232830.187,
'start_place': ObjectId('57ccedabf6858fbfb2338210'),
'duration': 465.8129999637604,
'distance': 587.4060548346308,
'start_loc': {"coordinates": [-122.2615781, 37.8700137], "type": "Point"},
'end_ts': 1469233296,
'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00',
'end_loc': {"coordinates": [-122.26827, 37.8700362], "type": "Point"},
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a5f6858fc9f48d5678'),
'key': 'segmentation/raw_trip'})}) into timeseries

SECTION_SEGMENTATION

2016-09-04 21:25:09,763:INFO:++++++++++++++++++++Processing trip 57ccf3a5f6858fc9f48d5678 for user 0763de67-f61e-3f5d-90e7-518e69793954++++++++++++++++++++
2016-09-04 21:25:09,821:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469233296, '$gte': 1469232830.187}}, sort_key = data.ts
2016-09-04 21:25:09,829:DEBUG:curr activity = Motionactivity({'fmt_time': u'2016-07-22T17:14:03.083000-07:00', '_id': ObjectId('5793f338abbf384cadf90b4d'), u'type': 2})
...
2016-09-04 21:25:09,848:DEBUG:motion_df =                         _id  confidence                          fmt_time  \
0  5793f338abbf384cadf90b4d          85  2016-07-22T17:14:03.083000-07:00   
...
2016-09-04 21:25:09,849:DEBUG:At 2016-07-22T17:14:03.083000-07:00, retained existing activity MotionTypes.ON_FOOT because of no change
2016-09-04 21:25:09,872:DEBUG:Considering MotionTypes.ON_FOOT from 2016-07-22T17:14:03.083000-07:00 -> 2016-07-22T17:20:45.172000-07:00
...
2016-09-04 21:25:09,935:DEBUG:Inserting entry Entry({'data': Section({'sensed_mode': 2,
'start_loc': {"coordinates": [-122.2615781, 37.8700137], "type": "Point"},
'end_ts': 1469233296,
'start_ts': 1469232830.187,
'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00',
'end_loc': {"coordinates": [-122.26827, 37.8700362], "type": "Point"},
'source': 'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a5f6858fc9f48d5680'),
'key': 'segmentation/raw_section'})}) into timeseries

JUMP_SMOOTHING

2016-09-04 21:25:10,187:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57ccf3a5f6858fc9f48d5680 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-04 21:25:10,229:DEBUG:Found 18 results
2016-09-04 21:25:10,266:DEBUG:For cluster 0 - 18, distance = 587.406054835, is_cluster = False
2016-09-04 21:25:10,266:DEBUG:smoothing: segment_list = [Segment(0, 18, 587.406054835)]
2016-09-04 21:25:10,266:DEBUG:After splitting, segment list is [Segment(0, 18, 587.406054835)] with size 1
2016-09-04 21:25:10,266:INFO:No jumps, nothing to filter
2016-09-04 21:25:10,268:DEBUG:deleted 0 points

CLEAN_RESAMPLING

2016-09-04 21:25:10,446:INFO:For stage PipelineStages.CLEAN_RESAMPLING, start_ts = 2016-07-23T00:05:57.818000
2016-09-04 21:25:10,543:DEBUG:Filtering trip Entry({u'data': {u'start_loc': {u'type': u'Point', u'coordinates': [-122.2615781, 37.8700137]},
u'start_ts': 1469232830.187,
u'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00',
u'start_place': ObjectId('57ccedabf6858fbfb2338210'),
u'end_fmt_time': u'2016-07-22T17:21:36-07:00',
u'key': u'segmentation/raw_trip'}})

2016-09-04 21:25:10,586:DEBUG:Getting filtered points for section Entry({u'data': {'start_fmt_time': u'2016-07-22T17:13:50.187000-07:00', '_id': ObjectId('57ccf3a5f6858fc9f48d5680'), 'key': u'segmentation/raw_section'}})

2016-09-04 21:25:10,627:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469233296, '$gte': 1469232830.187}}, sort_key = data.ts
2016-09-04 21:25:10,631:DEBUG:deleting 0 points from section points
2016-09-04 21:25:10,635:DEBUG:Found 18 results

2016-09-04 21:25:10,639:DEBUG:After filtering, rows went from 18 -> 18, cols from 22 -> 21
2016-09-04 21:25:10,640:DEBUG:Found first section, need to extrapolate start point
2016-09-04 21:25:10,641:DEBUG:Start place is 57ccedabf6858fbfb2338210 and corresponding location is 5792bb92abbf384cadf84cf8
2016-09-04 21:25:10,654:DEBUG:Adding distance 134.814056604 to original 810.720768052 to extend section start from [-122.2615781, 37.8700137] to [-122.2602491, 37.8706214]
2016-09-04 21:25:10,656:DEBUG:After subtracting time 91.1670378874 from original 465.812999964 to cover additional distance 134.814056604 at speed 1.47875876773, new_start_ts = 1469232739.02

2016-09-04 21:25:10,672:DEBUG:Setting section start values of Cleanedsection({u'source': u'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('57ccf3a6f6858fc9f48d5686')}) to first point
fmt_time                                 2016-07-22T17:12:19.019962-07:00
latitude                                                          37.8706
longitude                                                         -122.26
ts                                                            1.46923e+09

2016-09-04 21:25:10,678:DEBUG:Resampling entry list                            fmt_time            ts   longitude   latitude
0  2016-07-22T17:12:19.019962-07:00  1.469233e+09 -122.260249  37.870621
1  2016-07-22T17:13:50.187000-07:00  1.469233e+09 -122.261578  37.870014
2         2016-07-22T17:14:08-07:00  1.469233e+09 -122.261491  37.870212
3  2016-07-22T17:14:26.195000-07:00  1.469233e+09 -122.261797  37.869955
4         2016-07-22T17:14:50-07:00  1.469233e+09 -122.261927  37.870569 of size 19

2016-09-04 21:25:10,682:DEBUG:After resampling, using 20 points from 1469232739.02 -> 1469233296.0
2016-09-04 21:25:10,733:DEBUG:with_speeds_df =    idx            ts                          fmt_time   longitude   latitude
0    0  1.469233e+09  2016-07-22T17:12:19.019962-07:00 -122.260249  37.870621
1    1  1.469233e+09  2016-07-22T17:12:49.019962-07:00 -122.260686  37.870421
2    2  1.469233e+09  2016-07-22T17:13:19.019962-07:00 -122.261124  37.870221
3    3  1.469233e+09  2016-07-22T17:13:49.019962-07:00 -122.261561  37.870021
4    4  1.469233e+09  2016-07-22T17:14:19.019962-07:00 -122.261676  37.870056
2016-09-04 21:25:10,735:INFO:removed 0 entries containing n/a

2016-09-04 21:25:10,759:DEBUG:Inserting entry Entry({'data': Recreatedlocation({
'loc': {"coordinates": [-122.2602491, 37.870621399999997], "type": "Point"},
'idx': 0,
'fmt_time': '2016-07-22T17:12:19.019962-07:00',
'ts': 1469232739.019962,
'_id': ObjectId('57ccf3a6f6858fc9f48d5689'),
'key': 'analysis/recreated_location'})}) into timeseries

2016-09-04 21:25:10,834:DEBUG:Copying start_ts 1469232739.02, start_fmt_time 2016-07-22T17:12:19.019962-07:00, start_local_dt from 57ccf3a6f6858fc9f48d5688 to Cleanedtrip

2016-09-04 21:25:10,835:DEBUG:Starting with element of type trip,
id 57ccf3a6f6858fc9f48d5688,
details Entry({'data': Cleanedsection(
{'start_loc': {u'type': u'Point', u'coordinates': [-122.2602491, 37.8706214]},
'end_ts': 1469233296.0,
'start_ts': 1469232739.0199621,
'start_fmt_time': '2016-07-22T17:12:19.019962-07:00',
'end_loc': {u'type': u'Point', u'coordinates': [-122.26827, 37.8700362]},
'sensed_mode': 2,
u'source': u'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a6f6858fc9f48d5688'),
'key': 'analysis/cleaned_section'})})

2016-09-04 21:25:11,308:DEBUG:for trip Entry({'data': Cleanedtrip({
'start_loc': {u'type': u'Point', u'coordinates': [-122.2602491, 37.8706214]},
'end_ts': 1469233296.0,
'start_ts': 1469232739.0199621,
'start_fmt_time': '2016-07-22T17:12:19.019962-07:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-122.26827, 37.8700362]},
u'source': u'DwellSegmentationTimeFilter',
u'end_fmt_time': u'2016-07-22T17:21:36-07:00',
'_id': ObjectId('57ccf3a6f6858fc9f48d5686'),
'key': 'analysis/cleaned_trip'})}) start,

First trip on the 25th

TRIP_SEGMENTATION

2016-09-04 21:25:09,023:DEBUG:------------------------------2016-07-25T17:12:27.853000-07:00------------------------------
2016-09-04 21:25:09,029:DEBUG:prev_point.ts = 1469233379.04, curr_point.ts = 1469491947.85, threshold = 300, large gap = 258568.811, ending trip
2016-09-04 21:25:09,030:INFO:Found trip end at 2016-07-22T17:21:36-07:00
2016-09-04 21:25:09,030:DEBUG:Setting new trip start point AttrDict({
u'loc': {u'type': u'Point', u'coordinates': [-122.0908901, 37.392124]},
u'ts': 1469491947.853,
u'latitude': 37.392124,
u'fmt_time': u'2016-07-25T17:12:27.853000-07:00',
'idx': 22,
u'longitude': -122.0908901,
'_id': ObjectId('5796b24fabbf384cadfa37e2')) with idx 22

2016-09-04 21:25:09,635:DEBUG:On android, found restart events                         _id  curr_state                          fmt_time  \
0  5793f338abbf384cadf90ba9           3  2016-07-22T17:33:54.634000-07:00   
1  5796b24fabbf384cadfa37d9           3  2016-07-25T15:35:33.679000-07:00   
2016-09-04 21:25:09,635:DEBUG:tracking has been restarted, returning True

2016-09-04 21:25:09,670:DEBUG:Inserting entry Entry({'data': Rawtrip({
'end_place': ObjectId('57ccf3a5f6858fc9f48d567b'),
'start_ts': 1469491947.853,
'start_place': ObjectId('57ccf3a5f6858fc9f48d567d'),
'start_loc': {"coordinates": [-122.0908901, 37.392124], "type": "Point"},
'end_ts': 1469492294.841,
'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
'end_loc': {"coordinates": [-122.09519, 37.3915317], "type": "Point"},
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
'_id': ObjectId('57ccf3a5f6858fc9f48d567a'),
'key': 'segmentation/raw_trip'})}) into timeseries

SECTION_SEGMENTATION

2016-09-04 21:25:09,936:INFO:++++++++++++++++++++Processing trip 57ccf3a5f6858fc9f48d567a for user 0763de67-f61e-3f5d-90e7-518e69793954++++++++++++++++++++
2016-09-04 21:25:10,009:DEBUG:motion_df =                         _id  confidence                          fmt_time  \
0  5796b250abbf384cadfa37e5          80  2016-07-25T17:13:06.444000-07:00   
2016-09-04 21:25:10,010:DEBUG:At 2016-07-25T17:13:06.444000-07:00, retained existing activity MotionTypes.BICYCLING because of no change

2016-09-04 21:25:10,050:DEBUG:Inserting entry Entry({'data': Section({'sensed_mode': 1,
'start_loc': {"coordinates": [-122.0908901, 37.392124], "type": "Point"},
'end_ts': 1469492294.841,
'start_ts': 1469491947.853,
'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
'end_loc': {"coordinates": [-122.09519, 37.3915317], "type": "Point"},
'source': 'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
'_id': ObjectId('57ccf3a6f6858fc9f48d5681'),
'key': 'segmentation/raw_section'})}) into timeseries

JUMP_SMOOTHING

2016-09-04 21:25:10,271:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57ccf3a6f6858fc9f48d5681 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-04 21:25:10,271:DEBUG:filter_jumps(0763de67-f61e-3f5d-90e7-518e69793954, 57ccf3a6f6858fc9f48d5681) called
2016-09-04 21:25:10,314:DEBUG:Found 15 results
2016-09-04 21:25:10,352:DEBUG:smoothing: segment_list = [Segment(0, 15, 385.539610671)]
2016-09-04 21:25:10,353:INFO:No jumps, nothing to filter
2016-09-04 21:25:10,354:DEBUG:deleted 0 points

CLEAN_RESAMPLE

2016-09-04 21:25:10,838:DEBUG:Filtering trip Entry({u'data': {
u'start_loc': {u'type': u'Point', u'coordinates': [-122.0908901, 37.392124]},
u'end_ts': 1469492294.841,
u'start_ts': 1469491947.853,
u'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-122.09519, 37.3915317]},
u'source': u'DwellSegmentationTimeFilter',
u'start_place': ObjectId('57ccf3a5f6858fc9f48d567d'),
u'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
u'_id': ObjectId('57ccf3a5f6858fc9f48d567a'),
u'key': u'segmentation/raw_trip'}})

2016-09-04 21:25:10,872:DEBUG:Getting filtered points for section Entry({u'data': {
'sensed_mode': 1,
u'start_loc': {u'type': u'Point', u'coordinates': [-122.0908901, 37.392124]},
u'end_ts': 1469492294.841,
u'start_ts': 1469491947.853,
u'start_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-122.09519, 37.3915317]},
u'source': u'SmoothedHighConfidenceMotion',
u'end_fmt_time': u'2016-07-25T17:18:14.841000-07:00',
u'_id': ObjectId('57ccf3a6f6858fc9f48d5681'),
u'key': u'segmentation/raw_section'}})

2016-09-04 21:25:10,907:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469492294.841, '$gte': 1469491947.853}}, sort_key = data.ts
2016-09-04 21:25:10,915:DEBUG:Found 15 results
2016-09-04 21:25:10,919:DEBUG:Found first section, need to extrapolate start point
2016-09-04 21:25:10,923:DEBUG:Start place is 57ccf3a5f6858fc9f48d567d and corresponding location is 5796b24fabbf384cadfa37e2
2016-09-04 21:25:10,936:DEBUG:Adding distance 0.0 to original 1030.89182392 to extend section start from [-122.0908901, 37.392124] to [-122.0908901, 37.392124]
2016-09-04 21:25:10,938:DEBUG:After subtracting time 0.0 from original 346.988000154 to cover additional distance 0.0 at speed 3.27101683125, new_start_ts = 1469491947.85

2016-09-04 21:25:10,952:DEBUG:Setting section start values of Cleanedsection({u'source': u'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('57ccf3a6f6858fc9f48d569e')}) to first point
fmt_time                                 2016-07-25T17:12:27.853000-07:00
latitude                                                          37.3921
loc                     {u'type': u'Point', u'coordinates': [-122.0908...
longitude                                                        -122.091
ts                                                            1.46949e+09

2016-09-04 21:25:10,953:DEBUG:Resampling data of size 16
2016-09-04 21:25:10,957:DEBUG:Resampling entry list                            fmt_time            ts   longitude   latitude
0  2016-07-25T17:12:27.853000-07:00  1.469492e+09 -122.090890  37.392124
1  2016-07-25T17:12:27.853000-07:00  1.469492e+09 -122.090890  37.392124
2         2016-07-25T17:12:52-07:00  1.469492e+09 -122.092018  37.392394
3  2016-07-25T17:13:08.026000-07:00  1.469492e+09 -122.092310  37.392744
4         2016-07-25T17:13:36-07:00  1.469492e+09 -122.091964  37.392901 of size 16

2016-09-04 21:25:10,962:DEBUG:After resampling, using 13 points from 1469491947.85 -> 1469492294.84
2016-09-04 21:25:11,000:DEBUG:with_speeds_df =    idx            ts                          fmt_time   longitude   latitude
0    0  1.469492e+09  2016-07-25T17:12:27.853000-07:00         NaN        NaN
1    1  1.469492e+09  2016-07-25T17:12:57.853000-07:00 -122.092124  37.392522
2    2  1.469492e+09  2016-07-25T17:13:27.853000-07:00 -122.092065  37.392856
3    3  1.469492e+09  2016-07-25T17:13:57.853000-07:00 -122.091765  37.392278
4    4  1.469492e+09  2016-07-25T17:14:27.853000-07:00 -122.091654  37.391356
2016-09-04 21:25:11,002:INFO:removed -2 entries containing n/a
shankari commented 8 years ago

Aha! From https://github.com/e-mission/e-mission-server/issues/385#issuecomment-244668071, we can see the problem. Basically, if this is not a restart, then we join the section to the previous place, which means that we add an extrapolated start point (2016-07-25T17:12:27.853000-07:00 -> 2016-07-22T17:12:19.020000-07:00).

But in the case of a restart, we don't have the previous end place. So the start place and the section start are the same and interpolating doesn't change the start point ([-122.0908901, 37.392124] to [-122.0908901, 37.392124]).

So when we add the new start point, it is a duplicate of the existing point

2016-09-04 21:25:10,957:DEBUG:Resampling entry list                            fmt_time            ts   longitude   latitude
0  2016-07-25T17:12:27.853000-07:00  1.469492e+09 -122.090890  37.392124
1  2016-07-25T17:12:27.853000-07:00  1.469492e+09 -122.090890  37.392124

and when we resample, we end up with NaN values for the first point

2016-09-04 21:25:11,000:DEBUG:with_speeds_df =    idx            ts                          fmt_time   longitude   latitude
0    0  1.469492e+09  2016-07-25T17:12:27.853000-07:00         NaN        NaN
1    1  1.469492e+09  2016-07-25T17:12:57.853000-07:00 -122.092124  37.392522

And when we drop_na, we drop that value and, in effect, the first point.

2016-09-04 21:25:11,002:INFO:removed -2 entries containing n/a

Fix is simple - just don't add the extrapolated start point if it is the same.

shankari commented 8 years ago

This doesn't seem to have fixed it. I see a similar issue in the trips at Hawaii, and also see an issue at the trip end.

Example 1 Example 2
simulator screen shot sep 5 2016 9 56 49 am simulator screen shot sep 5 2016 9 57 42 am

By adding a breakpoint to the client, we can get the ids of the affected trips

shankari commented 8 years ago

Looking at the first raw trip (57cda0fff6858f4c2989a89a). Exact start time is 2016-07-27T14:43:45.576116-10:00.

The resampled points don't have any n/a and nothing is stripped out as well.

2016-09-05 09:45:06,003:DEBUG:with_speeds_df =    idx            ts                          fmt_time   longitude   latitude
0    0  1.469667e+09  2016-07-27T14:43:45.576116-10:00 -157.820470  21.276773
1    1  1.469667e+09  2016-07-27T14:44:15.576116-10:00 -157.820711  21.276697
2    2  1.469667e+09  2016-07-27T14:44:45.576116-10:00 -157.820951  21.276621
3    3  1.469667e+09  2016-07-27T14:45:15.576116-10:00 -157.821191  21.276545
4    4  1.469667e+09  2016-07-27T14:45:45.576116-10:00 -157.821431  21.276468
2016-09-05 09:45:06,005:INFO:removed 0 entries containing n/a

We did insert a recreated location matching the first point, and the recreated location does match the first point [-157.8204702, 21.276773] and the start_loc matches it as well.

2016-09-05 09:45:06,040:DEBUG:Inserting entry Entry({'data': Recreatedlocation(
{'loc': {"coordinates": [-157.82047019999999, 21.276772999999999], "type": "Point"},
'idx': 0,
'fmt_time': '2016-07-27T14:43:45.576116-10:00',
'altitude': 34.9000244140625,
'ts': 1469666625.5761163,
'longitude': -157.8204702,
'mode': 2,
'section': ObjectId('57cda112f6858f4c2989adab'),
'latitude': 21.276773,
'_id': ObjectId('57cda112f6858f4c2989adac'),
'key': 'analysis/recreated_location'})}) into timeseries

But the start_place location is [-157.8223594, 21.2755391]. Is this an extrapolation boo-boo? Too little extrapolation? The opposite of (82b03b7470fe6ca6ad8de8a5b20ad280c4d98dd7)?

Hmmm, before resampling, we had this dataframe. so the lat/lng for the first point is unchanged.

2016-09-05 09:45:05,954:DEBUG:Resampling data of size 10
2016-09-05 09:45:05,959:DEBUG:Resampling entry list                            fmt_time            ts   longitude   latitude
0  2016-07-27T14:43:45.576116-10:00  1.469667e+09 -157.820470  21.276773
1         2016-07-27T14:46:01-10:00  1.469667e+09 -157.821555  21.276429
2  2016-07-27T14:46:05.949000-10:00  1.469667e+09 -157.821555  21.276429
3         2016-07-27T14:46:43-10:00  1.469667e+09 -157.821662  21.275970
4  2016-07-27T14:46:56.367000-10:00  1.469667e+09 -157.821769  21.275986 of size 10

Hm, when we extrapolated, we extended to [-157.8204702, 21.276773]. But the start_place is at [-157.8223594, 21.2755391]. Why didn't we try to extrapolate all the way there?

2016-09-05 09:45:05,920:DEBUG:Start place is 57cda0fff6858f4c2989a899 and corresponding location is 5799450cf311e474bfc6c893
2016-09-05 09:45:05,933:DEBUG:Adding distance 118.721953624 to original 177.498418819 to extend section start from [-157.821555, 21.2764293] to [-157.8204702, 21.276773]
2016-09-05 09:45:05,937:DEBUG:After subtracting time 135.423883626 from original 175.39199996 to cover additional distance 118.721953624 at speed 0.876669243602, new_start_ts = 1469666625.58

Here's the raw start place. Aaaand, it is at [-157.8204702, 21.276773]. So why did we change the cleaned place to have a different location?

2016-09-05 09:44:48,030:DEBUG:Saving entry Entry({'data': Rawplace(
{'enter_fmt_time': u'2016-07-27T13:23:27-10:00',
'ending_trip': ObjectId('57cda0fff6858f4c2989a898'),
'starting_trip': ObjectId('57cda0fff6858f4c2989a89a'),
'duration': 4928.934000015259,
'exit_fmt_time': u'2016-07-27T14:45:35.934000-10:00',
'source': 'DwellSegmentationTimeFilter',
'location': {"coordinates": [-157.8204702, 21.276773], "type": "Point"},
'enter_ts': 1469661807,
'exit_ts': 1469666735.934}),
'_id': ObjectId('57cda0fff6858f4c2989a899'),
'key': 'segmentation/raw_place'})}) into timeseries

Aha! It's because we link the raw place to a cleaned place. And the location of the cleaned place is [-157.8208665, 21.2770023]. It must be within 100m or so, but the difference is noticeable.

2016-09-05 09:45:14,032:DEBUG:for trip Entry({'data': Cleanedtrip(
{'distance': 278.01309740105614,
'raw_trip': ObjectId('57cda0fff6858f4c2989a89a'),
'start_loc': {u'type': u'Point', u'coordinates': [-157.8204702, 21.276773]},
u'end_ts': 1469666936.392,
'start_ts': 1469666625.5761163,
'start_fmt_time': '2016-07-27T14:43:45.576116-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8223594, 21.2755391]},
u'source': u'DwellSegmentationTimeFilter',
u'end_fmt_time': u'2016-07-27T14:48:56.392000-10:00',
'_id': ObjectId('57cda111f6858f4c2989ada9'),
'key': 'analysis/cleaned_trip'})}) start,

converting Entry({'data': Cleanedplace({u'enter_fmt_time': u'2016-07-27T13:09:51.585000-10:00',
'display_name': u'Ala Wai Boulevard, Honolulu',
'ending_trip': ObjectId('57cda111f6858f4c2989ad8c'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8208665, 21.2770023]},
u'enter_ts': 1469660991.585,
'raw_places': [ObjectId('57cda0fff6858f4c2989a897'), ObjectId('57cda0fff6858f4c2989a897')]}),
'_id': ObjectId('57cda11af6858f4c2989af54'),
'key': 'analysis/cleaned_place'})}) to

Entry({u'data': {u'enter_fmt_time': u'2016-07-27T13:23:27-10:00',
u'exit_fmt_time': u'2016-07-27T14:45:35.934000-10:00',
u'ending_trip': ObjectId('57cda0fff6858f4c2989a898'),
u'starting_trip': ObjectId('57cda0fff6858f4c2989a89a'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8204702, 21.276773]},
u'enter_ts': 1469661807,
u'duration': 4928.934000015259,
u'exit_ts': 1469666735.934},
u'_id': ObjectId('57cda0fff6858f4c2989a899'),
u'key': u'segmentation/raw_place'}})
shankari commented 8 years ago

We can fix this by:

  1. changing the location of the cleaned trip to match the raw trip, or
  2. changing the start of the trip to match the cleaned place

If we go with (1), we will end up with a gap at the end of the previous trip, so that's out.

If we go with (2), we need to fix a bunch of other things as well: a) trip start time b) section start time and location c) add a new recreated location at the new section start time

Once we do (2) and (a), (b) is not that hard. But (a) is hard because we have to essentially extrapolate (again) and then we have to handle over extrapolation and so on, which seems like a huge hassle.

I don't want to have to keep extrapolating all the time. And it is not even clear that extrapolation is correct. These are two places that were supposed to be the same, and it only noise that one of them was picked over the other.

So to make life easier, we will do something simple for (a). We will change the trip start time by 30 secs. This allows us to add a new recreated location 30 secs away and be consistent with everything else. And a fuzz factor of 30 secs is unlikely to be noticeable or make a huge difference wrt trip start times.

shankari commented 8 years ago

Looking at the trip with the gap at the end (57cda114f6858f4c2989ae8c)

We filter a bunch of points. Should check if that is legit, seems like a lot.

2016-09-05 09:45:08,848:DEBUG:deleting 13 points from section points
2016-09-05 09:45:08,858:DEBUG:Found 42 results

Extrapolated, but don't care.

2016-09-05 09:45:08,866:DEBUG:Found first section, need to extrapolate start point
2016-09-05 09:45:08,868:DEBUG:Start place is 57cda100f6858f4c2989a8ab and corresponding location is 57997fadf311e474bfc74660
2016-09-05 09:45:08,901:DEBUG:Adding distance 157.281199244 to original 813.870168546 to extend section start from [-157.8045337, 21.2631457] to [-157.8059752, 21.2635885]
2016-09-05 09:45:08,905:DEBUG:After subtracting time 241.597257482 from original 801.0 to cover additional distance 157.281199244 at speed 0.651005731123, new_start_ts = 1469675860.4

Last reconstructed location is [-157.7998833, 21.2663029] which matches the last point. No n/a entries were removed.

2016-09-05 09:45:09,057:INFO:removed 0 entries containing n/a
....
2016-09-05 09:45:09,260:DEBUG:Inserting entry Entry({'data':
Recreatedlocation({'loc': {"coordinates": [-157.7998833, 21.266302899999999], "type": "Point"},
'idx': 35,
'fmt_time': '2016-07-27T17:35:03-10:00',
'altitude': 90.79998779296875,
'ts': 1469676903.0,
'longitude': -157.7998833,
'mode': 2,
'section': ObjectId('57cda115f6858f4c2989ae8e'),
'latitude': 21.2663029,
'speed': 1.641492966737403,
'heading': -32.37982516407363,
'distance': 37.093239049794626}),
'_id': ObjectId('57cda115f6858f4c2989aeb2'),
'key': 'analysis/recreated_location'})}) into timeseries

Raw section, trip and end place.

2016-09-05 09:45:08,780:DEBUG:Starting with element of type trip, id 57cda104f6858f4c2989a8d5, details Entry({u'data': {u'sensed_mode': 2,
u'start_loc': {u'type': u'Point', u'coordinates': [-157.8061887, 21.2624899]},
u'end_ts': 1469677096.611,
u'start_ts': 1469676085.625,
u'start_fmt_time': u'2016-07-27T17:21:25.625000-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'source': u'SmoothedHighConfidenceMotion',
u'end_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
u'duration': 1010.9860000610352,
u'trip_id': ObjectId('57cda100f6858f4c2989a8ac'),
u'_id': ObjectId('57cda104f6858f4c2989a8d5'),
u'key': u'segmentation/raw_section'}})

2016-09-05 09:45:08,729:DEBUG:Filtering trip Entry({u'data': {u'distance': 867.9314413785075,
u'end_place': ObjectId('57cda100f6858f4c2989a8ad'),
u'start_loc': {u'type': u'Point', u'coordinates': [-157.8061887, 21.2624899]},
u'end_ts': 1469677096.611,
u'start_ts': 1469676085.625,
u'start_fmt_time': u'2016-07-27T17:21:25.625000-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'source': u'DwellSegmentationTimeFilter',
u'start_place': ObjectId('57cda100f6858f4c2989a8ab'),
u'end_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
u'_id': ObjectId('57cda100f6858f4c2989a8ac'),
u'key': u'segmentation/raw_trip'}})

2016-09-05 09:44:48,510:DEBUG:Saving entry Entry({'data': Rawplace({'enter_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
'ending_trip': ObjectId('57cda100f6858f4c2989a8ac'),
'starting_trip': ObjectId('57cda100f6858f4c2989a8ae'),
'duration': 486.38899993896484,
'exit_fmt_time': u'2016-07-27T17:46:23-10:00',
'source': 'DwellSegmentationTimeFilter',
'location': {"coordinates": [-157.8001479, 21.2678968], "type": "Point"},
'enter_ts': 1469677096.611,
'exit_ts': 1469677583}),
'_id': ObjectId('57cda100f6858f4c2989a8ad'),
'key': 'segmentation/raw_place'})}) into timeseries

Ok so it looks like this is an outcome of the smoothing. The original trip and section ended at 17:38. But the cleaned trip ends at 17:35, probably because we smoothed the trips at the end. Note that we smoothed 13/42 points. The enter time of the raw place is also 17:38 and I am not sure we fix the enter time of the cleaned place

2016-09-05 09:45:16,899:DEBUG:Found squished trip, linking raw start place 57cda100f6858f4c2989a8ad to new cleaned place 57cda11cf6858f4c2989af5c

2016-09-05 09:45:18,393:DEBUG:Inserting entry Entry({'data': Cleanedplace(
{u'enter_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
'display_name': u'Diamond Head Road, Honolulu',
'exit_fmt_time': '2016-07-27T17:53:26.725442-10:00',
'ending_trip': ObjectId('57cda114f6858f4c2989ae8c'),
'starting_trip': ObjectId('57cda115f6858f4c2989aeb3'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'enter_ts': 1469677096.611,
'duration': 910.11444211006165,
'raw_places': [ObjectId('57cda100f6858f4c2989a8ad'),
ObjectId('57cda100f6858f4c2989a8ad')],
'exit_ts': 1469678006.7254422}),
'_id': ObjectId('57cda11cf6858f4c2989af5c'),
'key': 'analysis/cleaned_place'})}) into timeseries

Yup, that's it!

shankari commented 8 years ago

So now, how to fix it.

That's a bit more complicated. It depends on whether the next place is valid or not. If it is valid, we should extrapolate points to it If it is not valid, we should move it to the end of the section.

How do we know whether it is valid or not?

I think that we can make the assumption that a place is unlikely to be invalid. This is because for trip segmentation, we are not as blase as section segmentation. We either segment when there have been a bunch of points at the same location or if we have a large distance change at a slow speed.

Neither of these matches the error models that we have seen so far wrt jumps. While it is theoretically possible to have several points in the bad cluster, we have typically seen 3-4 (not 10) points in a bad/jumped cluster. And while jumps have large distance changes, they are typically at fast speeds (that's why they are called jumps).

So I claim that, given the error model for trip segmentation, it is unlikely that a place is incorrect. So if the last section has the last points filtered, we should just extrapolate them just like we do the start.

shankari commented 8 years ago

Note that in this case, although the last few points are stripped.

2016-09-05 09:44:55,709:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57cda104f6858f4c2989a8d5 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-05 09:44:55,709:DEBUG:filter_jumps(0763de67-f61e-3f5d-90e7-518e69793954, 57cda104f6858f4c2989a8d5) called
2016-09-05 09:44:55,746:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1469677096.611, '$gte': 1469676085.625}}, sort_key = data.ts
2016-09-05 09:44:55,746:DEBUG:orig_ts_db_keys = ['background/filtered_location'], analysis_ts_db_keys = []
2016-09-05 09:44:55,746:DEBUG:finished querying values for ['background/filtered_location']
2016-09-05 09:44:55,746:DEBUG:finished querying values for []
2016-09-05 09:44:55,762:DEBUG:Found 42 results
2016-09-05 09:44:55,764:DEBUG:After de-duping, converted 42 points to 42 
2016-09-05 09:44:55,764:DEBUG:len(section_points_df) = 42
2016-09-05 09:44:55,782:DEBUG:section_points_df.shape = (42, 22), with_speeds_df.shape = (42, 25)
2016-09-05 09:44:55,802:DEBUG:quartile values are 0.25    0.00000
0.75    2.28429
Name: speed, dtype: float64
2016-09-05 09:44:55,802:DEBUG:iqr 2.2842895374
2016-09-05 09:44:55,802:DEBUG:maxSpeed = 9.13715814959
2016-09-05 09:44:55,805:DEBUG:smoothing: last_point index 42 not in found points Int64Index([0, 1, 23, 24, 31], dtype='int64')
2016-09-05 09:44:55,805:DEBUG:smoothing: added new entry 42
2016-09-05 09:44:55,806:DEBUG:For cluster 0 - 1, distance = 0.0, is_cluster = True
2016-09-05 09:44:55,807:DEBUG:For cluster 1 - 23, distance = 558.664905843, is_cluster = False
2016-09-05 09:44:55,811:DEBUG:For cluster 23 - 24, distance = 0.0, is_cluster = True
2016-09-05 09:44:55,812:DEBUG:For cluster 24 - 31, distance = 101.43061836, is_cluster = False
2016-09-05 09:44:55,813:DEBUG:For cluster 31 - 42, distance = 16.6414659154, is_cluster = True
2016-09-05 09:44:55,813:DEBUG:smoothing: segment_list = [Segment(0, 1, 0.0), Segment(1, 23, 558.664905843), Segment(23, 24, 0.0), Segment(24, 31, 101.43061836), Segment(31, 42, 16.6414659154)]
2016-09-05 09:44:55,819:DEBUG:bad_segments = [Segment(0, 1, 0.0), Segment(23, 24, 0.0), Segment(31, 42, 16.6414659154)]

They are actually a cluster because of the end of the trip, and the place is actually valid.

All points

screen shot 2016-09-05 at 4 25 58 pm

Valid points

screen shot 2016-09-05 at 4 27 07 pm

Valid points, extended to final point (place)

screen shot 2016-09-05 at 4 27 29 pm
shankari commented 8 years ago

Looking at the last raw trip (57cda100f6858f4c2989a8b0). Recall that mismatch is [-157.8001479, 21.2678968] versus [-157.8016326, 21.2695113]

Raw objects. Note that the raw place location is the same as the first point [-157.8016326, 21.2695113].

2016-09-05 09:44:48,528:DEBUG:Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('57cda100f6858f4c2989a8b1'),
'start_ts': 1469678014,
'start_place': ObjectId('57cda100f6858f4c2989a8af'),
'duration': 952,
'distance': 2315.9019817564267,
'start_loc': {"coordinates": [-157.8019085, 21.2695518], "type": "Point"},
'end_ts': 1469678966,
'start_fmt_time': u'2016-07-27T17:53:34-10:00',
'end_loc': {"coordinates": [-157.8232634, 21.275698], "type": "Point"},
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-27T18:09:26-10:00',
'_id': ObjectId('57cda100f6858f4c2989a8b0'),
'key': 'segmentation/raw_trip'})}) into timeseries

2016-09-05 09:44:48,549:DEBUG:Saving entry Entry({'data': Rawplace({'enter_fmt_time': u'2016-07-27T17:47:59-10:00',
'ending_trip': ObjectId('57cda100f6858f4c2989a8ae'),
'starting_trip': ObjectId('57cda100f6858f4c2989a8b0'),
'duration': 335,
'exit_fmt_time': u'2016-07-27T17:53:34-10:00',
'source': 'DwellSegmentationTimeFilter',
'location': {"coordinates": [-157.8016326, 21.2695113], "type": "Point"},
'enter_ts': 1469677679,
'exit_ts': 1469678014}),
'_id': ObjectId('57cda100f6858f4c2989a8af'),
'key': 'segmentation/raw_place'})}) into timeseries

2016-09-05 09:44:52,666:DEBUG:Inserting entry Entry({'data': Section({'sensed_mode': 0,
'start_loc': {"coordinates": [-157.8019085, 21.2695518], "type": "Point"},
'end_ts': 1469678966,
'start_ts': 1469678014,
'start_fmt_time': u'2016-07-27T17:53:34-10:00',
'end_loc': {"coordinates": [-157.8232634, 21.275698], "type": "Point"},
'source': 'SmoothedHighConfidenceMotion',
'end_fmt_time': u'2016-07-27T18:09:26-10:00',
'duration': 952,
'trip_id': ObjectId('57cda100f6858f4c2989a8b0'),
'_id': ObjectId('57cda104f6858f4c2989a8d6'),
'key': 'segmentation/raw_section'})}) into timeseries
2016-09-05 09:44:55,876:INFO:^^^^^^^^^^^^^^^^^^^^Smoothing section 57cda104f6858f4c2989a8d6 for user 0763de67-f61e-3f5d-90e7-518e69793954^^^^^^^^^^^^^^^^^^^^
2016-09-05 09:44:55,949:DEBUG:Found 38 results
2016-09-05 09:44:56,014:DEBUG:smoothing: segment_list = [Segment(0, 38, 2315.90198176)]
2016-09-05 09:44:56,014:INFO:No jumps, nothing to filter
2016-09-05 09:44:56,015:DEBUG:deleted 0 points

First recreated location is consistent with the trip start loc and the start place location

2016-09-05 09:45:09,621:DEBUG:Inserting entry Entry({'data': 
Recreatedlocation({'loc': {"coordinates": [-157.8016326, 21.269511300000001], "type": "Point"},
'idx': 0,
'fmt_time': '2016-07-27T17:53:26.725442-10:00',
'ts': 1469678006.7254422,
'longitude': -157.8016326,
'mode': 0,
'section': ObjectId('57cda115f6858f4c2989aeb5'),
'latitude': 21.2695113,
'_id': ObjectId('57cda115f6858f4c2989aeb6'),
'key': 'analysis/recreated_location'})}) into timeseries
2016-09-05 09:45:16,899:DEBUG:for trip Entry({'data': Cleanedtrip({'distance': 3794.0904784587578,
'raw_trip': ObjectId('57cda100f6858f4c2989a8b0'),
'start_loc': {u'type': u'Point', u'coordinates': [-157.8016326, 21.2695113]},
u'end_ts': 1469678966.0,
'start_ts': 1469678006.7254422,
'start_fmt_time': '2016-07-27T17:53:26.725442-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8232634, 21.275698]},
u'source': u'DwellSegmentationTimeFilter',
u'end_fmt_time': u'2016-07-27T18:09:26-10:00',
'duration': 959.2745578289032,
'_id': ObjectId('57cda115f6858f4c2989aeb3'),
'key': 'analysis/cleaned_trip'})}) start,

converting Entry({'data': Cleanedplace(
{u'enter_fmt_time': u'2016-07-27T17:38:16.611000-10:00',
'display_name': u'Diamond Head Road, Honolulu',
'ending_trip': ObjectId('57cda114f6858f4c2989ae8c'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8001479, 21.2678968]},
u'enter_ts': 1469677096.611,
'raw_places': [ObjectId('57cda100f6858f4c2989a8ad'),
ObjectId('57cda100f6858f4c2989a8ad')]}),
'_id': ObjectId('57cda11cf6858f4c2989af5c'),
'key': 'analysis/cleaned_place'})})

to Entry({u'data': {u'enter_fmt_time': u'2016-07-27T17:47:59-10:00',
u'exit_fmt_time': u'2016-07-27T17:53:34-10:00',
u'ending_trip': ObjectId('57cda100f6858f4c2989a8ae'),
u'starting_trip': ObjectId('57cda100f6858f4c2989a8b0'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-157.8016326, 21.2695113]},
u'enter_ts': 1469677679,
u'duration': 335,
u'exit_ts': 1469678014},
u'_id': ObjectId('57cda100f6858f4c2989a8af'),
u'key': u'segmentation/raw_place'}})

So this is the same as the first case (https://github.com/e-mission/e-mission-server/issues/385#issuecomment-244793203), except that in that case, the distance between the points was under the threshold, but here is was above. Let's take a quick check to see why the previous trip was stripped out.

>>> ecc.calDistance([-157.8208665, 21.2770023], [-157.8204702, 21.276773])
48.33483462934436

>>> ecc.calDistance([-157.8001479, 21.2678968], [-157.8016326, 21.2695113])
236.42702797373335

Because it was a zero section trip

2016-09-05 09:45:16,899:DEBUG:Found squished trip, linking raw start place 57cda100f6858f4c2989a8ad to new cleaned place 57cda11cf6858f4c2989af5c

2016-09-05 09:45:09,315:DEBUG:keys = (segmentation/raw_stop, segmentation/raw_section), len(places) = 0, len(trips) = 0
2016-09-05 09:45:09,315:DEBUG:places_or_stops = [] 
2016-09-05 09:45:09,315:DEBUG:trips_or_sections = [] 
2016-09-05 09:45:09,315:DEBUG:Starting with element of type none, id None, details None
2016-09-05 09:45:09,316:INFO:Found zero section trip Entry({u'data':
{u'distance': 31.60031422794774,
u'end_place': ObjectId('57cda100f6858f4c2989a8af'),
u'start_loc': {u'type': u'Point', u'coordinates': [-157.8013582, 21.2696353]},
u'end_ts': 1469677679,
u'start_ts': 1469677583,
u'start_fmt_time': u'2016-07-27T17:46:23-10:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-157.8016326, 21.2695113]},
u'source': u'DwellSegmentationTimeFilter',
u'start_place': ObjectId('57cda100f6858f4c2989a8ad'),
u'end_fmt_time': u'2016-07-27T17:47:59-10:00',
u'duration': 96,
u'_id': ObjectId('57cda100f6858f4c2989a8ae'),
u'key': u'segmentation/raw_trip'}}) 

Because there are only three motion activites in that stretch, and two of them are invalid (TILTING = 5 and UNKNOWN = 4). The only valid one (BICYCLING = 1) is low confidence. So we are in the case in which we have zero high confidence motions, which we claimed that we would not handle (https://github.com/e-mission/e-mission-server/issues/356#issuecomment-240630757) because we had not seen an example of a real trip with no valid motion activity points. Maybe it is time to revisit that decision...

2016-09-05 09:44:52,395:INFO:++++++++++++++++++++Processing trip 57cda100f6858f4c2989a8ae for user 0763de67-f61e-3f5d-90e7-518e69793954++++++++++++++++++++
2016-09-05 09:44:52,468:DEBUG:After de-duping, converted 3 points to 3 

2016-09-05 09:44:52,482:DEBUG:motion_df =                         _id  confidence                          fmt_time  \
0  57998db3f311e474bfc7a207          42  2016-07-27T17:46:29.941000-10:00   
1  57998db3f311e474bfc7a208         100  2016-07-27T17:46:30.088000-10:00   
2  57998db3f311e474bfc7a20e          60  2016-07-27T17:47:30.090000-10:00   

type                               user_id  
1  0763de67-f61e-3f5d-90e7-518e69793954  
5  0763de67-f61e-3f5d-90e7-518e69793954  
4  0763de67-f61e-3f5d-90e7-518e69793954