Closed shankari closed 5 years ago
when I pull the data and re-run the pipeline locally, I get a single trip from 11pm to 11pm. Let's see why we end up with a 0km trip.
The first thing to note is that we don't have a lot of points. For the entire day of the 27th, we have 6 location points.
>>> locs_df = esta.TimeSeries.get_time_series(UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb')).get_data_df('background/filtered_location', time_query=None)
>>> locs_df[["fmt_time"]]
fmt_time
0 2016-07-27T14:28:39.046601-07:00
1 2016-07-27T18:13:21.472249-07:00
2 2016-07-27T18:13:30.589242-07:00
3 2016-07-27T18:14:05.555026-07:00
4 2016-07-27T22:16:50.862365-07:00
5 2016-07-27T23:00:21.145487-07:00
so it looks like we are displaying the trip from 23:00 to 23:00, which corresponds to a single point, which is why the distance is 0 and the % is NaN
Let's see how we handled those points
2016-08-21 13:19:36,354:DEBUG:------------------------------2016-07-27T14:28:39.046601-07:00------------------------------
2016-08-21 13:19:36,357:DEBUG:Setting new trip start point AttrDict
2016-08-21 13:19:36,358:DEBUG:------------------------------2016-07-27T18:13:21.472249-07:00------------------------------
2016-08-21 13:19:36,360:INFO:Found trip end at 2016-07-27T14:28:39.046601-07:00
2016-08-21 13:19:36,361:INFO:Points AttrDict() and AttrDict() are within the distance filter so part of the same trip
2016-08-21 13:19:36,362:DEBUG:------------------------------2016-07-27T18:13:30.589242-07:00------------------------------
2016-08-21 13:19:36,363:INFO:Points AttrDict() and AttrDict() are within the distance filter so part of the same trip
2016-08-21 13:19:36,363:DEBUG:------------------------------2016-07-27T18:14:05.555026-07:00------------------------------
2016-08-21 13:19:36,364:INFO:Points AttrDict() and AttrDict() are within the distance filter so part of the same trip
2016-08-21 13:19:36,365:DEBUG:------------------------------2016-07-27T22:16:50.862365-07:00------------------------------
2016-08-21 13:19:36,367:DEBUG:Setting new trip start point AttrDict() with idx 4
2016-08-21 13:19:36,367:DEBUG:------------------------------2016-07-27T23:00:21.145487-07:00------------------------------
2016-08-21 13:19:36,370:INFO:Found trip end at 2016-07-27T22:16:50.862365-07:00
2016-08-21 13:19:36,371:DEBUG:Setting new trip start point AttrDict() with idx 5
2016-08-21 13:19:36,378:DEBUG:stopped_moving_after_last = fmt_time transition
46 2016-07-27T23:20:22.468735-07:00 2
2016-08-21 13:19:36,378:DEBUG:Found 1 transitions after last point, ending trip...
So it looks like we did get a spurious trip, but it wasn't the only one. We actually got multiple spurious trips that day. Why is it that only this one survived the cleaning process?
Let's focus on the section segmentation.
As we saw in https://github.com/e-mission/e-mission-server/issues/369#issuecomment-241282303, there are three raw trips
2016-08-21 13:19:36,581:DEBUG:finished querying values for ['segmentation/raw_trip']
2016-08-21 13:19:36,583:DEBUG:Returning entry with length 3 result
First trip has no activities, one location (this one) and no transitions, so we don't create a section
2016-08-21 13:19:36,583:INFO:++++++++++++++++++++Processing trip 57ba0cd8f6858f8c24d90ee8 for user 789a6ea5-7d20-3bf7-8127-b7756ffe46eb++++++++++++++++++++
2016-08-21 13:19:36,737:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469654919.046601, '$gte': 1469654919.046601}}, sort_key = data.ts
2016-08-21 13:19:36,758:DEBUG:Found 0 results
2016-08-21 13:19:36,759:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469654919.046601, '$gte': 1469654919.046601}}, sort_key = data.ts
2016-08-21 13:19:36,780:DEBUG:Found 1 results
2016-08-21 13:19:36,780:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469655219.046601, '$gte': 1469654919.046601}}, sort_key = data.ts
2016-08-21 13:19:36,785:DEBUG:Found 0 results
2016-08-21 13:19:36,786:DEBUG:there are no transitions, which means no visit transitions, not creating a section
For the second trip, we actually found two transitions, but they are not visit transitions, so we skipped the section
2016-08-21 13:19:36,843:INFO:++++++++++++++++++++Processing trip 57ba0cd8f6858f8c24d90eea for user 789a6ea5-7d20-3bf7-8127-b7756ffe46eb++++++++++++++++++++
2016-08-21 13:19:36,972:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469683010.862365, '$gte': 1469683010.862365}}, sort_key = data.ts
2016-08-21 13:19:36,974:DEBUG:Found 0 results
2016-08-21 13:19:36,975:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469683010.862365, '$gte': 1469683010.862365}}, sort_key = data.ts
2016-08-21 13:19:36,982:DEBUG:Found 1 results
2016-08-21 13:19:36,983:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469683310.862365, '$gte': 1469683010.862365}}, sort_key = data.ts
2016-08-21 13:19:36,997:DEBUG:Found 2 results
2016-08-21 13:19:36,999:DEBUG:there are some transitions, but none of them are visit, not creating a section
For the third trip, we found 5 results, including a visit transition, so we created a section
2016-08-21 13:19:37,041:INFO:++++++++++++++++++++Processing trip 57ba0cd8f6858f8c24d90eec for user 789a6ea5-7d20-3bf7-8127-b7756ffe46eb++++++++++++++++++++
2016-08-21 13:19:37,189:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469685621.145487, '$gte': 1469685621.145487}}, sort_key = data.ts
2016-08-21 13:19:37,192:DEBUG:Found 0 results
2016-08-21 13:19:37,193:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469685621.145487, '$gte': 1469685621.145487}}, sort_key = data.ts
2016-08-21 13:19:37,199:DEBUG:Found 1 results
2016-08-21 13:19:37,199:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469685921.145487, '$gte': 1469685621.145487}}, sort_key = data.ts
2016-08-21 13:19:37,205:DEBUG:Found 5 results
2016-08-21 13:19:37,210:DEBUG:found visit transition transition fmt_time
2 14 2016-07-27T23:00:21.746401-07:00, returning dummy section
So the first two were filtered out because they are zero section trips, and this one was retained.
So now the next question is - why did this section have a visit transition? Is that legit? We have historically not had spurious visit transitions....
Here are all the transitions for that day. VISIT_STARTED = 13, VISIT_ENDED = 14.
2016-07-27T00:42:18.925328-07:00 TransitionType.NOP
2016-07-27T00:55:05.410142-07:00 TransitionType.NOP
2016-07-27T10:01:13.903622-07:00 TransitionType.NOP
2016-07-27T18:13:28.449253-07:00 TransitionType.VISIT_ENDED
2016-07-27T18:13:31.703252-07:00 TransitionType.TRIP_STARTED
2016-07-27T18:13:32.834289-07:00 TransitionType.TRIP_STARTED
2016-07-27T18:33:22.452302-07:00 TransitionType.VISIT_STARTED
2016-07-27T18:33:22.699790-07:00 TransitionType.TRIP_END_DETECTED
2016-07-27T18:33:23.610171-07:00 TransitionType.END_TRIP_TRACKING
2016-07-27T18:33:23.730183-07:00 TransitionType.STOPPED_MOVING
2016-07-27T19:00:08.600376-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T19:00:10.912742-07:00 TransitionType.DATA_PUSHED
2016-07-27T19:00:11.201154-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T19:00:16.145747-07:00 TransitionType.DATA_PUSHED
2016-07-27T20:00:11.121397-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T20:00:15.382205-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T20:00:19.117580-07:00 TransitionType.DATA_PUSHED
2016-07-27T20:00:21.792919-07:00 TransitionType.DATA_PUSHED
2016-07-27T21:00:15.795799-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T21:00:16.288776-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T21:00:18.399556-07:00 TransitionType.DATA_PUSHED
2016-07-27T21:00:28.864443-07:00 TransitionType.DATA_PUSHED
2016-07-27T22:00:07.375249-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T22:00:11.745880-07:00 TransitionType.DATA_PUSHED
2016-07-27T22:00:14.811067-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T22:00:18.382728-07:00 TransitionType.DATA_PUSHED
2016-07-27T22:11:39.586959-07:00 TransitionType.NOP
2016-07-27T22:16:50.807860-07:00 TransitionType.VISIT_ENDED
2016-07-27T22:16:50.895447-07:00 TransitionType.TRIP_STARTED
2016-07-27T22:16:50.999150-07:00 TransitionType.TRIP_STARTED
2016-07-27T22:36:48.561479-07:00 TransitionType.VISIT_STARTED
2016-07-27T22:36:48.801773-07:00 TransitionType.TRIP_END_DETECTED
2016-07-27T22:36:49.640546-07:00 TransitionType.END_TRIP_TRACKING
2016-07-27T22:36:49.728598-07:00 TransitionType.STOPPED_MOVING
2016-07-27T23:00:08.730546-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T23:00:11.385571-07:00 TransitionType.DATA_PUSHED
2016-07-27T23:00:20.876433-07:00 TransitionType.EXITED_GEOFENCE
2016-07-27T23:00:20.961980-07:00 TransitionType.TRIP_STARTED
2016-07-27T23:00:21.268974-07:00 TransitionType.TRIP_STARTED
2016-07-27T23:00:21.420076-07:00 TransitionType.TRIP_RESTARTED
2016-07-27T23:00:21.746401-07:00 TransitionType.VISIT_ENDED
2016-07-27T23:00:23.611015-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T23:00:23.679150-07:00 TransitionType.NOP
2016-07-27T23:20:21.321171-07:00 TransitionType.VISIT_STARTED
2016-07-27T23:20:21.451673-07:00 TransitionType.TRIP_END_DETECTED
2016-07-27T23:20:22.381304-07:00 TransitionType.END_TRIP_TRACKING
2016-07-27T23:20:22.468735-07:00 TransitionType.STOPPED_MOVING
If we focus on the visit transitions, we get the following transitions.
2016-07-27T18:13:28.449253-07:00 TransitionType.VISIT_ENDED
2016-07-27T18:33:22.452302-07:00 TransitionType.VISIT_STARTED
2016-07-27T22:16:50.807860-07:00 TransitionType.VISIT_ENDED
2016-07-27T22:36:48.561479-07:00 TransitionType.VISIT_STARTED
2016-07-27T23:00:21.746401-07:00 TransitionType.VISIT_ENDED
2016-07-27T23:20:21.321171-07:00 TransitionType.VISIT_STARTED
It certainly looks like we got visited ended transitions earlier, and they overlapped at least partially with the trips that we detected. Here's how the trips to transitions match up.
???
2016-08-21 13:19:37,425:DEBUG:Considering trip 57ba0cd8f6858f8c24d90ee8: 2016-07-27T14:28:39.046601-07:00 -> 2016-07-27T14:28:39.046601-07:00
2016-07-27T18:13:28.449253-07:00 TransitionType.VISIT_ENDED
2016-07-27T18:33:22.452302-07:00 TransitionType.VISIT_STARTED
Glommed to previous trip
2016-07-27T22:16:50.807860-07:00 TransitionType.VISIT_ENDED
2016-07-27T22:36:48.561479-07:00 TransitionType.VISIT_STARTED
2016-08-21 13:19:37,425:DEBUG:Considering trip 57ba0cd8f6858f8c24d90eea: 2016-07-27T22:16:50.862365-07:00 -> 2016-07-27T22:16:50.862365-07:00
2016-07-27T23:00:21.746401-07:00 TransitionType.VISIT_ENDED
2016-07-27T23:20:21.321171-07:00 TransitionType.VISIT_STARTED
2016-08-21 13:19:37,426:DEBUG:Considering trip 57ba0cd8f6858f8c24d90eec: 2016-07-27T23:00:21.145487-07:00 -> 2016-07-27T23:00:21.145487-07:00
One difference between trip 2 and trip 3 is that in one case, the VISIT_ENDED was before the trip (22:16:50.807 versus 22:16:50.862) while in the other case the VISIT_ENDED was after the trip (23:00:21.746 versus 23:00:21.145). But in both cases, the transition was not within the trip boundaries, since the trip was basically 0 ms long. So why did both of them match some set of transitions?
since the trip was basically 0 ms long. So why did both of them match some set of transitions?
Because we don't actually search for the current boundaries of the trip. For trip 2, we search for
2016-08-21 13:19:36,983:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'),
'data.ts':
{'$lte': 1469683310.862365,
'$gte': 1469683010.862365}}, sort_key = data.ts
and for trip three, we search for
2016-08-21 13:19:37,199:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'),
'data.ts':
{'$lte': 1469685921.145487,
'$gte': 1469685621.145487}}, sort_key = data.ts
which, in turn, is because of
# Yech! This feels really hacky, but if we have a really short trip,
# then we may get the visit ending message after the trip has ended.
# So let's expand the time query by 5 minutes.
# This is based on the 10:06 -> 10:07 trip from the 22 Feb test case
time_query.endTs = time_query.endTs + 5 * 60
transition_df = timeseries.get_data_df('statemachine/transition', time_query)
if len(transition_df) == 0:
logging.debug("there are no transitions, which means no visit transitions, not creating a section")
return None
I don't want to add another special case for this because it looks like we already "fixed" this once.
One simple fix might be to skip the section if there is only one point in the trip (in addition to the existing zero point fix). But in order to do that, we need to check the 10:06 -> 10:07 case on 22 Feb to make sure that it doesn't regress. Note that a single point trip effectively has distance zero, so we might as well strip it out.
The concern with that is that we might again miss short trips, specially since we cannot specify the geofence radius for iOS and we end up with points that are fairly far from the start.
A better option might be to fix the issue of extending the first section to the start of the trip (#288), and then checking to see what the distance of the trip is, or whether all points are within some distance or something. That way, if there is a real trip (non zero distance), but we only get the last one or two points, we will still retain it...
Note also that there is something a lot more troubling here.
???
2016-08-21 13:19:37,425:DEBUG:Considering trip 57ba0cd8f6858f8c24d90ee8: 2016-07-27T14:28:39.046601-07:00 -> 2016-07-27T14:28:39.046601-07:00
There is no visit transition associated with it. In fact, there are no transitions between 10:00 and 18:00. So why did we get the point at 14:28? Is it a case of points being reported even when the tracking was turned off? Or did we get a geofence exit and start tracking and then not get any remote pushes for a while?
Wonder if sth like this is a cause of Randy's 40% drain...
A better option might be to fix the issue of extending the first section to the start of the trip (#288), and then checking to see what the distance of the trip is, or whether all points are within some distance or something. That way, if there is a real trip (non zero distance), but we only get the last one or two points, we will still retain it...
While re-running after fixing #288, we get
Trip | Section |
---|---|
That's probably because there is no existing starting place, so we didn't extrapolate the start point. But the start point and the end point are not the same. So why do we still have a non-zero distance?
So there is an additional distance of 500m! So much for the idea of skipping short trips.
2016-08-25 11:12:39,540:DEBUG:Found first section, need to extrapolate start point
2016-08-25 11:12:39,542:DEBUG:Start place is 57bf3516f6858f5286861a09 and corresponding location is 579999f5f311e474bfc7bdb3
2016-08-25 11:12:39,554:DEBUG:Adding distance 472.602256212 to original 0 to extend section start from [-122.0401661844198, 37.33632258143028] to [-122.0400771686081, 37.34057220633825]
2016-08-25 11:12:39,556:INFO:speeds for this section are 0 0
Name: speed, dtype: int64, median is 0.0, skipping
2016-08-25 11:12:39,557:DEBUG:After subtracting time 0 from original 0.0 to cover additional distance 472.602256212 at speed 0.0, new_start_ts = 1469685621.15
Here's the fake new location point
2016-08-25 11:12:39,557:DEBUG:After copy, new data = Location({u'loc': {u'type': u'Point', u'coordinates': [-122.0400771686081, 37.34057220633825]}, u'floor': 0, u'altitude': 66.85079956054688, u'vaccuracy': 3, u'longitude': -122.0400771686081, u'filter': u'distance', u'latitude': 37.34057220633825, u'heading': -1, u'sensed_speed': 0, u'accuracy': 10})
We do add the new fake point to have a dataset of size 2
2016-08-25 11:12:39,558:DEBUG:Resampling data of size 2
2016-08-25 11:12:39,568:DEBUG:Resampling entry list accuracy altitude filter floor fmt_time \
0 10 66.850800 distance 0 2016-07-27T23:00:21.145487-07:00
1 10 91.901825 distance 0 2016-07-27T23:00:21.145487-07:00
heading latitude loc \
0 -1 37.340572 {u'type': u'Point', u'coordinates': [-122.0400...
1 -1 37.336323 {u'type': u'Point', u'coordinates': [-122.0401...
local_dt longitude \
0 {u'hour': 23, u'month': 7, u'second': 21, u'we... -122.040077
1 {u'hour': 23, u'month': 7, u'second': 21, u'we... -122.040166
sensed_speed ts vaccuracy
0 0 1.469686e+09 3
1 0 1.469686e+09 32 of size 2
But then we resample at a 30 sec frequency, and because the two points have the same ts, they collapse back to one.
2016-08-25 11:12:39,569:DEBUG:tz_ranges = [{'timezone': u'America/Los_Angeles', 'end_ts': 1469685621.1454871, 'start_ts': 1469685621.1454871}]
I am really not sure how best to handle this. We can certainly strip out trips that have a single zero length section, but the goal was that if we actually moved from one place to another, then we would count it. This would improve timeline robustness and ensure that trips were not missed because of improper segmentation.
But in this case, we actually did detect movement of almost 500m, which is much bigger than any threshold we have had so far.
Found a similar issue for Gabe, only in his case, there is an actual section, although very small. Here's the compromise I am going to use. If the length of the trip is less than 100m, we skip it. Note that in Sunil's case above, the length of the trip is still 0 because we have no points. If we have a little < 100m perturbation in our timeline, that's going to have to be OK because we are not guaranteed that our trips end at the exact same location every time anyway.
@sunil07t reported that he had 7 spurious trips on the 27th although he didn't really go anywhere. This bug will track the investigation and any potential fix.