e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
15 stars 34 forks source link

Spurious trips for Sunil on Jul 27 #204

Closed shankari closed 5 years ago

shankari commented 8 years ago

@sunil07t reported that he had 7 spurious trips on the 27th although he didn't really go anywhere. This bug will track the investigation and any potential fix.

shankari commented 8 years ago

when I pull the data and re-run the pipeline locally, I get a single trip from 11pm to 11pm. Let's see why we end up with a 0km trip.

simulator screen shot aug 21 2016 12 49 44 pm

shankari commented 8 years ago

The first thing to note is that we don't have a lot of points. For the entire day of the 27th, we have 6 location points.

>>> locs_df = esta.TimeSeries.get_time_series(UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb')).get_data_df('background/filtered_location', time_query=None)
>>> locs_df[["fmt_time"]]

    fmt_time
0   2016-07-27T14:28:39.046601-07:00
1   2016-07-27T18:13:21.472249-07:00
2   2016-07-27T18:13:30.589242-07:00
3   2016-07-27T18:14:05.555026-07:00
4   2016-07-27T22:16:50.862365-07:00
5   2016-07-27T23:00:21.145487-07:00

so it looks like we are displaying the trip from 23:00 to 23:00, which corresponds to a single point, which is why the distance is 0 and the % is NaN

Let's see how we handled those points

2016-08-21 13:19:36,354:DEBUG:------------------------------2016-07-27T14:28:39.046601-07:00------------------------------
2016-08-21 13:19:36,357:DEBUG:Setting new trip start point AttrDict
2016-08-21 13:19:36,358:DEBUG:------------------------------2016-07-27T18:13:21.472249-07:00------------------------------
2016-08-21 13:19:36,360:INFO:Found trip end at 2016-07-27T14:28:39.046601-07:00
2016-08-21 13:19:36,361:INFO:Points AttrDict() and AttrDict() are within the distance filter so part of the same trip
2016-08-21 13:19:36,362:DEBUG:------------------------------2016-07-27T18:13:30.589242-07:00------------------------------
2016-08-21 13:19:36,363:INFO:Points AttrDict() and AttrDict() are within the distance filter so part of the same trip
2016-08-21 13:19:36,363:DEBUG:------------------------------2016-07-27T18:14:05.555026-07:00------------------------------
2016-08-21 13:19:36,364:INFO:Points AttrDict() and AttrDict() are within the distance filter so part of the same trip
2016-08-21 13:19:36,365:DEBUG:------------------------------2016-07-27T22:16:50.862365-07:00------------------------------
2016-08-21 13:19:36,367:DEBUG:Setting new trip start point AttrDict() with idx 4
2016-08-21 13:19:36,367:DEBUG:------------------------------2016-07-27T23:00:21.145487-07:00------------------------------
2016-08-21 13:19:36,370:INFO:Found trip end at 2016-07-27T22:16:50.862365-07:00
2016-08-21 13:19:36,371:DEBUG:Setting new trip start point AttrDict() with idx 5
2016-08-21 13:19:36,378:DEBUG:stopped_moving_after_last =                             fmt_time  transition
46  2016-07-27T23:20:22.468735-07:00           2
2016-08-21 13:19:36,378:DEBUG:Found 1 transitions after last point, ending trip...

So it looks like we did get a spurious trip, but it wasn't the only one. We actually got multiple spurious trips that day. Why is it that only this one survived the cleaning process?

shankari commented 8 years ago

Let's focus on the section segmentation.

As we saw in https://github.com/e-mission/e-mission-server/issues/369#issuecomment-241282303, there are three raw trips

2016-08-21 13:19:36,581:DEBUG:finished querying values for ['segmentation/raw_trip']
2016-08-21 13:19:36,583:DEBUG:Returning entry with length 3 result

First trip has no activities, one location (this one) and no transitions, so we don't create a section

2016-08-21 13:19:36,583:INFO:++++++++++++++++++++Processing trip 57ba0cd8f6858f8c24d90ee8 for user 789a6ea5-7d20-3bf7-8127-b7756ffe46eb++++++++++++++++++++
2016-08-21 13:19:36,737:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469654919.046601, '$gte': 1469654919.046601}}, sort_key = data.ts
2016-08-21 13:19:36,758:DEBUG:Found 0 results

2016-08-21 13:19:36,759:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469654919.046601, '$gte': 1469654919.046601}}, sort_key = data.ts
2016-08-21 13:19:36,780:DEBUG:Found 1 results

2016-08-21 13:19:36,780:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469655219.046601, '$gte': 1469654919.046601}}, sort_key = data.ts
2016-08-21 13:19:36,785:DEBUG:Found 0 results
2016-08-21 13:19:36,786:DEBUG:there are no transitions, which means no visit transitions, not creating a section

For the second trip, we actually found two transitions, but they are not visit transitions, so we skipped the section

2016-08-21 13:19:36,843:INFO:++++++++++++++++++++Processing trip 57ba0cd8f6858f8c24d90eea for user 789a6ea5-7d20-3bf7-8127-b7756ffe46eb++++++++++++++++++++
2016-08-21 13:19:36,972:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469683010.862365, '$gte': 1469683010.862365}}, sort_key = data.ts
2016-08-21 13:19:36,974:DEBUG:Found 0 results

2016-08-21 13:19:36,975:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469683010.862365, '$gte': 1469683010.862365}}, sort_key = data.ts
2016-08-21 13:19:36,982:DEBUG:Found 1 results

2016-08-21 13:19:36,983:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469683310.862365, '$gte': 1469683010.862365}}, sort_key = data.ts
2016-08-21 13:19:36,997:DEBUG:Found 2 results
2016-08-21 13:19:36,999:DEBUG:there are some transitions, but none of them are visit, not creating a section

For the third trip, we found 5 results, including a visit transition, so we created a section

2016-08-21 13:19:37,041:INFO:++++++++++++++++++++Processing trip 57ba0cd8f6858f8c24d90eec for user 789a6ea5-7d20-3bf7-8127-b7756ffe46eb++++++++++++++++++++
2016-08-21 13:19:37,189:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469685621.145487, '$gte': 1469685621.145487}}, sort_key = data.ts
2016-08-21 13:19:37,192:DEBUG:Found 0 results
2016-08-21 13:19:37,193:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/filtered_location'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469685621.145487, '$gte': 1469685621.145487}}, sort_key = data.ts
2016-08-21 13:19:37,199:DEBUG:Found 1 results
2016-08-21 13:19:37,199:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'), 'data.ts': {'$lte': 1469685921.145487, '$gte': 1469685621.145487}}, sort_key = data.ts
2016-08-21 13:19:37,205:DEBUG:Found 5 results
2016-08-21 13:19:37,210:DEBUG:found visit transition    transition                          fmt_time
2          14  2016-07-27T23:00:21.746401-07:00, returning dummy section

So the first two were filtered out because they are zero section trips, and this one was retained.

So now the next question is - why did this section have a visit transition? Is that legit? We have historically not had spurious visit transitions....

shankari commented 8 years ago

Here are all the transitions for that day. VISIT_STARTED = 13, VISIT_ENDED = 14.

2016-07-27T00:42:18.925328-07:00 TransitionType.NOP
2016-07-27T00:55:05.410142-07:00 TransitionType.NOP
2016-07-27T10:01:13.903622-07:00 TransitionType.NOP
2016-07-27T18:13:28.449253-07:00 TransitionType.VISIT_ENDED
2016-07-27T18:13:31.703252-07:00 TransitionType.TRIP_STARTED
2016-07-27T18:13:32.834289-07:00 TransitionType.TRIP_STARTED
2016-07-27T18:33:22.452302-07:00 TransitionType.VISIT_STARTED
2016-07-27T18:33:22.699790-07:00 TransitionType.TRIP_END_DETECTED
2016-07-27T18:33:23.610171-07:00 TransitionType.END_TRIP_TRACKING
2016-07-27T18:33:23.730183-07:00 TransitionType.STOPPED_MOVING
2016-07-27T19:00:08.600376-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T19:00:10.912742-07:00 TransitionType.DATA_PUSHED
2016-07-27T19:00:11.201154-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T19:00:16.145747-07:00 TransitionType.DATA_PUSHED
2016-07-27T20:00:11.121397-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T20:00:15.382205-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T20:00:19.117580-07:00 TransitionType.DATA_PUSHED
2016-07-27T20:00:21.792919-07:00 TransitionType.DATA_PUSHED
2016-07-27T21:00:15.795799-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T21:00:16.288776-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T21:00:18.399556-07:00 TransitionType.DATA_PUSHED
2016-07-27T21:00:28.864443-07:00 TransitionType.DATA_PUSHED
2016-07-27T22:00:07.375249-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T22:00:11.745880-07:00 TransitionType.DATA_PUSHED
2016-07-27T22:00:14.811067-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T22:00:18.382728-07:00 TransitionType.DATA_PUSHED
2016-07-27T22:11:39.586959-07:00 TransitionType.NOP
2016-07-27T22:16:50.807860-07:00 TransitionType.VISIT_ENDED
2016-07-27T22:16:50.895447-07:00 TransitionType.TRIP_STARTED
2016-07-27T22:16:50.999150-07:00 TransitionType.TRIP_STARTED
2016-07-27T22:36:48.561479-07:00 TransitionType.VISIT_STARTED
2016-07-27T22:36:48.801773-07:00 TransitionType.TRIP_END_DETECTED
2016-07-27T22:36:49.640546-07:00 TransitionType.END_TRIP_TRACKING
2016-07-27T22:36:49.728598-07:00 TransitionType.STOPPED_MOVING
2016-07-27T23:00:08.730546-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T23:00:11.385571-07:00 TransitionType.DATA_PUSHED
2016-07-27T23:00:20.876433-07:00 TransitionType.EXITED_GEOFENCE
2016-07-27T23:00:20.961980-07:00 TransitionType.TRIP_STARTED
2016-07-27T23:00:21.268974-07:00 TransitionType.TRIP_STARTED
2016-07-27T23:00:21.420076-07:00 TransitionType.TRIP_RESTARTED
2016-07-27T23:00:21.746401-07:00 TransitionType.VISIT_ENDED
2016-07-27T23:00:23.611015-07:00 TransitionType.RECEIVED_SILENT_PUSH
2016-07-27T23:00:23.679150-07:00 TransitionType.NOP
2016-07-27T23:20:21.321171-07:00 TransitionType.VISIT_STARTED
2016-07-27T23:20:21.451673-07:00 TransitionType.TRIP_END_DETECTED
2016-07-27T23:20:22.381304-07:00 TransitionType.END_TRIP_TRACKING
2016-07-27T23:20:22.468735-07:00 TransitionType.STOPPED_MOVING

If we focus on the visit transitions, we get the following transitions.

2016-07-27T18:13:28.449253-07:00 TransitionType.VISIT_ENDED
2016-07-27T18:33:22.452302-07:00 TransitionType.VISIT_STARTED
2016-07-27T22:16:50.807860-07:00 TransitionType.VISIT_ENDED
2016-07-27T22:36:48.561479-07:00 TransitionType.VISIT_STARTED
2016-07-27T23:00:21.746401-07:00 TransitionType.VISIT_ENDED
2016-07-27T23:20:21.321171-07:00 TransitionType.VISIT_STARTED

It certainly looks like we got visited ended transitions earlier, and they overlapped at least partially with the trips that we detected. Here's how the trips to transitions match up.

???
2016-08-21 13:19:37,425:DEBUG:Considering trip 57ba0cd8f6858f8c24d90ee8: 2016-07-27T14:28:39.046601-07:00 -> 2016-07-27T14:28:39.046601-07:00 

2016-07-27T18:13:28.449253-07:00 TransitionType.VISIT_ENDED
2016-07-27T18:33:22.452302-07:00 TransitionType.VISIT_STARTED
Glommed to previous trip

2016-07-27T22:16:50.807860-07:00 TransitionType.VISIT_ENDED
2016-07-27T22:36:48.561479-07:00 TransitionType.VISIT_STARTED
2016-08-21 13:19:37,425:DEBUG:Considering trip 57ba0cd8f6858f8c24d90eea: 2016-07-27T22:16:50.862365-07:00 -> 2016-07-27T22:16:50.862365-07:00 

2016-07-27T23:00:21.746401-07:00 TransitionType.VISIT_ENDED
2016-07-27T23:20:21.321171-07:00 TransitionType.VISIT_STARTED
2016-08-21 13:19:37,426:DEBUG:Considering trip 57ba0cd8f6858f8c24d90eec: 2016-07-27T23:00:21.145487-07:00 -> 2016-07-27T23:00:21.145487-07:00 

One difference between trip 2 and trip 3 is that in one case, the VISIT_ENDED was before the trip (22:16:50.807 versus 22:16:50.862) while in the other case the VISIT_ENDED was after the trip (23:00:21.746 versus 23:00:21.145). But in both cases, the transition was not within the trip boundaries, since the trip was basically 0 ms long. So why did both of them match some set of transitions?

shankari commented 8 years ago

since the trip was basically 0 ms long. So why did both of them match some set of transitions?

Because we don't actually search for the current boundaries of the trip. For trip 2, we search for

2016-08-21 13:19:36,983:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'),
'data.ts':
{'$lte': 1469683310.862365,
 '$gte': 1469683010.862365}}, sort_key = data.ts

and for trip three, we search for

2016-08-21 13:19:37,199:DEBUG:curr_query = {'$or': [{'metadata.key': 'statemachine/transition'}], 'user_id': UUID('789a6ea5-7d20-3bf7-8127-b7756ffe46eb'),
'data.ts':
{'$lte': 1469685921.145487,
 '$gte': 1469685621.145487}}, sort_key = data.ts

which, in turn, is because of

        # Yech! This feels really hacky, but if we have a really short trip,
        # then we may get the visit ending message after the trip has ended.
        # So let's expand the time query by 5 minutes.
        # This is based on the 10:06 -> 10:07 trip from the 22 Feb test case
        time_query.endTs = time_query.endTs + 5 * 60
        transition_df = timeseries.get_data_df('statemachine/transition', time_query)
        if len(transition_df) == 0:
            logging.debug("there are no transitions, which means no visit transitions, not creating a section")
            return None
shankari commented 8 years ago

I don't want to add another special case for this because it looks like we already "fixed" this once.

One simple fix might be to skip the section if there is only one point in the trip (in addition to the existing zero point fix). But in order to do that, we need to check the 10:06 -> 10:07 case on 22 Feb to make sure that it doesn't regress. Note that a single point trip effectively has distance zero, so we might as well strip it out.

The concern with that is that we might again miss short trips, specially since we cannot specify the geofence radius for iOS and we end up with points that are fairly far from the start.

A better option might be to fix the issue of extending the first section to the start of the trip (#288), and then checking to see what the distance of the trip is, or whether all points are within some distance or something. That way, if there is a real trip (non zero distance), but we only get the last one or two points, we will still retain it...

shankari commented 8 years ago

Note also that there is something a lot more troubling here.

???
2016-08-21 13:19:37,425:DEBUG:Considering trip 57ba0cd8f6858f8c24d90ee8: 2016-07-27T14:28:39.046601-07:00 -> 2016-07-27T14:28:39.046601-07:00 

There is no visit transition associated with it. In fact, there are no transitions between 10:00 and 18:00. So why did we get the point at 14:28? Is it a case of points being reported even when the tracking was turned off? Or did we get a geofence exit and start tracking and then not get any remote pushes for a while?

Wonder if sth like this is a cause of Randy's 40% drain...

shankari commented 8 years ago

A better option might be to fix the issue of extending the first section to the start of the trip (#288), and then checking to see what the distance of the trip is, or whether all points are within some distance or something. That way, if there is a real trip (non zero distance), but we only get the last one or two points, we will still retain it...

While re-running after fixing #288, we get

Trip Section
simulator screen shot aug 25 2016 11 21 02 am simulator screen shot aug 25 2016 11 21 33 am

That's probably because there is no existing starting place, so we didn't extrapolate the start point. But the start point and the end point are not the same. So why do we still have a non-zero distance?

shankari commented 8 years ago

So there is an additional distance of 500m! So much for the idea of skipping short trips.

2016-08-25 11:12:39,540:DEBUG:Found first section, need to extrapolate start point
2016-08-25 11:12:39,542:DEBUG:Start place is 57bf3516f6858f5286861a09 and corresponding location is 579999f5f311e474bfc7bdb3
2016-08-25 11:12:39,554:DEBUG:Adding distance 472.602256212 to original 0 to extend section start from [-122.0401661844198, 37.33632258143028] to [-122.0400771686081, 37.34057220633825]
2016-08-25 11:12:39,556:INFO:speeds for this section are 0    0
Name: speed, dtype: int64, median is 0.0, skipping
2016-08-25 11:12:39,557:DEBUG:After subtracting time 0 from original 0.0 to cover additional distance 472.602256212 at speed 0.0, new_start_ts = 1469685621.15

Here's the fake new location point

2016-08-25 11:12:39,557:DEBUG:After copy, new data = Location({u'loc': {u'type': u'Point', u'coordinates': [-122.0400771686081, 37.34057220633825]}, u'floor': 0, u'altitude': 66.85079956054688, u'vaccuracy': 3, u'longitude': -122.0400771686081, u'filter': u'distance', u'latitude': 37.34057220633825, u'heading': -1, u'sensed_speed': 0, u'accuracy': 10})

We do add the new fake point to have a dataset of size 2

2016-08-25 11:12:39,558:DEBUG:Resampling data of size 2
2016-08-25 11:12:39,568:DEBUG:Resampling entry list    accuracy   altitude    filter  floor                          fmt_time  \
0        10  66.850800  distance      0  2016-07-27T23:00:21.145487-07:00   
1        10  91.901825  distance      0  2016-07-27T23:00:21.145487-07:00   

   heading   latitude                                                loc  \
0       -1  37.340572  {u'type': u'Point', u'coordinates': [-122.0400...   
1       -1  37.336323  {u'type': u'Point', u'coordinates': [-122.0401...   

                                            local_dt   longitude  \
0  {u'hour': 23, u'month': 7, u'second': 21, u'we... -122.040077   
1  {u'hour': 23, u'month': 7, u'second': 21, u'we... -122.040166   

   sensed_speed            ts  vaccuracy  
0             0  1.469686e+09          3  
1             0  1.469686e+09         32   of size 2

But then we resample at a 30 sec frequency, and because the two points have the same ts, they collapse back to one.

2016-08-25 11:12:39,569:DEBUG:tz_ranges = [{'timezone': u'America/Los_Angeles', 'end_ts': 1469685621.1454871, 'start_ts': 1469685621.1454871}]
shankari commented 8 years ago

I am really not sure how best to handle this. We can certainly strip out trips that have a single zero length section, but the goal was that if we actually moved from one place to another, then we would count it. This would improve timeline robustness and ensure that trips were not missed because of improper segmentation.

But in this case, we actually did detect movement of almost 500m, which is much bigger than any threshold we have had so far.

shankari commented 8 years ago

Found a similar issue for Gabe, only in his case, there is an actual section, although very small. Here's the compromise I am going to use. If the length of the trip is less than 100m, we skip it. Note that in Sunil's case above, the length of the trip is still 0 because we have no points. If we have a little < 100m perturbation in our timeline, that's going to have to be OK because we are not guaranteed that our trips end at the exact same location every time anyway.