e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Missing trip back from Bella's house #201

Closed: shankari closed this issue 5 years ago

shankari commented 8 years ago

This may not be a fixable issue, but I want to take a look at it while we are on a roll here.

For the 21st, we now have all trips except the trip back from Bella's house. As seen in https://github.com/e-mission/e-mission-server/pull/355#issuecomment-240307827, we currently show a trip there ending at 3:54 and a trip back starting at 5:13. This is clearly bogus because I didn't spend over an hour there.

According to a prior investigation in https://github.com/e-mission/e-mission-server/issues/302#issuecomment-230661852, this trip should have been detected:

2016-06-22 00:00:21,381:DEBUG:Setting new trip start point...u'2016-06-21T16:00:40.251000-07:00'
2016-06-22 00:00:21,917:INFO:Found trip end at 2016-06-21T16:06:20.105000-07:00

but although there were 11 MotionActivity entries

2016-06-22 00:00:22,272:INFO:++++++++++++++++++++Processing trip 5769d51688f6637d8e8eebe2 for user 0763de67-f61e-3f5d-90e7-518e69793954++++++++++++++++++++
2016-06-22 00:00:22,316:DEBUG:curr_query = {'$or': [{'metadata.key': 'background/motion_activity'}], 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'data.ts': {'$lte': 1466550380.105, '$gte': 1466550040.251}}, sort_key = data.ts
2016-06-22 00:00:22,317:DEBUG:Found 11 results

only one point was left after filtering. A section that starts and ends at that same point contains no location points, so we skipped the section, and because the trip then had no sections, we skipped the trip as well.

2016-06-22 00:00:22,324:DEBUG:filtered points [1]
2016-06-22 00:00:22,341:DEBUG:Considering MotionTypes.BICYCLING from 2016-06-21T16:01:01.946000-07:00 -> 2016-06-21T16:01:01.946000-07:00
2016-06-22 00:00:22,343:WARNING:Found no location points between
Motionactivity({u'confidence': 69,
 u'fmt_time': u'2016-06-21T16:01:01.946000-07:00',
 u'ts': 1466550061.9460001,
 'metadata_write_ts': 1466550061.9460001,
 '_id': ObjectId('5769d0ae4a9bba50e3fb4bb7'),
 u'type': 1})
and 
Motionactivity({u'confidence': 69,
 u'fmt_time': u'2016-06-21T16:01:01.946000-07:00',
 u'ts': 1466550061.946,
 'metadata_write_ts': 1466550061.946,
 '_id': ObjectId('5769d0ae4a9bba50e3fb4bb7'),
 u'type': 1})
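The failure mode in the log above can be sketched as follows. This is a hypothetical simplification, not the actual e-mission-server section segmentation code: `MotionPoint`, `mode_runs` and `segment_trip` are invented names, and a section is assumed to span a run of consecutive filtered motion points with the same mode. With a single filtered point, the run starts and ends at the same timestamp, so the range query for location points comes back empty.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MotionPoint:
    ts: float  # seconds since the epoch
    mode: int  # motion type code, e.g. 1 == BICYCLING


def mode_runs(points: List[MotionPoint]) -> List[Tuple[MotionPoint, MotionPoint]]:
    """Group consecutive points with the same mode into (first, last) pairs."""
    runs: List[Tuple[MotionPoint, MotionPoint]] = []
    for p in points:
        if runs and runs[-1][1].mode == p.mode:
            runs[-1] = (runs[-1][0], p)  # extend the current run
        else:
            runs.append((p, p))  # start a new run
    return runs


def segment_trip(filtered: List[MotionPoint],
                 location_ts: List[float]) -> List[Tuple[float, float, int]]:
    """Return (start_ts, end_ts, mode) for every run that contains location points."""
    sections = []
    for first, last in mode_runs(filtered):
        locs = [t for t in location_ts if first.ts <= t <= last.ts]
        if not locs:
            # Corresponds to the "Found no location points between ..." warning
            continue
        sections.append((first.ts, last.ts, first.mode))
    return sections


# The single surviving BICYCLING point (ts from the log above) and
# made-up location timestamps every 30 s across the trip window.
filtered = [MotionPoint(ts=1466550061.946, mode=1)]
location_ts = [1466550040.251 + 30 * i for i in range(12)]
sections = segment_trip(filtered, location_ts)
print(sections)  # [] -> no sections, so the trip is skipped as well
```

A second point with the same mode later in the trip would widen the run and pick up location points, which is why a single surviving point is the problematic case.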

shankari commented 8 years ago

Given that this was such a short trip, I wondered whether extending the motion activity search window past the trip end, up to the time when tracking stopped on the phone, would help.

Extending the search up to 16:15 gives us some more points.

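>>> from uuid import UUID
>>> import emission.storage.decorations.analysis_timeseries_queries as esda  # assumed module behind esda
>>> filtered_motion_data_df = esda.get_data_df("background/motion_activity", UUID("0763de67-f61e-3f5d-90e7-518e69793954"), time_query=None)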
>>> filtered_motion_data_df = filtered_motion_data_df[(filtered_motion_data_df.local_dt_day == 21) & (filtered_motion_data_df.local_dt_hour == 16) & (filtered_motion_data_df.local_dt_minute < 15)]
>>> filtered_motion_data_df.tail()[["confidence", "fmt_time", "type"]]

     confidence  fmt_time                          type
322         100  2016-06-21T16:09:10.625000-07:00     5
323         100  2016-06-21T16:09:23.574000-07:00     5
324          58  2016-06-21T16:09:24.031000-07:00     3
325          77  2016-06-21T16:09:46.029000-07:00     3
326          92  2016-06-21T16:10:14.630000-07:00     3

But unfortunately, the additional points are all of type 3 (STILL) or 5 (TILTING), which are ignored, as is type 4 (UNKNOWN). If we only look at the non-ignored points, they all occur before 16:06.

>>> filtered_motion_data_df[(filtered_motion_data_df.type != 3) & (filtered_motion_data_df.type != 4) & (filtered_motion_data_df.type != 5)][["confidence", "fmt_time", "type"]]

     confidence  fmt_time                          type
307          69  2016-06-21T16:01:01.946000-07:00     1
308          38  2016-06-21T16:01:17.741000-07:00     1
311          48  2016-06-21T16:02:08.336000-07:00     1
312          46  2016-06-21T16:02:37.197000-07:00     1
313          35  2016-06-21T16:03:15.588000-07:00     1
314          42  2016-06-21T16:04:40.863000-07:00     2
316          60  2016-06-21T16:05:52.043000-07:00     2

and only one of them (confidence 69, at 16:01:01) is above the confidence threshold (> 60) required to be considered.
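The two-step filter applied in the tables above (drop the ignored types, then require confidence > 60) can be reproduced with plain pandas on just the rows printed there. This is a standalone sketch, not the server's filtering code; the type names in the comments assume the Android DetectedActivity codes, which match the `MotionTypes` values in the logs (e.g. BICYCLING == 1).

```python
import pandas as pd

# Assumed to follow the Android DetectedActivity codes
IGNORED_TYPES = {3, 4, 5}   # STILL, UNKNOWN, TILTING
CONFIDENCE_THRESHOLD = 60   # confidence must be strictly greater than this

# The rows printed in the two tables above (2016-06-21, 16:01 to 16:10)
rows = [
    (307, 69, "2016-06-21T16:01:01.946000-07:00", 1),
    (308, 38, "2016-06-21T16:01:17.741000-07:00", 1),
    (311, 48, "2016-06-21T16:02:08.336000-07:00", 1),
    (312, 46, "2016-06-21T16:02:37.197000-07:00", 1),
    (313, 35, "2016-06-21T16:03:15.588000-07:00", 1),
    (314, 42, "2016-06-21T16:04:40.863000-07:00", 2),
    (316, 60, "2016-06-21T16:05:52.043000-07:00", 2),
    (322, 100, "2016-06-21T16:09:10.625000-07:00", 5),
    (323, 100, "2016-06-21T16:09:23.574000-07:00", 5),
    (324, 58, "2016-06-21T16:09:24.031000-07:00", 3),
    (325, 77, "2016-06-21T16:09:46.029000-07:00", 3),
    (326, 92, "2016-06-21T16:10:14.630000-07:00", 3),
]
df = pd.DataFrame(rows, columns=["idx", "confidence", "fmt_time", "type"]).set_index("idx")

# Step 1: drop the ignored motion types (same as the != 3/4/5 filter above)
valid = df[~df["type"].isin(IGNORED_TYPES)]
# Step 2: keep only points above the confidence threshold
high_conf = valid[valid["confidence"] > CONFIDENCE_THRESHOLD]
print(high_conf[["confidence", "fmt_time", "type"]])
```

Only row 307 survives; row 316 has a confidence of exactly 60 and fails the strict comparison.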

shankari commented 8 years ago

It looks like there are two possible fixes here.

  1. We could use the fact that the start and end points of the trip are not within 100m of each other to rule out a spurious trip. Since it is not a spurious trip, we would add a section that corresponds to the entire duration, with the one high confidence mode if it exists, or UNKNOWN if there are no high confidence modes.
  2. We could filter in two steps: first filter out the invalid modes, and then, if the remaining modes are consistent, retain the points even if they are low confidence.
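Option (1) can be sketched roughly as below. This is a hypothetical illustration, not the e-mission-server implementation: `haversine_m`, `fallback_section` and the constants are invented names, the 100 m threshold comes from the option text, and picking the first mode when several high confidence modes exist is an assumption the option does not address.

```python
import math
from typing import List, Optional, Tuple

UNKNOWN = 4                   # motion type code for UNKNOWN
SPURIOUS_DISTANCE_M = 100.0   # endpoints closer than this may be a spurious trip


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def fallback_section(start: Tuple[float, float], end: Tuple[float, float],
                     start_ts: float, end_ts: float,
                     high_conf_modes: List[int]) -> Optional[Tuple[float, float, int]]:
    """Option (1): if the endpoints are far apart, keep the trip as a single
    section; otherwise return None because it may be spurious."""
    if haversine_m(*start, *end) <= SPURIOUS_DISTANCE_M:
        return None
    # Use the high confidence mode if there is one (the first, if several), else UNKNOWN
    mode = high_conf_modes[0] if high_conf_modes else UNKNOWN
    return (start_ts, end_ts, mode)


# Hypothetical coordinates roughly 1.1 km apart; timestamps from the trip above
one_way = fallback_section((37.0, -122.0), (37.01, -122.0),
                           1466550040.251, 1466550380.105, [1])
# A short loop around the block that ends where it started
round_trip = fallback_section((37.0, -122.0), (37.0, -122.0),
                              1466550040.251, 1466550380.105, [])
print(one_way)     # (1466550040.251, 1466550380.105, 1)
print(round_trip)  # None
```

The `round_trip` call shows the weakness of this option: a short loop with confused activities ends near where it started, so it gets no fallback section.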

shankari commented 8 years ago

The problem with doing (1) is that we might miss short round trips with confused activities. That is basically what happened here, but for a trip around the block.

The problem with (2) is that we don't have enough data points to construct a relevant model. What does it mean to be "consistent"? How many low confidence points should we retain? What happens if there are no high confidence points?

shankari commented 8 years ago

Looking at a random sample of short trips, it looks like they do have consistent data from the same activity.

>>> filtered_motion_data_df = esda.get_data_df("background/motion_activity", UUID("0763de67-f61e-3f5d-90e7-518e69793954"), time_query=None)
>>> filtered_motion_data_df = filtered_motion_data_df[(filtered_motion_data_df.local_dt_day == 20) & (filtered_motion_data_df.local_dt_hour == 8) & (filtered_motion_data_df.local_dt_minute < 59)]
>>> filtered_motion_data_df[(filtered_motion_data_df.type != 3) & (filtered_motion_data_df.type != 4) & (filtered_motion_data_df.type != 5)][["confidence", "fmt_time", "type"]]

     confidence  fmt_time                          type
  0         100  2016-06-20T08:28:19.622000-07:00     2
  1         100  2016-06-20T08:28:19.622000-07:00     2
  4         100  2016-06-20T08:28:53.764000-07:00     2
  5         100  2016-06-20T08:28:53.764000-07:00     2
  8          85  2016-06-20T08:29:26.428000-07:00     2
...
 59          41  2016-06-20T08:40:05.477000-07:00     0
 81          35  2016-06-20T08:46:23.004000-07:00     2
 82          77  2016-06-20T08:46:57.619000-07:00     2
 83         100  2016-06-20T08:47:26.394000-07:00     2
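One possible (hypothetical) way to make "consistent" from option (2) concrete is to measure how much of the valid motion activity the most common mode accounts for. The sketch below does that for the rows shown above only; `dominant_mode` is an invented helper, and the thread does not settle on any threshold for the resulting share.

```python
from collections import Counter
from typing import List, Tuple


def dominant_mode(modes: List[int]) -> Tuple[int, float]:
    """Return the most common mode and the fraction of points it accounts for."""
    if not modes:
        raise ValueError("no valid motion activities")
    mode, count = Counter(modes).most_common(1)[0]
    return mode, count / len(modes)


# Types of the rows printed above; the rows elided by "..." are not included
shown_types = [2, 2, 2, 2, 2, 0, 2, 2, 2]
mode, share = dominant_mode(shown_types)
print(mode, round(share, 2))  # 2 0.89
```

Assuming the Android DetectedActivity codes, type 2 is ON_FOOT and type 0 is IN_VEHICLE, so the visible rows are dominated by a single walking mode.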

shankari commented 8 years ago

We should also look at spurious trips.

>>> filtered_motion_data_df = esda.get_data_df("background/motion_activity", UUID("0763de67-f61e-3f5d-90e7-518e69793954"), time_query=None)
>>> filtered_motion_data_df = filtered_motion_data_df[(filtered_motion_data_df.local_dt_day == 21) & (filtered_motion_data_df.local_dt_hour == 14) & (filtered_motion_data_df.local_dt_minute < 59)]
>>> filtered_motion_data_df.type

229    3
230    3
231    3
232    3
233    3
234    3
235    3
236    3

shankari commented 8 years ago

OK, so the first thing to recognize here is that we may not be able to detect every single trip, at least initially. From the list above, it looks like we can make two fairly small fixes that will handle this case, and then see whether we need to tweak things further.

Note that neither of these will handle very short (~5 min) round trips that have no high confidence motion activity points, because it is really hard to distinguish those from spurious trips. While it may be possible to filter and boost the low confidence motion activities to detect these as well, I will err on the side of caution and hold off on experimenting with that until we have a mechanism to correct trips.

shankari commented 8 years ago

I'm actually going to simplify this even further. I am not going to handle the case of a real trip that does not have even one high confidence section. The reasons are:

Since I haven't actually seen an example of a real trip with no valid motion activity points, I am going to punt on (2) above.

shankari commented 8 years ago

https://github.com/e-mission/e-mission-server/pull/377 added filtering trips by distance. Closing this for now.

shankari commented 8 years ago

Regarding https://github.com/e-mission/e-mission-server/issues/356#issuecomment-240630757, I found an example of a valid trip with zero valid motion activities at https://github.com/e-mission/e-mission-server/issues/385#issuecomment-244826007

This was a trip from the bus stop on the main road to the bus stop inside the community college. It might be important because if we ever wanted to do map matching, it would be useful to have the correct location for the first stop.

Also, regarding this earlier concern:

"even if we could figure out how to use the location of the previous place instead, it is not clear what threshold we should use. The threshold for segmentation is currently a parameter to the trip segmentation code. Should we have this as a general parameter? How should we specify it? ..."

We have separate section segmentation methods for Android and iOS, so we can also support separate thresholds.

shankari commented 8 years ago

Fixed in https://github.com/e-mission/e-mission-server/pull/382, commit https://github.com/shankari/e-mission-server/commit/c25997d56f47ec4e1952bf1cc9f6ef12937cb3f1