Given that this was such a short trip, I wondered whether extending the motion activity search window all the way to the end of tracking on the phone would help.
Extending the search up to 16:15 gives us some more points.
>>> filtered_motion_data_df = esda.get_data_df("background/motion_activity", UUID("0763de67-f61e-3f5d-90e7-518e69793954"), time_query=None)
>>> filtered_motion_data_df = filtered_motion_data_df[(filtered_motion_data_df.local_dt_day == 21) & (filtered_motion_data_df.local_dt_hour == 16) & (filtered_motion_data_df.local_dt_minute < 15)]
>>> filtered_motion_data_df.tail()[["confidence", "fmt_time", "type"]]
confidence fmt_time type
322 100 2016-06-21T16:09:10.625000-07:00 5
323 100 2016-06-21T16:09:23.574000-07:00 5
324 58 2016-06-21T16:09:24.031000-07:00 3
325 77 2016-06-21T16:09:46.029000-07:00 3
326 92 2016-06-21T16:10:14.630000-07:00 3
Unfortunately, these additional points are all of type 3 or 5, which are ignored. If we only look at the non-ignored points, they all occur before 16:06.
>>> filtered_motion_data_df[(filtered_motion_data_df.type != 3) & (filtered_motion_data_df.type != 4) & (filtered_motion_data_df.type != 5)][["confidence", "fmt_time", "type"]]
confidence fmt_time type
307 69 2016-06-21T16:01:01.946000-07:00 1
308 38 2016-06-21T16:01:17.741000-07:00 1
311 48 2016-06-21T16:02:08.336000-07:00 1
312 46 2016-06-21T16:02:37.197000-07:00 1
313 35 2016-06-21T16:03:15.588000-07:00 1
314 42 2016-06-21T16:04:40.863000-07:00 2
316 60 2016-06-21T16:05:52.043000-07:00 2
and only one of them has a high enough confidence (> 60) to be considered.
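The filtering used above (drop activity types 3, 4 and 5, then keep only points with confidence > 60) can be sketched in plain Python without the e-mission data access layer. This is a minimal sketch, not the server's implementation: the names `IGNORED_TYPES`, `CONFIDENCE_THRESHOLD`, `MotionPoint` and `valid_points` are hypothetical, and the type codes assume Android's `DetectedActivity` numbering (3 = STILL, 4 = UNKNOWN, 5 = TILTING). The sample rows are the ones printed above.

```python
from typing import List, NamedTuple

# Assumed Android DetectedActivity codes that are ignored: 3 = STILL, 4 = UNKNOWN, 5 = TILTING
IGNORED_TYPES = {3, 4, 5}
# Hypothetical name for the "> 60" cutoff discussed above
CONFIDENCE_THRESHOLD = 60


class MotionPoint(NamedTuple):
    confidence: int
    fmt_time: str
    type: int


def valid_points(points: List[MotionPoint]) -> List[MotionPoint]:
    """Drop ignored activity types, then keep only high-confidence points."""
    return [
        p for p in points
        if p.type not in IGNORED_TYPES and p.confidence > CONFIDENCE_THRESHOLD
    ]


# Rows printed above for 2016-06-21 between 16:00 and 16:15 (both outputs combined)
points = [
    MotionPoint(69, "2016-06-21T16:01:01.946000-07:00", 1),
    MotionPoint(38, "2016-06-21T16:01:17.741000-07:00", 1),
    MotionPoint(48, "2016-06-21T16:02:08.336000-07:00", 1),
    MotionPoint(46, "2016-06-21T16:02:37.197000-07:00", 1),
    MotionPoint(35, "2016-06-21T16:03:15.588000-07:00", 1),
    MotionPoint(42, "2016-06-21T16:04:40.863000-07:00", 2),
    MotionPoint(60, "2016-06-21T16:05:52.043000-07:00", 2),
    MotionPoint(100, "2016-06-21T16:09:10.625000-07:00", 5),
    MotionPoint(100, "2016-06-21T16:09:23.574000-07:00", 5),
    MotionPoint(58, "2016-06-21T16:09:24.031000-07:00", 3),
    MotionPoint(77, "2016-06-21T16:09:46.029000-07:00", 3),
    MotionPoint(92, "2016-06-21T16:10:14.630000-07:00", 3),
]

for p in valid_points(points):
    print(p.fmt_time, p.type, p.confidence)
```

Only the 16:01:01 point survives; the 16:05:52 point has a confidence of exactly 60, so it fails a strict `> 60` check.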
It looks like there are two possible fixes here.
The problem with doing (1) is that we might miss short round trips with confused activities: basically the same thing that happened here, but for a trip around the block.
The problem with (2) is that we don't have enough data points to construct a relevant model. What does it mean to be "consistent"? How many low confidence points should we retain? What happens if there are no high confidence points?
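One purely hypothetical way to make "consistent" concrete: ignore types 3, 4 and 5 as before, and retain the low-confidence points only if a single activity type accounts for at least some fraction of the rest. The function name and the `min_fraction` default are illustrative assumptions, not something the server does. It returns `None` when there are no usable points at all, leaving that case to the caller.

```python
from collections import Counter
from typing import Iterable, Optional

IGNORED_TYPES = {3, 4, 5}  # assumed: STILL, UNKNOWN, TILTING


def dominant_type(types: Iterable[int], min_fraction: float = 0.6) -> Optional[int]:
    """Return the activity type shared by at least min_fraction of the
    non-ignored points, or None if no type is dominant enough or if
    there are no non-ignored points at all."""
    candidates = [t for t in types if t not in IGNORED_TYPES]
    if not candidates:
        return None
    most_common_type, count = Counter(candidates).most_common(1)[0]
    return most_common_type if count / len(candidates) >= min_fraction else None


# Types of the seven non-ignored points between 16:01 and 16:06 listed above
observed = [1, 1, 1, 1, 1, 2, 2]
print(dominant_type(observed))        # 1 (5 of 7 points, about 71%)
print(dominant_type(observed, 0.75))  # None
```

With these seven points, a 0.6 cutoff keeps type 1 while a 0.75 cutoff rejects the set; choosing that cutoff is exactly the open question above.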
Looking at a random sample of short trips, they do seem to have consistent, high-confidence data from the same activity.
>>> filtered_motion_data_df = esda.get_data_df("background/motion_activity", UUID("0763de67-f61e-3f5d-90e7-518e69793954"), time_query=None)
>>> filtered_motion_data_df = filtered_motion_data_df[(filtered_motion_data_df.local_dt_day == 20) & (filtered_motion_data_df.local_dt_hour == 8) & (filtered_motion_data_df.local_dt_minute < 59)]
>>> filtered_motion_data_df[(filtered_motion_data_df.type != 3) & (filtered_motion_data_df.type != 4) & (filtered_motion_data_df.type != 5)][["confidence", "fmt_time", "type"]]
confidence fmt_time type
0 100 2016-06-20T08:28:19.622000-07:00 2
1 100 2016-06-20T08:28:19.622000-07:00 2
4 100 2016-06-20T08:28:53.764000-07:00 2
5 100 2016-06-20T08:28:53.764000-07:00 2
8 85 2016-06-20T08:29:26.428000-07:00 2
...
59 41 2016-06-20T08:40:05.477000-07:00 0
81 35 2016-06-20T08:46:23.004000-07:00 2
82 77 2016-06-20T08:46:57.619000-07:00 2
83 100 2016-06-20T08:47:26.394000-07:00 2
We should also look at spurious trips.
>>> filtered_motion_data_df = esda.get_data_df("background/motion_activity", UUID("0763de67-f61e-3f5d-90e7-518e69793954"), time_query=None)
>>> filtered_motion_data_df = filtered_motion_data_df[(filtered_motion_data_df.local_dt_day == 21) & (filtered_motion_data_df.local_dt_hour == 14) & (filtered_motion_data_df.local_dt_minute < 59)]
>>> filtered_motion_data_df.type
229 3
230 3
231 3
232 3
233 3
234 3
235 3
236 3
OK, so the first thing to understand here is that we may not be able to get every single trip, at least initially. From the list above, it looks like we can make two fairly small fixes that will handle this case, and then see if we need to tweak things further.
Note that neither of these will handle the case of very short (~ 5 min) round trips that have no high confidence motion activity points. This is because it is really hard to distinguish those from spurious trips. While it may be possible to filter and boost the low confidence motion activities to detect these as well, I will err on the side of caution until we have a mechanism to correct trips before experimenting with that.
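The spurious trip filtered above (hour 14 on the 21st) has only type 3 (STILL) points, so it has no valid motion activity at all, whereas the short 08:28 trip on the 20th has several high-confidence type 2 points. A minimal sketch of treating "no valid point" as "probably spurious", under the same assumed type codes and > 60 cutoff; the helper names are hypothetical. The confidences for the hour-14 trip were not printed above, so they are set to 100 here (the most favorable case); the trip is rejected anyway because every point is STILL.

```python
from typing import Iterable, Tuple

IGNORED_TYPES = {3, 4, 5}  # assumed: STILL, UNKNOWN, TILTING
CONFIDENCE_THRESHOLD = 60  # the "> 60" cutoff discussed above


def has_valid_activity(activities: Iterable[Tuple[int, int]]) -> bool:
    """activities are (confidence, type) pairs; True if any point survives filtering."""
    return any(
        t not in IGNORED_TYPES and c > CONFIDENCE_THRESHOLD
        for c, t in activities
    )


def is_probably_spurious(activities: Iterable[Tuple[int, int]]) -> bool:
    # Err on the side of caution: a trip with no valid motion activity is dropped.
    # This deliberately misses real short trips that lack high-confidence points.
    return not has_valid_activity(activities)


# Hour-14 trip: eight STILL points (confidence not shown above; 100 assumed)
afternoon_trip = [(100, 3)] * 8
# First rows of the 08:28 trip on 2016-06-20, as printed above
morning_trip = [(100, 2), (100, 2), (100, 2), (100, 2), (85, 2)]

print(is_probably_spurious(afternoon_trip))  # True
print(is_probably_spurious(morning_trip))    # False
```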
I'm actually going to simplify this even further. I am not going to handle the case in which the trip is real but does not have even one high-confidence section. The reasons are:
It is somewhat complicated to figure out where best to do this.
>>> trips[["start_fmt_time", "start_loc", "end_fmt_time", "end_loc"]]
0 2016-06-21T13:56:55.672000-07:00 {u'type': u'Point', u'coordinates': [-122.0865... 2016-06-21T14:07:35.443000-07:00 {u'type': u'Point', u'coordinates': [-122.0863...
>>> ecc.calDistance(trips.iloc[0].start_loc["coordinates"], trips.iloc[1].end_loc["coordinates"])
1016.5149888773434
Since I haven't actually seen an example of a real trip with no valid motion activity points, I am going to punt on (2) above.
PR https://github.com/e-mission/e-mission-server/pull/377 added filtering of trips by distance. Closing this for now.
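The `ecc.calDistance` call above returns what appears to be a distance in meters between two `[longitude, latitude]` pairs. A self-contained sketch of that kind of check uses the haversine formula; this is an assumption about how such a distance can be computed, not a copy of `calDistance` or of the filter added in PR 377, and the 100 m `min_distance_m` default is a hypothetical placeholder. As noted earlier, a check based only on start and end points would also drop short round trips, since their endpoints nearly coincide.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lon_lat1, lon_lat2):
    """Great-circle distance in meters between two [lon, lat] pairs (GeoJSON order)."""
    lon1, lat1 = map(math.radians, lon_lat1)
    lon2, lat2 = map(math.radians, lon_lat2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoots above 1
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def is_long_enough(start_coords, end_coords, min_distance_m=100.0):
    """Hypothetical distance filter on the straight-line start-to-end distance."""
    return haversine_m(start_coords, end_coords) >= min_distance_m


print(round(haversine_m([0.0, 0.0], [0.0, 1.0])))  # 111195, one degree of latitude
```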
Regarding https://github.com/e-mission/e-mission-server/issues/356#issuecomment-240630757, I found an example of a valid trip with zero valid motion activities at https://github.com/e-mission/e-mission-server/issues/385#issuecomment-244826007
This was a trip from the bus stop on the main road to the bus stop inside the community college. It might be important because if we ever wanted to do map matching, it would be useful to have the correct location for the first stop.
Also, regarding this earlier point:
"even if we could figure out how to use the location of the previous place instead, it is not clear what threshold we should use. The threshold for segmentation is currently a parameter to the trip segmentation code. Should we have this as a general parameter? How should we specify it? ..."
We have separate section segmentation methods for Android and iOS, so we can also support separate thresholds.
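Since the Android and iOS segmentation paths are already separate, a per-platform threshold could live in one lookup that each path reads. A hypothetical sketch: the threshold is assumed here to be a distance in meters, and the dictionary name and the numbers in it are placeholders, not values from the e-mission configuration.

```python
from typing import Dict, Optional

# Hypothetical per-platform thresholds (meters); placeholder values only.
SEGMENTATION_THRESHOLDS_M: Dict[str, float] = {
    "android": 100.0,
    "ios": 50.0,
}


def threshold_for(platform: str, overrides: Optional[Dict[str, float]] = None) -> float:
    """Look up the threshold for a platform, letting a caller override the defaults."""
    table = {**SEGMENTATION_THRESHOLDS_M, **(overrides or {})}
    key = platform.lower()
    if key not in table:
        raise ValueError(f"no threshold configured for platform {platform!r}")
    return table[key]


print(threshold_for("iOS"))                           # 50.0
print(threshold_for("android", {"android": 150.0}))  # 150.0
```

Keeping the defaults in one table while allowing overrides would let each platform-specific segmentation method receive its threshold as a parameter, as the trip segmentation code already does.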
This may not be a fixable issue, but I want to take a look at it while we are on a roll here.
For the 21st, we now have all trips except the trip back from Bella's house. As seen in https://github.com/e-mission/e-mission-server/pull/355#issuecomment-240307827, we currently show a trip there ending at 3:54 and a trip back starting at 5:13. This is clearly bogus because I didn't spend over an hour there.
According to a prior investigation in https://github.com/e-mission/e-mission-server/issues/302#issuecomment-230661852, there was supposed to have been this trip.
but although there were 11 MotionActivity entries, there was only one unfiltered point. So there were no location points between that point and itself, no sections were created, and because the trip had no sections, we skipped it.
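The failure mode can be reproduced with a simplified model of section building. The assumptions, which are illustrative and not taken from the server code: a section spans consecutive filtered motion activity points, a lone point yields a single range from that point to itself, a section needs at least one location point strictly inside its range, and a trip with no sections is skipped. The function names and the toy timestamps are hypothetical.

```python
from typing import Dict, List


def build_sections(activity_ts: List[float], location_ts: List[float]) -> List[Dict]:
    """Hypothetical section builder over timestamps (seconds)."""
    if not activity_ts:
        return []
    # Consecutive filtered points bound each section; a lone point bounds a
    # zero-length range from itself to itself.
    ranges = list(zip(activity_ts, activity_ts[1:])) or [(activity_ts[0], activity_ts[0])]
    sections = []
    for start, end in ranges:
        inside = [t for t in location_ts if start < t < end]
        if inside:  # without location points there is no geometry, so skip the section
            sections.append({"start_ts": start, "end_ts": end, "location_ts": inside})
    return sections


def keep_trip(activity_ts: List[float], location_ts: List[float]) -> bool:
    # A trip with no sections is skipped entirely.
    return len(build_sections(activity_ts, location_ts)) > 0


locations = [90.0, 100.0, 150.0, 210.0]
print(keep_trip([100.0], locations))         # False: one filtered point, nothing strictly inside
print(keep_trip([100.0, 200.0], locations))  # True: 150.0 lies inside (100, 200)
```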