Open shankari opened 8 years ago
It really looks like this is the main issue left. It covers both https://github.com/e-mission/e-mission-server/issues/288#issuecomment-246049222 and https://github.com/e-mission/e-mission-server/issues/378#issuecomment-246036383 (assuming it was a real trip and not a turned off phone).
Basically, we know that there was a trip. We can extrapolate it to the end of the previous trip, but "if it is too far" (for some definition of far), it is not really the same section. We don't want to end up with walking trips of 40 or 55 km.
Clearly 40 or 55 km is too large, but what is reasonable? We can't really use a percentage of the original distance as the criterion because we frequently use this for really short trips, like some of the iPhone ones that prompted the extrapolation in the first place.
It seems reasonable that if the speed of the extrapolated section is not consistent with the speed of the existing section, it should be split out. Doesn't have to depend on actual modes, just consistency. Basically, if it is not possible to extrapolate that section because the domain is different, let's make a new section...
Need to restructure a bunch of code to do that though. And we need to define consistency.
One way of defining consistency is to use the same outlier detection strategy that we used for the zigzags (basically, 75th percentile + a MAJOR or MINOR multiple of the IQR).
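A minimal sketch of what that consistency check could look like, assuming speeds in m/s. The function names are hypothetical, and the multiplier values (1.5 and 3.0) are the conventional boxplot fences rather than values confirmed from the codebase.

```python
import numpy as np

# Assumed multipliers: conventional Tukey fences, not confirmed from the codebase.
MINOR = 1.5
MAJOR = 3.0


def speed_outlier_threshold(speeds, multiplier=MAJOR):
    """Return the 75th percentile + multiplier * IQR for a list of speeds."""
    q25, q75 = np.percentile(speeds, [25, 75])
    return q75 + multiplier * (q75 - q25)


def is_consistent(extrapolated_speed, section_speeds, multiplier=MAJOR):
    """True if the extrapolated speed is not an outlier relative to the section."""
    return bool(extrapolated_speed <= speed_outlier_threshold(section_speeds, multiplier))
```

With walking-like speeds such as `[1.0, 1.2, 1.4, 1.6, 1.8]`, a 2 m/s extrapolated speed passes and a 30 m/s one does not. Note that this does not look at the mode at all, only at whether the speeds are consistent.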
If only the code restructuring were so easy!
Code restructuring is really hard. We're set up for dropping/merging but not for expanding, and the dropping/merging part has been pretty hard too. We may consider just setting the entire section as UNKNOWN if the initial speed is an outlier.
Let's try that and see how it works
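Roughly, the fallback could look like the sketch below. Everything here is hypothetical: the `Section` class, the `UNKNOWN` label, and `apply_extrapolation` are stand-ins for illustration, and `max_speed` is assumed to come from whatever outlier threshold we settle on.

```python
from dataclasses import dataclass

UNKNOWN = "UNKNOWN"  # stand-in label; the real mode enum is not shown in this thread


@dataclass
class Section:
    mode: str
    distance: float  # meters
    duration: float  # seconds


def apply_extrapolation(section, add_dist, add_duration, max_speed):
    """Extend the section by the extrapolated part.

    If the speed implied by the extrapolated part exceeds max_speed,
    the whole section is relabeled UNKNOWN instead of being split.
    """
    if add_duration <= 0:
        # Zero-duration gap: any positive distance implies an impossible speed.
        implied_speed = float("inf") if add_dist > 0 else 0.0
    else:
        implied_speed = add_dist / add_duration
    mode = UNKNOWN if implied_speed > max_speed else section.mode
    return Section(mode, section.distance + add_dist, section.duration + add_duration)
```

This avoids inserting a new section, which the current structure does not support, at the cost of losing the mode for the part of the section that was actually sensed.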
Quick check on existing instances, as observed by the previous check.
2015-07-20T12:27:40.288000-07:00 -> 2015-07-20T12:28:08.270000-07:00
Trying to extrapolate 9053.42681795 > 2 * original 22.5800646151, resetting add_dist = 45.1601292302
-------
2015-07-22T17:54:21.295000-07:00 -> 2015-07-22T17:54:21.295000-07:00
Trying to extrapolate 152.078210444 > 2 * original 0, resetting add_dist = 0
-------
2015-08-10T13:33:10.166000-07:00 -> 2015-08-10T13:38:19.079000-07:00
Trying to extrapolate 222.200772719 > 2 * original 61.2697758242, resetting add_dist = 122.539551648
-------
2015-08-15T08:01:36.117000-07:00 -> 2015-08-15T08:24:22.215000-07:00
Trying to extrapolate 21136.3439166 > 2 * original 263.981657593, resetting add_dist = 527.963315187
-------
2015-09-29T10:17:23-07:00 -> 2015-09-29T10:18:53-07:00
Trying to extrapolate 7587.72883461 > 2 * original 0.14422679578, resetting add_dist = 0.288453591559
-------
2015-11-06T14:57:15-08:00 -> 2015-11-06T15:05:38-08:00
Trying to extrapolate 55970.668353 > 2 * original 21.463919706, resetting add_dist = 42.927839412
-------
2015-11-06T18:39:34-08:00 -> 2015-11-06T20:54:31-08:00
Trying to extrapolate 55974.4351837 > 2 * original 10.3389658458, resetting add_dist = 20.6779316916
-------
2016-08-26T10:47:23-07:00 -> 2016-08-26T10:47:23-07:00
Trying to extrapolate 186.035903544 > 2 * original 0, resetting add_dist = 0
-------
2016-08-26T12:22:30-07:00 -> 2016-08-26T12:28:08-07:00
Trying to extrapolate 223.156442709 > 2 * original 21.908850429, resetting add_dist = 43.8177008579
-------
2016-08-26T18:01:04-07:00 -> 2016-08-26T18:10:03-07:00
Trying to extrapolate 35674.1330809 > 2 * original 479.985531524, resetting add_dist = 959.971063048
-------
2016-09-03T08:12:48.494000-07:00 -> 2016-09-03T08:16:04-07:00
Trying to extrapolate 2428.80580027 > 2 * original 1179.72856372, resetting add_dist = 2359.45712743
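For reference, the cap that produces these log lines can be sketched as below. The function name `capped_add_dist` is made up for illustration; only the "2 * original" rule is taken from the log output.

```python
def capped_add_dist(extrapolated_dist, original_dist, factor=2):
    """Limit the extrapolated distance to factor * the section's original distance."""
    cap = factor * original_dist
    return min(extrapolated_dist, cap)
```

For the first entry, `capped_add_dist(9053.42681795, 22.5800646151)` comes out at about 45.16, matching the log. It also shows why a pure multiple is a poor criterion: when the original distance is 0, as in the 2015-07-22 and 2016-08-26T10:47:23 entries, nothing can be added at all.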
Example 1 | Example 2 | Example 3 |
---|---|---|
Logs highlighting trips that are "weird". Useful for checking and for adding unit tests later: too_much_extrapolation.log.zip
Here are the examples above after the fix for setting the whole section as UNKNOWN and for https://github.com/e-mission/e-mission-server/issues/378#issuecomment-246036383.
Example 1 | Example 2 | Example 3 |
---|---|---|
To understand the differences on 6 Nov 2015 and 3 Sep 2016, let's check whether there was some untracked time around those dates.
Yup, at least for 3 Sep 2016!
{'data': Untrackedtime({u'distance': 1901.1444322143084,
'end_place': ObjectId('57d37e86f6858f7be0293f02'),
u'start_loc': {u'type': u'Point', u'coordinates': [-122.0864147, 37.3908493]},
u'end_ts': 1472915568.494,
u'start_ts': 1472875421,
u'start_fmt_time': u'2016-09-02T21:03:41-07:00',
u'end_loc': {u'type': u'Point', u'coordinates': [-122.0873396, 37.3737677]},
u'source': u'DwellSegmentationTimeFilter',
'start_place': ObjectId('57d37e86f6858f7be0293f01'),
u'end_fmt_time': u'2016-09-03T08:12:48.494000-07:00',
u'duration': 40147.49399995804}),
'_id': ObjectId('57d37c9cf6858f7be0293760'),
'key': 'analysis/cleaned_untracked'}
For 6 Nov 2015, nope, no untracked time. A really long time at a place instead. Let's check if that is legit.
2016-09-09 20:31:21,864:DEBUG:Inserting entry Entry({'data': Cleanedplace(
{u'enter_fmt_time': u'2015-11-06T20:54:31-08:00',
'display_name': u'South Shoreline Boulevard, Mountain View',
'exit_fmt_time': '2015-11-08T11:22:29-08:00',
'ending_trip': ObjectId('57d379bbf6858f7be0280841'),
'starting_trip': ObjectId('57d379bff6858f7be0280af9'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point', u'coordinates': [-122.0862597, 37.3909335]},
u'enter_ts': 1446872071,
'duration': 138478,
'raw_places': [ObjectId('57d37245f6858f616234ee51'),
ObjectId('57d37245f6858f616234ee51'),
ObjectId('57d37245f6858f616234ee53')],
'exit_ts': 1447010549}),
'_id': ObjectId('57d37cfbf6858f7be02939dc'),
'key': 'analysis/cleaned_place'})}) into timeseries
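A quick sanity check on the `duration` field of the place entry above, using only its `enter_ts` and `exit_ts`. The helper name `dwell_duration` is made up for this check.

```python
from datetime import timedelta


def dwell_duration(enter_ts, exit_ts):
    """Return the dwell time in seconds and as a human-readable string."""
    seconds = exit_ts - enter_ts
    return seconds, str(timedelta(seconds=seconds))


# Timestamps copied from the cleaned_place entry above.
seconds, readable = dwell_duration(1446872071, 1447010549)
print(seconds, readable)  # 138478 1 day, 14:27:58
```

So the stored duration of 138478 seconds is consistent with the timestamps, and it works out to a bit over a day and a half at the same place.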
Looks legit to me!
2016-09-09 19:19:43,775:DEBUG:------------------------------2015-11-06T20:54:01-08:00------------------------------
2016-09-09 19:19:43,782:DEBUG:Too few points to make a decision, continuing
2016-09-09 19:19:43,783:DEBUG:------------------------------2015-11-06T20:54:31-08:00------------------------------
2016-09-09 19:19:43,791:DEBUG:Too few points to make a decision, continuing
...
2016-09-09 19:19:43,837:DEBUG:------------------------------2015-11-06T20:57:01-08:00------------------------------
2016-09-09 19:19:43,849:DEBUG:last5MinsDistances.max() = 3.63353401878, last10PointsDistance.max() = 3.63353401878
2016-09-09 19:19:43,851:DEBUG:Appending last_trip_end_point AttrDict({u'loc': {u'type': u'Point', u'coordinates': [-122.0862597, 37.3909335]}, u'ts': 1446872071.0, u'fmt_time': u'2015-11-06T20:54:31-08:00', '_id': ObjectId('563d867d7d65cb39ee9a8d79')) with index 21220
2016-09-09 19:19:43,851:INFO:Found trip end at 2015-11-06T20:54:31-08:00
2016-09-09 19:19:43,853:DEBUG:------------------------------2015-11-08T11:20:59-08:00------------------------------
2016-09-09 19:19:43,855:DEBUG:Setting new trip start point AttrDict({u'loc': {u'type': u'Point', u'coordinates': [-122.0862367, 37.3909977]}, u'ts': 1447010459.0, u'fmt_time': u'2015-11-08T11:20:59-08:00', '_id': ObjectId('563fa3127d65cb39ee9a92ef')}) with idx 21227
And the location is practically identical. I guess I stayed overnight for two nights in Berkeley?
In [2]: ecc.calDistance([-122.0862597, 37.3909335], [-122.0862367, 37.3909977])
Out[2]: 7.422267207749954
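For anyone without the codebase handy, the same distance can be reproduced with a plain haversine calculation. This is a sketch under the assumption that `ecc.calDistance` takes `[lon, lat]` points and uses a spherical Earth with a 6371 km radius; the function name `haversine_m` is my own.

```python
import math

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in meters


def haversine_m(p1, p2):
    """Great-circle distance in meters between two [lon, lat] points (GeoJSON order)."""
    lon1, lat1 = (math.radians(v) for v in p1)
    lon2, lat2 = (math.radians(v) for v in p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


d = haversine_m([-122.0862597, 37.3909335], [-122.0862367, 37.3909977])
```

Under those assumptions `d` comes out at roughly 7.4 m, in line with the value above, so the trip end and the next trip start are effectively the same spot.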
This is now tracking the enhancement of splitting the section.
The 1:1 mapping between raw and filtered is really biting us - we can't insert a new UNKNOWN section to represent unknown interpolation, for example. We sometimes skip data while cleaning, but we never add any. That is a limitation that needs to be addressed.
See https://github.com/e-mission/e-mission-server/issues/378#issuecomment-245962837 for an example