Open shankari opened 8 years ago
I forgot to update the server to handle android issues and a bunch of trips are marked as UNKNOWN. This might be a good excuse to actually fix the pipeline reset.
I think that it is as simple as the following.
To reset the pipeline to time t
for user u
t
for user u
, its exit time is t1
t1
t
for user u
, its exit time is t2
t2
u
to t1
u
to t2
That should work.
I thought that the above won't work, because it may break the mapping between cleaned and raw places. But actually, it will because the raw ts and the cleaned timestamp are not synchronized.
But it will also work to just use t1
for both. See scanned images below...
The initial implementation has all the reset code in the script. But that means that it cannot be unit tested. Let's move the code to the pipeline module so that it can be tested.
Let's see what functionality the module needs.
We want to support:
user options
reset to options
The resulting use case matrix looks like this:
all users | all platform users | input list of users | |
---|---|---|---|
reset to start | |||
reset to day |
Note that the all platform users case reduces to the input list of users case after retrieving the list of users for that platform.
So the remaining matrix is:
all users | input list of users | |
---|---|---|
reset to start | delete all analysis entries, remove all pipeline state | delete all analysis entries, remove all pipeline state for each user (or list of users using $in |
reset to day | reduces to list of users since last ts is different for each user |
follow algorithm above to determine the last ts for each user, remove entries after it and reset state to it |
This makes it clear that we need at least two main functions:
reset_user_to_start
reset_user_to_ts
While those two are sufficient if we reduce the all users, reset to start case to the input list of users, reset to start case, since just deleting everything is so performant, we may want to add another reset_all_users_to_start
method as an optimization.
And the matrix becomes:
all users | input list of users | |
---|---|---|
reset to start | reset_all_users_to_start |
reset_user_to_start |
reset to day | reset_user_to_ts |
reset_user_to_ts |
So the algorithm in https://github.com/e-mission/e-mission-server/issues/333#issuecomment-312167112 doesn't work perfectly because of the mismatch between the time that the trip end is detected and the trip actually ends (the dwell time).
So when we run the pipeline a second time, the trips could be off by ~ 5 minutes.
Here's an example.
Mismatch: "2016-07-22T16:59:46.998718-07:00" != "2016-07-22T16:51:35.782000-07:00"
2017-07-03 10:48:37,851:DEBUG:140735683756992:last_cleaned_place = Entry({u'user_id': UUID('86241484-c66a-4a67-be19-03a79ad8f9fe'),
u'_id': ObjectId('595a8371f6858fb1bb8fc639'),
u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
u'exit_fmt_time': u'2016-07-22T16:59:46.998718-07:00',
u'ending_trip': ObjectId('595a836ff6858fb1bb8fc607'),
u'starting_trip': ObjectId('595a8370f6858fb1bb8fc626'),
u'source': u'DwellSegmentationTimeFilter',
u'location': {u'type': u'Point',
u'coordinates': [-122.2593545, 37.8688199]},
u'enter_ts': 1469231473,
u'duration': 513.9987180233002,
u'exit_ts': 1469231986.998718},
u'key': u'analysis/cleaned_place'}})
Resetting timestamp to 1469231473 = 2016-07-22T16:51:13-07:00
. So far so
good.
2017-07-03 10:48:38,027:INFO:140735683756992:For stage PipelineStages.TRIP_SEGMENTATION, start_ts = 2016-07-22T23:51:13
2017-07-03 10:48:38,091:DEBUG:140735683756992:------------------------------2016-07-22T16:51:35.782000-07:00------------------------------
2017-07-03 10:48:38,092:DEBUG:140735683756992:Appending currPoint because the current start point is None
2017-07-03 10:48:38,092:DEBUG:140735683756992:Setting new trip start point AttrDict(u'fmt_time': u'2016-07-22T16:51:35.782000-07:00') with idx 0
2017-07-03 10:48:38,142:DEBUG:140735683756992:------------------------------2016-07-22T16:54:08.368000-07:00------------------------------
2017-07-03 10:48:38,154:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:38,154:DEBUG:140735683756992:------------------------------2016-07-22T17:01:21.964000-07:00------------------------------
2017-07-03 10:48:38,178:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:38,578:DEBUG:140735683756992:------------------------------2016-07-22T17:08:10.604000-07:00------------------------------
2017-07-03 10:48:38,602:INFO:140735683756992:Found trip end at 2016-07-22T17:05:52.818000-07:00
What happened the first time?
2017-07-03 10:48:28,139:DEBUG:140735683756992:------------------------------2016-07-22T16:50:38.356000-07:00------------------------------
2017-07-03 10:48:28,157:DEBUG:140735683756992:last5MinsDistances.max() = 251.583191842, last10PointsDistance.max() = 251.583191842
2017-07-03 10:48:28,158:DEBUG:140735683756992:------------------------------2016-07-22T16:50:53.926000-07:00------------------------------
2017-07-03 10:48:28,184:DEBUG:140735683756992:last5MinsDistances.max() = 251.583191842, last10PointsDistance.max() = 176.953658227
2017-07-03 10:48:28,185:DEBUG:140735683756992:------------------------------2016-07-22T16:51:13-07:00------------------------------
2017-07-03 10:48:28,204:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:28,205:DEBUG:140735683756992:------------------------------2016-07-22T16:51:35.782000-07:00------------------------------
2017-07-03 10:48:28,223:DEBUG:140735683756992:last5MinsDistances.max() = 190.03242523, last10PointsDistance.max() = 151.676167225
2017-07-03 10:48:28,223:DEBUG:140735683756992:------------------------------2016-07-22T16:51:59.776000-07:00------------------------------
...
2017-07-03 10:48:28,306:DEBUG:140735683756992:------------------------------2016-07-22T16:54:08.368000-07:00------------------------------
2017-07-03 10:48:28,331:INFO:140735683756992:Found trip end at 2016-07-22T16:51:13-07:00
2017-07-03 10:48:28,332:DEBUG:140735683756992:------------------------------2016-07-22T17:01:21.964000-07:00------------------------------
2017-07-03 10:48:28,340:DEBUG:140735683756992:prev_point is None, continuing trip
2017-07-03 10:48:28,340:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:28,750:INFO:140735683756992:Found trip end at 2016-07-22T17:05:52.818000-07:00
As we can see, in the original run, we detected the trip end while processing
the entry at 16:54
, but the trip end was at 16:51
. So we end up with two
trips (16:40 - 16:51 and 17:01 - 17:05).
2017-07-03 10:48:29,177:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T16:40:52-07:00', u'ts': 1469230852
end_loc = u'fmt_time': u'2016-07-22T16:50:01-07:00', u'ts': 1469231401.358
2017-07-03 10:48:29,207:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818
But the second time, because we reset the pipeline to 16:51
, we start the
trip at 16:51
and we end up with a single trip from 16:51 - 17:05.
2017-07-03 10:48:39,835:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T16:51:36-07:00', u'ts': 1469231495.782
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818
So the final result is:
BEFORE:
2016-07-22T09:26:08-07:00 - 2016-07-22T11:44:50-07:00
2016-07-22T16:38:06-07:00 - 2016-07-22T16:51:13-07:00
2016-07-22T16:59:47-07:00 - 2016-07-22T17:05:53-07:00
2016-07-22T17:12:19-07:00 - 2016-07-22T17:21:36-07:00
AFTER:
2016-07-22T09:26:08-07:00 - 2016-07-22T11:44:50-07:00
2016-07-22T16:38:06-07:00 - 2016-07-22T16:51:13-07:00
2016-07-22T16:51:36-07:00 - 2016-07-22T17:04:45-07:00
2016-07-22T17:12:19-07:00 - 2016-07-22T17:21:36-07:00
So it looks like there are two main options:
We just live with this.
But on the other hand, it doesn't retain the idempotency of the pipeline
We try to use last_place_enter_ts + delta. only question is: what should delta be?
I'm going to try with setting the delta to 5 mins and see if that helps...
Also note that for this to work, it needs to be 5 mins from the enter ts of the first raw place. Let's see if that works here.
Cleaned place = 595a8371f6858fb1bb8fc639 Raw place = 595a836df6858fb1bb8fc51c
which is
2017-07-03 10:48:29,228:DEBUG:140735683756992:Saving entry Entry(
{'data': Rawplace({'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
'ending_trip': ObjectId('595a836df6858fb1bb8fc51b'),
'starting_trip': ObjectId('595a836df6858fb1bb8fc51d'),
'duration': 608.9639999866486,
'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
'enter_ts': 1469231473,
'exit_ts': 1469232081.964}),
'_id': ObjectId('595a836df6858fb1bb8fc51c'),
'user_id': UUID('86241484-c66a-4a67-be19-03a79ad8f9fe'),
'key': 'segmentation/raw_place'})}) into timeseries
Hm. Maybe it doesn't matter so much - the start time is the same, but the end time is different. On the other hand, there is no harm with using the start time from the raw place since we do perform some magic stuff for the end time of the trip and that would change the start time of the subsequent place.
That seems to work for the raw segmentation.
2017-07-03 14:21:21,552:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T16:40:23-07:00', u'ts': 1469230823.024
end_loc = u'fmt_time': u'2016-07-22T16:51:13-07:00', u'ts': 1469231473
2017-07-03 14:21:21,578:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818
2017-07-03 14:21:29,568:DEBUG:140735683756992:reset_ts = 1469231473
2017-07-03 14:21:31,293:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818
But the list of cleaned trips is now worse.
2016-07-22T09:26:08-07:00 2016-07-22T11:44:50-07:00
2016-07-22T16:38:06.693176-07:00 2016-07-22T16:51:13-07:00
2016-07-22T17:01:21.964000-07:00 2016-07-22T17:05:52.818000-07:00
2016-07-22T17:12:19.019962-07:00 2016-07-22T17:21:36-07:00
It looks like the extrapolation is not working well for the second round...
2017-07-03 14:21:21,600:DEBUG:140735683756992
Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('595ab551f6858fb93922a521'),
'start_ts': 1469232081.964,
'start_place': ObjectId('595ab551f6858fb93922a51f'),
'duration': 270.85400009155273,
'distance': 82.88829997716516,
'end_ts': 1469232352.818,
'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
'_id': ObjectId('595ab551f6858fb93922a520'),
'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
'key': 'segmentation/raw_trip'})}) into timeseries
2017-07-03 14:21:22,707:DEBUG:140735683756992:Considering trip 595ab551f6858fb93922a520: 2016-07-22T17:01:21.964000-07:00 -> 2016-07-22T17:05:52.818000-07:00
only one section:
2017-07-03 14:21:24,200:DEBUG:140735683756992
Starting with element of type trip,
id 595ab552f6858fb93922a52f,
details Entry({u'data': {u'sensed_mode': 2,
u'end_ts': 1469232352.818,
u'start_ts': 1469232081.964,
u'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
u'source': u'SmoothedHighConfidenceMotion',
u'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
u'duration': 270.85400009155273,
u'trip_id': ObjectId('595ab551f6858fb93922a520'),
u'_id': ObjectId('595ab552f6858fb93922a52f'),
u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
u'key': u'segmentation/raw_section'}})
2017-07-03 14:21:24,206:DEBUG:140735683756992:deleting 0 points from section points
2017-07-03 14:21:24,209:DEBUG:140735683756992:Found 13 results
2017-07-03 14:21:24,211:DEBUG:140735683756992:After de-duping, converted 13 points to 13
2017-07-03 14:21:24,214:DEBUG:140735683756992:Found first section, may need to extrapolate start point
2017-07-03 14:21:24,215:DEBUG:140735683756992:At index 0, loc is 5792bb92abbf384cadf84cd7
2017-07-03 14:21:24,216:DEBUG:140735683756992:Raw place is 595ab551f6858fb93922a51f and corresponding location is 5792bb92abbf384cadf84cbf
2017-07-03 14:21:24,229:DEBUG:140735683756992:Adding distance 132.789740146 to original 631.078945882 to extend section start from [-122.2599829, 37.8699062] to [-122.2593545, 37.8688199]
2017-07-03 14:21:24,231:DEBUG:140735683756992:After subtracting time 94.9652820442 from original 270.854000092 to cover additional distance 132.789740146 at speed 1.39829774932, new_start_ts = 1469231987.0
2017-07-03 14:21:24,296:DEBUG:140735683756992:Resampling data of size 14
2017-07-03 14:21:24,302:DEBUG:140735683756992:Resampling entry list fmt_time ts longitude latitude
0 2016-07-22T16:59:46.998718-07:00 1.469232e+09 -122.259355 37.868820
1 2016-07-22T17:01:21.964000-07:00 1.469232e+09 -122.259983 37.869906
2 2016-07-22T17:01:53-07:00 1.469232e+09 -122.259550 37.870440
3 2016-07-22T17:01:57.839000-07:00 1.469232e+09 -122.259550 37.870440
4 2016-07-22T17:02:34-07:00 1.469232e+09 -122.259573 37.870521 of size 14
2017-07-03 14:21:31,298:DEBUG:140735683756992
Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('595ab55bf6858fb93922a689'),
'start_ts': 1469232081.964,
'start_place': ObjectId('595ab55bf6858fb93922a687'),
'duration': 270.85400009155273,
'distance': 82.88829997716516,
'end_ts': 1469232352.818,
'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
'_id': ObjectId('595ab55bf6858fb93922a688'),
'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
'key': 'segmentation/raw_trip'})}) into timeseries
2017-07-03 14:21:31,823:DEBUG:140735683756992:Considering trip 595ab55bf6858fb93922a688: 2016-07-22T17:01:21.964000-07:00 -> 2016-07-22T17:05:52.818000-07:00
only one section:
2017-07-03 14:21:31,867:DEBUG:140735683756992
Starting with element of type trip,
id 595ab55bf6858fb93922a692,
details Entry({u'data': {u'sensed_mode': 2,
u'end_ts': 1469232352.818,
u'start_ts': 1469232081.964,
u'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
u'source': u'SmoothedHighConfidenceMotion',
u'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
u'duration': 270.85400009155273,
u'trip_id': ObjectId('595ab55bf6858fb93922a688'),
u'_id': ObjectId('595ab55bf6858fb93922a692'),
u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
u'key': u'segmentation/raw_section'}})
2017-07-03 14:21:31,873:DEBUG:140735683756992:deleting 0 points from section points
2017-07-03 14:21:31,877:DEBUG:140735683756992:Found 13 results
2017-07-03 14:21:31,879:DEBUG:140735683756992:After de-duping, converted 13 points to 13
2017-07-03 14:21:31,881:DEBUG:140735683756992:Found first section, may need to extrapolate start point
2017-07-03 14:21:31,882:DEBUG:140735683756992:At index 0, loc is 5792bb92abbf384cadf84cd7
2017-07-03 14:21:31,884:DEBUG:140735683756992:Raw place is 595ab55bf6858fb93922a687 and corresponding location is 595ab55bf6858fb93922a69b
2017-07-03 14:21:31,900:DEBUG:140735683756992:Adding distance 0.0 to original 631.078945882 to extend section start from [-122.2599829, 37.8699062] to [-122.2599829, 37.8699062]
Ok so this is because the place and the start of the section are the same.
And I think that is because during the raw segmentation, we weren't able to link to the prior chain, e.g.
2017-07-03 14:21:31,295:DEBUG:140735683756992:last_place.enter_ts = None
2017-07-03 14:21:31,295:DEBUG:140735683756992:start of a chain, unable to check for restart from previous trip end, assuming not restarted
Let's look at the appropriate context. Here's the code.
ts = esta.TimeSeries.get_time_series(user_id)
last_place_entry = esdp.get_last_place_entry(esda.RAW_PLACE_KEY, user_id)
if last_place_entry is None:
last_place = start_new_chain(user_id)
....
def start_new_chain(uuid):
"""
Can't find the place that is the end of an existing chain, so we need to
create a new one. This might correspond to the start of tracking, or to an
improperly terminated chain.
"""
start_place = ecwrp.Rawplace()
logging.debug("Starting tracking, created new start of chain %s" % start_place)
return start_place
And sure enough, that's what we see
2017-07-03 14:21:31,289:DEBUG:140735683756992:Iterated over all points, just_ended = True, len(transition_df) = 10
2017-07-03 14:21:31,290:DEBUG:140735683756992:last place doc = None
2017-07-03 14:21:31,290:DEBUG:140735683756992:Starting tracking, created new start of chain Rawplace({})
2017-07-03 14:21:31,292:DEBUG:140735683756992:segmentation_point_list has length 4
Why is last place doc = None
?
Note that from the first run, the start raw place was
595ab551f6858fb93922a51f
.
We should NOT delete it, simply resetting it instead.
Here are the raw places that were reset.
2017-07-03 14:21:29,568:DEBUG:140735683756992
last_raw_place = Entry({u'data':
{u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
u'ending_trip': ObjectId('595ab551f6858fb93922a51e'),
u'starting_trip': ObjectId('595ab551f6858fb93922a520'),
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469231473,
u'duration': 608.9639999866486,
u'exit_ts': 1469232081.964},
u'_id': ObjectId('595ab551f6858fb93922a51f'),
u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
u'key': u'segmentation/raw_place'}})
2017-07-03 14:21:29,568:DEBUG:140735683756992
first_raw_place = Entry({u'data':
{u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
u'ending_trip': ObjectId('595ab551f6858fb93922a51e'),
u'starting_trip': ObjectId('595ab551f6858fb93922a520'),
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469231473,
u'duration': 608.9639999866486,
u'exit_ts': 1469232081.964},
u'_id': ObjectId('595ab551f6858fb93922a51f'),
u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
u'key': u'segmentation/raw_place'}})
Note that they are both the same, and both have _id = 595ab551f6858fb93922a51f
.
Then, we reset the place
2017-07-03 14:21:29,696:DEBUG:140735683756992:match query = {'_id': ObjectId('595ab551f6858fb93922a51f')}
2017-07-03 14:21:29,696:DEBUG:140735683756992:reset_query = {'$unset': {'exit_local_dt': '', 'exit_fmt_time': '', 'starting_time': '', 'duration': '', 'raw_places': '', 'exit_ts': ''}}
But then it is None. The reset is not working for some reason. WTF!
2017-07-03 14:21:29,737:DEBUG:140735683756992:after update, entry is None
And the really weird thing is that the place actually still exists.
2017-07-03 14:21:21,600:DEBUG:140735683756992
Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('595ab551f6858fb93922a521'),
'start_ts': 1469232081.964,
'start_place': ObjectId('595ab551f6858fb93922a51f'),
'duration': 270.85400009155273,
'distance': 82.88829997716516,
'end_ts': 1469232352.818,
'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
'_id': ObjectId('595ab551f6858fb93922a520'),
'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
'key': 'segmentation/raw_trip'})}) into timeseries
and the ids for the trip's start_place and place's starting trip and start/end times match up just fine.
{u'_id': ObjectId('595ab551f6858fb93922a51f'),
u'data': {u'duration': 608.9639999866486,
u'ending_trip': ObjectId('595ab551f6858fb93922a51e'),
u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
u'enter_ts': 1469231473,
u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
u'exit_ts': 1469232081.964,
u'location': {u'coordinates': [-122.2593545, 37.8688199], u'type': u'Point'},
u'source': u'DwellSegmentationTimeFilter',
u'starting_trip': ObjectId('595ab551f6858fb93922a520')},
u'write_ts': 1499116881.553992},
u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a')}
So that must mean that the initial query is not correct.
Ok after fixing the unset, we now have a last_place_doc with blank enter information.
2017-07-03 16:03:45,558:DEBUG:140735683756992
last place doc = {u'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'),
u'_id': ObjectId('595acd45f6858fbd4e669f10'),
u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
u'ending_trip': ObjectId('595acd45f6858fbd4e669f0f'),
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469231473},
u'key': u'segmentation/raw_place'}}
2017-07-03 16:03:45,560:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:52-07:00', u'ts': 1469232352.818
2017-07-03 16:03:45,583:DEBUG:140735683756992
Inserting entry Entry(
{'data': Rawtrip({'end_place': ObjectId('595acd51f6858fbd4e66a079'),
'start_ts': 1469232081.964,
'start_place': ObjectId('595acd45f6858fbd4e669f10'),
'duration': 270.85400009155273,
'distance': 82.88829997716516,
'end_ts': 1469232352.818,
'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
'_id': ObjectId('595acd51f6858fbd4e66a078'),
'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'),
'key': 'segmentation/raw_trip'})}) into timeseries
2017-07-03 16:03:46,192:DEBUG:140735683756992:Considering trip 595acd51f6858fbd4e66a078: 2016-07-22T17:01:21.964000-07:00 -> 2016-07-22T17:05:52.818000-07:00
2017-07-03 16:03:46,248:DEBUG:140735683756992:Saving entries into cleaned section Cleanedsection({u'source': u'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('595acd52f6858fbd4e66a08a')})
2017-07-03 16:03:46,253:DEBUG:140735683756992:deleting 0 points from section points
2017-07-03 16:03:46,258:DEBUG:140735683756992:Found 13 results
2017-07-03 16:03:46,261:DEBUG:140735683756992:After de-duping, converted 13 points to 13
2017-07-03 16:03:46,265:DEBUG:140735683756992:After filtering, rows went from 13 -> 13, cols from 22 -> 21
2017-07-03 16:03:46,265:DEBUG:140735683756992:Found first section, may need to extrapolate start point
2017-07-03 16:03:46,267:DEBUG:140735683756992:At index 0, loc is 5792bb92abbf384cadf84cd7
2017-07-03 16:03:46,267:DEBUG:140735683756992:get_entry_at_ts query = {'metadata.key': 'background/filtered_location', 'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), 'data.ts': 1469231473}
2017-07-03 16:03:46,268:DEBUG:140735683756992:Raw place is 595acd45f6858fbd4e669f10 and corresponding location is 5792bb92abbf384cadf84cbf
2017-07-03 16:03:46,285:DEBUG:140735683756992:Adding distance 132.789740146 to original 631.078945882 to extend section start from [-122.2599829, 37.8699062] to [-122.2593545, 37.8688199]
2017-07-03 16:03:46,288:DEBUG:140735683756992:After subtracting time 94.9652820442 from original 270.854000092 to cover additional distance 132.789740146 at speed 1.39829774932, new_start_ts = 1469231987.0
Corresponding cleaned trip is
2017-07-03 16:03:46,489:DEBUG:140735683756992:For raw trip 595acd51f6858fbd4e66a078, found filtered trip 595acd52f6858fbd4e66a08a
Failed because of no raw_places
array.
Need to set that to empty instead of unsetting it
2017-07-03 16:03:47,290:DEBUG:140735683756992:for trip Entry({'data': Cleanedtrip({'distance': 562.52852529365282, 'raw_trip': ObjectId('595acd51f6858fbd4e66a078'), 'start_loc': {u'type': u'Point', u'coordinates': [-122.2593545, 37.8688199]}, u'end_ts': 1469232352.8180001, 'start_ts': 1469231986.998718, 'start_fmt_time': '2016-07-22T16:59:46.998718-07:00', u'end_loc': {u'type': u'Point', u'coordinates': [-122.2602491, 37.8706214]}, u'source': u'DwellSegmentationTimeFilter', u'end_fmt_time': '2016-07-22T17:05:52.818000-07:00', u'end_local_dt': LocalDate({'hour': 17, 'month': 7, 'second': 52, 'weekday': 4, 'year': 2016, 'timezone': u'America/Los_Angeles', 'day': 22, 'minute': 5}), 'duration': 365.81928205490112, 'start_local_dt': LocalDate({'hour': 16, 'month': 7, 'second': 46, 'weekday': 4, 'year': 2016, 'timezone': u'America/Los_Angeles', 'day': 22, 'minute': 59})}), '_id': ObjectId('595acd52f6858fbd4e66a08a'), 'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), 'metadata': Metadata({'write_fmt_time': '2017-07-03T16:03:46.246494-07:00', 'write_ts': 1499123026.246494, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 16, 'month': 7, 'second': 46, 'weekday': 0, 'year': 2017, 'timezone': 'America/Los_Angeles', 'day': 3, 'minute': 3}), 'key': 'analysis/cleaned_trip'})}) start, converting Entry({u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00', u'display_name': u'Southside, Berkeley', u'enter_local_dt': {u'hour': 16, u'month': 7, u'second': 13, u'weekday': 4, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 51}, u'ending_trip': ObjectId('595acd48f6858fbd4e669ffb'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.2593545, 37.8688199]}, u'enter_ts': 1469231473}, u'_id': ObjectId('595acd49f6858fbd4e66a02d'), u'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), u'metadata': {u'write_fmt_time': u'2017-07-03T16:03:37.654076-07:00', u'write_ts': 1499123017.654076, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 16, u'month': 7, u'second': 37, u'weekday': 0, u'year': 2017, u'timezone': u'America/Los_Angeles', u'day': 3, u'minute': 3}, u'key': u'analysis/cleaned_place'}}) to Entry({u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00', u'exit_local_dt': {u'hour': 17, u'month': 7, u'second': 21, u'weekday': 4, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 1}, u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00', u'enter_local_dt': {u'hour': 16, u'month': 7, u'second': 13, u'weekday': 4, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 51}, u'ending_trip': ObjectId('595acd45f6858fbd4e669f0f'), u'starting_trip': ObjectId('595acd51f6858fbd4e66a078'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.2593545, 37.8688199]}, u'enter_ts': 1469231473, u'duration': 608.9639999866486, u'exit_ts': 1469232081.964}, u'_id': ObjectId('595acd45f6858fbd4e669f10'), u'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), u'metadata': {u'write_fmt_time': u'2017-07-03T16:03:33.441360-07:00', u'write_ts': 1499123013.44136, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 16, u'month': 7, u'second': 33, u'weekday': 0, u'year': 2017, u'timezone': u'America/Los_Angeles', u'day': 3, u'minute': 3}, u'key': u'segmentation/raw_place'}})
2017-07-03 16:03:47,290:ERROR:140735683756992:Cleaning and resampling failed for user 89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee
Traceback (most recent call last):
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 72, in clean_and_resample
last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 98, in save_cleaned_segments_for_ts
return save_cleaned_segments_for_timeline(user_id, tl)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 124, in save_cleaned_segments_for_timeline
(last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 886, in create_and_link_timeline
link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 957, in link_trip_start
cleaned_start_place_data.append_raw_place(raw_start_place.get_id())
File "/Users/shankari/e-mission/e-mission-server/emission/core/wrapper/cleanedplace.py", line 15, in append_raw_place
self["raw_places"].append(place_id)
KeyError: 'raw_places'
After these fixes, the trip reset function broke. In particular, this trip (which is the untracked trip)
2017-07-03 17:17:58,134:ERROR:140735683756992:
Found error 'NoneType' object has no attribute 'get_id' while processing trip Entry({u'data': {u'distance': 55389.54955322857,
u'end_place': ObjectId('595adeb2f6858fc0673d740e'),
u'end_ts': 1469491947.853,
u'start_ts': 1469233301,
u'start_fmt_time': u'2016-07-22T17:21:36-07:00',
u'source': u'DwellSegmentationTimeFilter',
u'start_place': ObjectId('595adeb2f6858fc0673d740d'),
u'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'duration': 258646.8529999256,
u'_id': ObjectId('595adeb1f6858fc0673d73ed'),
u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
u'key': u'analysis/cleaned_untracked'}})
has no end place.
In [102]: edb.get_analysis_timeseries_db().find({"_id": boi.ObjectId("595adeb2f6858fc0673d740e")}).count()
Out[102]: 0
Does it have a start place?
In [103]: edb.get_analysis_timeseries_db().find({"_id": boi.ObjectId("595adeb2f6858fc0673d740d")}).count()
Out[103]: 1
Yes. And it was
{u'_id': ObjectId('595adeb2f6858fc0673d740d'),
u'data': {u'display_name': u'Shattuck Avenue, Berkeley',
u'duration': 258651.8529999256,
u'ending_trip': ObjectId('595adeb1f6858fc0673d73d6'),
u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'enter_ts': 1469233296,
u'exit_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'exit_ts': 1469491947.853,
u'location': {u'coordinates': [-122.26827, 37.8700362], u'type': u'Point'},
u'raw_places': [ObjectId('595adeb4f6858fc0673d7414')],
u'source': u'DwellSegmentationTimeFilter',
u'starting_trip': ObjectId('595adeb4f6858fc0673d741b')},
u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e')}
That is almost certainly the place that overlaps the reset. But why do we have that if we have the untracked time too?
We have
u'start_fmt_time': u'2016-07-22T17:21:36-07:00',
u'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'key': u'analysis/cleaned_untracked'}})
u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'exit_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'key': u'analysis/cleaned_place',
How can we have both untracked time and cleaned place that overlap with each other?
timeline appears to be:
trip (595adeb1f6858fc0673d73d6)
2016-07-22T17:21:36-07:00
place (595adeb2f6858fc0673d740d)
2016-07-25T17:12:27.853000-07:00
trip (595adeb4f6858fc0673d741b)
2016-07-25T17:18:14.841000-07:00
So somehow, the untracked_time trip got replaced by a place after the reset
2017-07-03 17:17:54,168:DEBUG:140735683756992:
for trip Entry({'data': Untrackedtime({u'distance': 55389.54955322857,
u'end_ts': 1469491947.853,
u'start_ts': 1469233301,
u'start_fmt_time': u'2016-07-22T17:21:36-07:00',
u'source': u'DwellSegmentationTimeFilter',
u'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
u'duration': 258646.8529999256,
'_id': ObjectId('595adeb1f6858fc0673d73ed'),
'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
'key': 'analysis/cleaned_untracked'})})
converting Entry({'data': Cleanedplace(
{u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'source': u'DwellSegmentationTimeFilter',
'raw_places': [ObjectId('595adeb0f6858fc0673d73c9')],
'display_name': u'Shattuck Avenue, Berkeley',
u'enter_ts': 1469233296,
'ending_trip': ObjectId('595adeb1f6858fc0673d73d6')}),
'_id': ObjectId('595adeb2f6858fc0673d740d'),
'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
'key': 'analysis/cleaned_place'})})
2017-07-03 17:17:54,790:DEBUG:140735683756992:Considering place 595adeb2f6858fc0673d740d: 2016-07-22T17:21:36-07:00 -> 2016-07-22T17:21:36-07:00
...
2017-07-03 17:17:54,794:DEBUG:140735683756992:for trip = 595adeb1f6858fc0673d73ed, adding start_place 595adeb2f6858fc0673d740d and end_place 595adeb2f6858fc0673d740e
During compare
"enter_fmt_time": "2016-07-22T17:21:36-07:00",
"exit_fmt_time": "2016-07-22T17:21:36-07:00",
During reset
{u'_id': ObjectId('595adeb2f6858fc0673d740d'), u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00', u'exit_fmt_time': u'2016-07-22T17:21:36-07:00'}},
2017-07-03 17:17:55,359:DEBUG:140735683756992
last_cleaned_place = Entry({u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
u'_id': ObjectId('595adeb2f6858fc0673d740d'),
u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'display_name': u'Shattuck Avenue, Berkeley',
u'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
u'ending_trip': ObjectId('595adeb1f6858fc0673d73d6'),
u'starting_trip': ObjectId('595adeb1f6858fc0673d73ed'),
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469233296,
u'duration': 5,
u'raw_places': [ObjectId('595adeb0f6858fc0673d73c9'),
ObjectId('595adeb0f6858fc0673d73c9')],
u'exit_ts': 1469233301},
u'key': u'analysis/cleaned_place'}})
2017-07-03 17:17:55,359
DEBUG:140735683756992:last_raw_place = Entry({u'data':
{u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
u'ending_trip': ObjectId('595adeb0f6858fc0673d73c8'),
u'starting_trip': ObjectId('595adeb0f6858fc0673d73cc'),
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469233296,
u'duration': 5,
u'exit_ts': 1469233301},
u'_id': ObjectId('595adeb0f6858fc0673d73c9'),
u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
u'key': u'segmentation/raw_place'}})
2017-07-03 17:17:55,
360:DEBUG:140735683756992:first_raw_place = Entry(
{u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
u'ending_trip': ObjectId('595adeb0f6858fc0673d73c8'),
u'starting_trip': ObjectId('595adeb0f6858fc0673d73cc'),
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469233296,
u'duration': 5,
u'exit_ts': 1469233301},
u'_id': ObjectId('595adeb0f6858fc0673d73c9'),
u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
u'key': u'segmentation/raw_place'}})
Now we reset to 2016-07-22T17:21:41-07:00
2017-07-03 17:17:55,360:DEBUG:140735683756992:last_place_enter_ts = 1469233296
2017-07-03 17:17:55,360:DEBUG:140735683756992:reset_ts = 1469233296
And as expected, the exit entries are removed and raw_places
is set to []
2017-07-03 17:17:55,477:DEBUG:140735683756992:after update,
entry is {u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
u'_id': ObjectId('595adeb2f6858fc0673d740d'),
u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'display_name': u'Shattuck Avenue, Berkeley',
u'ending_trip': ObjectId('595adeb1f6858fc0673d73d6'),
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469233296,
u'raw_places': []},
u'key': u'analysis/cleaned_place'}}
Note that because this is a weird place (enter almost == exit, exit - enter = 5 secs), and the reset_ts has a fuzz factor associated with it, searching for entries that are greater than the reset_ts will not delete the untracked time trip.
We should fix that, but it still doesn't explain why the the place expanded after the reset.
Bunch of transitions, including the all-important BOOTED
s
0 2016-07-22T17:33:54.634000-07:00 4 (BOOTED)
1 2016-07-25T15:35:33.679000-07:00 4 (BOOTED)
2 2016-07-25T15:36:34.447000-07:00 0 (INITIALIZE)
3 2016-07-25T17:12:31.952000-07:00 1 (EXITED_GEOFENCE)
4 2016-07-25T17:21:04.399000-07:00 2 (STOPPED_MOVING)
5 2016-07-25T17:25:06.900000-07:00 1 (EXITED_GEOFENCE)
6 2016-07-25T17:33:42.862000-07:00 2 (STOPPED_MOVING)
Found the restarts
2017-07-03 17:17:56,367:DEBUG:140735683756992:In range 1469233296 -> 1469491947.85 found transitions fmt_time curr_state transition
0 2016-07-22T17:33:54.634000-07:00 3 4
1 2016-07-25T15:35:33.679000-07:00 3 4
2 2016-07-25T15:36:34.447000-07:00 1 0
2017-07-03 17:17:56,379:DEBUG:140735683756992:On android, found restart events _id curr_state fmt_time \
0 5793f338abbf384cadf90ba9 3 2016-07-22T17:33:54.634000-07:00
1 5796b24fabbf384cadfa37d9 3 2016-07-25T15:35:33.679000-07:00
2017-07-03 17:17:56,379:DEBUG:140735683756992:tracking has been restarted, returning True
And we re-insert raw untracked time.
2017-07-03 17:17:56,386:DEBUG:140735683756992
Inserting entry Entry({'data': Untrackedtime({'end_place': ObjectId('595adeb4f6858fc0673d7414'),
'start_ts': 1469233301,
'start_place': ObjectId('595adeb0f6858fc0673d73c9'),
'duration': 258646.8529999256,
'distance': 55389.54955322857,
'end_ts': 1469491947.853,
'start_fmt_time': u'2016-07-22T17:21:36-07:00',
'source': 'DwellSegmentationTimeFilter',
'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
'_id': ObjectId('595adeb4f6858fc0673d7413'),
'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
'metadata': Metadata({'write_fmt_time': '2017-07-03T17:17:56.380142-07:00',
'write_ts': 1499127476.380142,
'time_zone': 'America/Los_Angeles',
'platform': 'server',
'key': 'segmentation/raw_untracked'})}) into timeseries
And the exit is filled in correctly.
2017-07-03 17:17:56,390:DEBUG:140735683756992
Resetting last_place_entry from Entry({u'data': Rawplace(
{u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
u'ending_trip': ObjectId('595adeb0f6858fc0673d73c8'),
'starting_trip': ObjectId('595adeb4f6858fc0673d7413'),
'duration': 5,
'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
u'source': u'DwellSegmentationTimeFilter',
u'enter_ts': 1469233296,
'exit_ts': 1469233301}),
u'_id': ObjectId('595adeb0f6858fc0673d73c9'),
u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
u'key': u'segmentation/raw_place'}})
But when we get the CLEAN_AND_RESAMPLE
stage, we don't consider that untracked time trip or the reset place for that matter. It is fine to not consider the place (since it will be the last place to link to), but it is a problem that we don't consider the untracked time, and it is probably the reason that we expand the place to much, to link to the next trip that we do have.
2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering place 595adeb4f6858fc0673d7414: 2016-07-25T17:12:27.853000-07:00 -> 2016-07-25T17:12:27.853000-07:00
2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering place 595adeb4f6858fc0673d7412: 2016-07-25T17:18:14.841000-07:00 -> 2016-07-25T17:25:06.307000-07:00
2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering place 595adeb4f6858fc0673d7416: 2016-07-25T17:30:31-07:00 -> None
2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering trip 595adeb4f6858fc0673d7411: 2016-07-25T17:12:27.853000-07:00 -> 2016-07-25T17:18:14.841000-07:00
2017-07-03 17:17:56,667:DEBUG:140735683756992:Considering trip 595adeb4f6858fc0673d7415: 2016-07-25T17:25:06.307000-07:00 -> 2016-07-25T17:30:31-07:00
This indicates that adding a blanket fuzz factor was a bad idea. We need a fuzz factor only for the TRIP_SEGMENTATION
stage, not for the other stages - otherwise we may miss some entries. Note that this is only really relevant to the untracked time trip - otherwise place duration should be at least 5 mins (we should have merged the trips on the two sides if we didn't have a long enough place).
So we can do two fixes:
TRIP_SEGMENTATION
stageI am tempted to go with (1) because really the fuzz is a hack to get around the fact that the end time for the trip segmentation does not reflect the actual set of objects processed. When we have processed t + 5mins, we set the end time to t. This is not true for the other stages.
Trying that change now...
We should probably not reset entries that are before the reset_ts
- e.g. from people who have uninstalled the app a long time ago so reset_ts
. It's not too bad - once these people reinstall, their entries will be after the reset_ts, but will fail in case they have not been able to push for a really long time, so have been holding on to entries on the phone for months, and then finally are able to push again.
Ran into one more issue while resetting. The user has only one place and it is the first place, so it has no enter and no exit.
INFO:root:resetting user 174fdc0a-9caa-4b45-a28d-d2662e2bf5a8 to ts 1506816000
DEBUG:root:Looking for last place before 1506816000
DEBUG:root:all places for this user = [{'_id': ObjectId('57f8034088f6630fbe895454'), 'data': {}}]
DEBUG:root:last place doc for user 174fdc0a-9caa-4b45-a28d-d2662e2bf5a8 = None
DEBUG:root:last trip doc for user 174fdc0a-9caa-4b45-a28d-d2662e2bf5a8 = None
DEBUG:root:last place doc = {'_id': ObjectId('57f8034088f6630fbe895454'), 'data': {'source': 'DwellSegmentationDistFilter', 'raw_places': [ObjectId('57f8033f88f6630fbe895446'), ObjectId('57f8033f88f6630fbe895446')], 'display_name': 'Haviland Road, Berkeley', 'location': {'type': 'Point', 'coordinates': [-122.2581048834238, 37.87522555192434]}}, 'user_id': UUID('174fdc0a-9caa-4b45-a28d-d2662e2bf5a8'), 'metadata': {'write_fmt_time': '2016-10-07T13:19:12.842688-07:00', 'write_ts': 1475871552.842688, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': {'hour': 13, 'month': 10, 'second': 12, 'weekday': 4, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 7, 'minute': 19}, 'key': 'analysis/cleaned_place'}}
DEBUG:root:last_place = Entry({'_id': ObjectId('57f8034088f6630fbe895454'), 'data': {'source': 'DwellSegmentationDistFilter', 'raw_places': [ObjectId('57f8033f88f6630fbe895446'), ObjectId('57f8033f88f6630fbe895446')], 'display_name': 'Haviland Road, Berkeley', 'location': {'type': 'Point', 'coordinates': [-122.2581048834238, 37.87522555192434]}}, 'user_id': UUID('174fdc0a-9caa-4b45-a28d-d2662e2bf5a8'), 'metadata': {'write_fmt_time': '2016-10-07T13:19:12.842688-07:00', 'write_ts': 1475871552.842688, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': {'hour': 13, 'month': 10, 'second': 12, 'weekday': 4, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 7, 'minute': 19}, 'key': 'analysis/cleaned_place'}}), reset_ts = 1506816000
So when we look for the enter_ts
, it is None
if last_place is None:
return None
elif last_place.data.enter_ts < reset_ts:
return last_place
else:
raise ValueError("No trip or place straddling time %s for user %s" %
(reset_ts, user_id))
Traceback (most recent call last):
File "bin/reset_pipeline.py", line 99, in <module>
epr.reset_user_to_ts(user_id, day_ts, args.dry_run)
File "/code/e-mission-server/emission/pipeline/reset.py", line 54, in reset_user_to_ts
last_cleaned_place = esdp.get_last_place_before(esda.CLEANED_PLACE_KEY, ts, user_id)
File "/code/e-mission-server/emission/storage/decorations/place_queries.py", line 107, in get_last_place_before
elif last_place.data.enter_ts < reset_ts:
TypeError: '<' not supported between instances of 'NoneType' and 'int'
This may have happened before, but is more clear in python3, where comparison between None and int
is not allowed.
Let's handle this by dealing with it like a None
last place, since there's no point to resetting it anyway.
The naive implementation of the pipeline reset deletes all entries greater than a particular ts. This breaks the chain of trips and sections and ends up with issues when the pipeline is re-run. We worked around this last time by re-running the entire pipeline, but that took multiple days and is no longer a valid option.
Instead, we should write a more complex reset procedure that retains the prior chain but opens up the end so that the new chain can be joined to it.