e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
15 stars 34 forks source link

Fix pipeline reset #191

Open shankari opened 8 years ago

shankari commented 8 years ago

The naive implementation of the pipeline reset deletes all entries greater than a particular ts. This breaks the chain of trips and sections and ends up with issues when the pipeline is re-run. We worked around this last time by re-running the entire pipeline, but that took multiple days and is no longer a valid option.

Instead, we should write a more complex reset procedure that retains the prior chain but opens up the end so that the new chain can be joined to it.

shankari commented 7 years ago

I forgot to update the server to handle android issues and a bunch of trips are marked as UNKNOWN. This might be a good excuse to actually fix the pipeline reset.

I think that it is as simple as the following. To reset the pipeline to time t for user u

That should work.

shankari commented 7 years ago

I thought that the above won't work, because it may break the mapping between cleaned and raw places. But actually, it will because the raw ts and the cleaned timestamp are not synchronized. But it will also work to just use t1 for both. See scanned images below...

shankari commented 7 years ago

The initial implementation has all the reset code in the script. But that means that it cannot be unit tested. Let's move the code to the pipeline module so that it can be tested.

shankari commented 7 years ago

Let's see what functionality the module needs.

We want to support:

The resulting use case matrix looks like this:

all users all platform users input list of users
reset to start
reset to day

Note that the all platform users case reduces to the input list of users case after retrieving the list of users for that platform.

So the remaining matrix is:

all users input list of users
reset to start delete all analysis entries, remove all pipeline state delete all analysis entries, remove all pipeline state for each user (or list of users using $in
reset to day reduces to list of users since last ts is different for each user follow algorithm above to determine the last ts for each user, remove entries after it and reset state to it

This makes it clear that we need at least two main functions:

While those two are sufficient if we reduce the all users, reset to start case to the input list of users, reset to start case, since just deleting everything is so performant, we may want to add another reset_all_users_to_start method as an optimization.

And the matrix becomes:

all users input list of users
reset to start reset_all_users_to_start reset_user_to_start
reset to day reset_user_to_ts reset_user_to_ts
shankari commented 7 years ago

So the algorithm in https://github.com/e-mission/e-mission-server/issues/333#issuecomment-312167112 doesn't work perfectly because of the mismatch between the time that the trip end is detected and the trip actually ends (the dwell time).

So when we run the pipeline a second time, the trips could be off by ~ 5 minutes.

Here's an example.

Mismatch: "2016-07-22T16:59:46.998718-07:00" != "2016-07-22T16:51:35.782000-07:00"
2017-07-03 10:48:37,851:DEBUG:140735683756992:last_cleaned_place = Entry({u'user_id': UUID('86241484-c66a-4a67-be19-03a79ad8f9fe'),
 u'_id': ObjectId('595a8371f6858fb1bb8fc639'),
 u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
 u'exit_fmt_time': u'2016-07-22T16:59:46.998718-07:00',
 u'ending_trip': ObjectId('595a836ff6858fb1bb8fc607'),
 u'starting_trip': ObjectId('595a8370f6858fb1bb8fc626'),
 u'source': u'DwellSegmentationTimeFilter',
 u'location': {u'type': u'Point',
 u'coordinates': [-122.2593545, 37.8688199]},
 u'enter_ts': 1469231473,
 u'duration': 513.9987180233002,
 u'exit_ts': 1469231986.998718},
 u'key': u'analysis/cleaned_place'}})

Resetting timestamp to 1469231473 = 2016-07-22T16:51:13-07:00. So far so good.

2017-07-03 10:48:38,027:INFO:140735683756992:For stage PipelineStages.TRIP_SEGMENTATION, start_ts = 2016-07-22T23:51:13

2017-07-03 10:48:38,091:DEBUG:140735683756992:------------------------------2016-07-22T16:51:35.782000-07:00------------------------------
2017-07-03 10:48:38,092:DEBUG:140735683756992:Appending currPoint because the current start point is None
2017-07-03 10:48:38,092:DEBUG:140735683756992:Setting new trip start point AttrDict(u'fmt_time': u'2016-07-22T16:51:35.782000-07:00') with idx 0
2017-07-03 10:48:38,142:DEBUG:140735683756992:------------------------------2016-07-22T16:54:08.368000-07:00------------------------------
2017-07-03 10:48:38,154:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:38,154:DEBUG:140735683756992:------------------------------2016-07-22T17:01:21.964000-07:00------------------------------
2017-07-03 10:48:38,178:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:38,578:DEBUG:140735683756992:------------------------------2016-07-22T17:08:10.604000-07:00------------------------------
2017-07-03 10:48:38,602:INFO:140735683756992:Found trip end at 2016-07-22T17:05:52.818000-07:00

What happened the first time?

2017-07-03 10:48:28,139:DEBUG:140735683756992:------------------------------2016-07-22T16:50:38.356000-07:00------------------------------
2017-07-03 10:48:28,157:DEBUG:140735683756992:last5MinsDistances.max() = 251.583191842, last10PointsDistance.max() = 251.583191842
2017-07-03 10:48:28,158:DEBUG:140735683756992:------------------------------2016-07-22T16:50:53.926000-07:00------------------------------
2017-07-03 10:48:28,184:DEBUG:140735683756992:last5MinsDistances.max() = 251.583191842, last10PointsDistance.max() = 176.953658227
2017-07-03 10:48:28,185:DEBUG:140735683756992:------------------------------2016-07-22T16:51:13-07:00------------------------------
2017-07-03 10:48:28,204:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:28,205:DEBUG:140735683756992:------------------------------2016-07-22T16:51:35.782000-07:00------------------------------
2017-07-03 10:48:28,223:DEBUG:140735683756992:last5MinsDistances.max() = 190.03242523, last10PointsDistance.max() = 151.676167225
2017-07-03 10:48:28,223:DEBUG:140735683756992:------------------------------2016-07-22T16:51:59.776000-07:00------------------------------
...
2017-07-03 10:48:28,306:DEBUG:140735683756992:------------------------------2016-07-22T16:54:08.368000-07:00------------------------------
2017-07-03 10:48:28,331:INFO:140735683756992:Found trip end at 2016-07-22T16:51:13-07:00
2017-07-03 10:48:28,332:DEBUG:140735683756992:------------------------------2016-07-22T17:01:21.964000-07:00------------------------------
2017-07-03 10:48:28,340:DEBUG:140735683756992:prev_point is None, continuing trip

2017-07-03 10:48:28,340:DEBUG:140735683756992:Too few points to make a decision, continuing
2017-07-03 10:48:28,750:INFO:140735683756992:Found trip end at 2016-07-22T17:05:52.818000-07:00

As we can see, in the original run, we detected the trip end while processing the entry at 16:54, but the trip end was at 16:51. So we end up with two trips (16:40 - 16:51 and 17:01 - 17:05).

2017-07-03 10:48:29,177:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T16:40:52-07:00', u'ts': 1469230852
end_loc = u'fmt_time': u'2016-07-22T16:50:01-07:00', u'ts': 1469231401.358

2017-07-03 10:48:29,207:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818

But the second time, because we reset the pipeline to 16:51, we start the trip at 16:51 and we end up with a single trip from 16:51 - 17:05.

2017-07-03 10:48:39,835:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T16:51:36-07:00', u'ts': 1469231495.782
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818
shankari commented 7 years ago

So the final result is:

BEFORE:

2016-07-22T09:26:08-07:00 - 2016-07-22T11:44:50-07:00
2016-07-22T16:38:06-07:00 - 2016-07-22T16:51:13-07:00
2016-07-22T16:59:47-07:00 - 2016-07-22T17:05:53-07:00
2016-07-22T17:12:19-07:00 - 2016-07-22T17:21:36-07:00

AFTER:

2016-07-22T09:26:08-07:00 - 2016-07-22T11:44:50-07:00
2016-07-22T16:38:06-07:00 - 2016-07-22T16:51:13-07:00
2016-07-22T16:51:36-07:00 - 2016-07-22T17:04:45-07:00
2016-07-22T17:12:19-07:00 - 2016-07-22T17:21:36-07:00

So it looks like there are two main options:

I'm going to try with setting the delta to 5 mins and see if that helps...

shankari commented 7 years ago

Also note that for this to work, it needs to be 5 mins from the enter ts of the first raw place. Let's see if that works here.

Cleaned place = 595a8371f6858fb1bb8fc639 Raw place = 595a836df6858fb1bb8fc51c

which is

2017-07-03 10:48:29,228:DEBUG:140735683756992:Saving entry Entry(
{'data': Rawplace({'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
 'ending_trip': ObjectId('595a836df6858fb1bb8fc51b'),
 'starting_trip': ObjectId('595a836df6858fb1bb8fc51d'),
 'duration': 608.9639999866486,
 'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 'enter_ts': 1469231473,
 'exit_ts': 1469232081.964}),
 '_id': ObjectId('595a836df6858fb1bb8fc51c'),
 'user_id': UUID('86241484-c66a-4a67-be19-03a79ad8f9fe'),
 'key': 'segmentation/raw_place'})}) into timeseries

Hm. Maybe it doesn't matter so much - the start time is the same, but the end time is different. On the other hand, there is no harm with using the start time from the raw place since we do perform some magic stuff for the end time of the trip and that would change the start time of the subsequent place.

shankari commented 7 years ago

That seems to work for the raw segmentation.

2017-07-03 14:21:21,552:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T16:40:23-07:00', u'ts': 1469230823.024
end_loc = u'fmt_time': u'2016-07-22T16:51:13-07:00', u'ts': 1469231473

2017-07-03 14:21:21,578:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818
2017-07-03 14:21:29,568:DEBUG:140735683756992:reset_ts = 1469231473
2017-07-03 14:21:31,293:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:53-07:00', u'ts': 1469232352.818

But the list of cleaned trips is now worse.

2016-07-22T09:26:08-07:00 2016-07-22T11:44:50-07:00
2016-07-22T16:38:06.693176-07:00 2016-07-22T16:51:13-07:00
2016-07-22T17:01:21.964000-07:00 2016-07-22T17:05:52.818000-07:00
2016-07-22T17:12:19.019962-07:00 2016-07-22T17:21:36-07:00

It looks like the extrapolation is not working well for the second round...

shankari commented 7 years ago

During the first run:

2017-07-03 14:21:21,600:DEBUG:140735683756992
Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('595ab551f6858fb93922a521'),
 'start_ts': 1469232081.964,
 'start_place': ObjectId('595ab551f6858fb93922a51f'),
 'duration': 270.85400009155273,
 'distance': 82.88829997716516,
 'end_ts': 1469232352.818,
 'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 'source': 'DwellSegmentationTimeFilter',
 'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
 '_id': ObjectId('595ab551f6858fb93922a520'),
 'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
 'key': 'segmentation/raw_trip'})}) into timeseries
2017-07-03 14:21:22,707:DEBUG:140735683756992:Considering trip 595ab551f6858fb93922a520: 2016-07-22T17:01:21.964000-07:00 -> 2016-07-22T17:05:52.818000-07:00 

only one section:

2017-07-03 14:21:24,200:DEBUG:140735683756992
Starting with element of type trip,
 id 595ab552f6858fb93922a52f,
 details Entry({u'data': {u'sensed_mode': 2,
 u'end_ts': 1469232352.818,
 u'start_ts': 1469232081.964,
 u'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 u'source': u'SmoothedHighConfidenceMotion',
 u'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
 u'duration': 270.85400009155273,
 u'trip_id': ObjectId('595ab551f6858fb93922a520'),
 u'_id': ObjectId('595ab552f6858fb93922a52f'),
 u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
 u'key': u'segmentation/raw_section'}})
2017-07-03 14:21:24,206:DEBUG:140735683756992:deleting 0 points from section points
2017-07-03 14:21:24,209:DEBUG:140735683756992:Found 13 results
2017-07-03 14:21:24,211:DEBUG:140735683756992:After de-duping, converted 13 points to 13 
2017-07-03 14:21:24,214:DEBUG:140735683756992:Found first section, may need to extrapolate start point
2017-07-03 14:21:24,215:DEBUG:140735683756992:At index 0, loc is 5792bb92abbf384cadf84cd7
2017-07-03 14:21:24,216:DEBUG:140735683756992:Raw place is 595ab551f6858fb93922a51f and corresponding location is 5792bb92abbf384cadf84cbf
2017-07-03 14:21:24,229:DEBUG:140735683756992:Adding distance 132.789740146 to original 631.078945882 to extend section start from [-122.2599829, 37.8699062] to [-122.2593545, 37.8688199]
2017-07-03 14:21:24,231:DEBUG:140735683756992:After subtracting time 94.9652820442 from original 270.854000092 to cover additional distance 132.789740146 at speed 1.39829774932, new_start_ts = 1469231987.0
2017-07-03 14:21:24,296:DEBUG:140735683756992:Resampling data of size 14
2017-07-03 14:21:24,302:DEBUG:140735683756992:Resampling entry list                            fmt_time            ts   longitude   latitude
0  2016-07-22T16:59:46.998718-07:00  1.469232e+09 -122.259355  37.868820
1  2016-07-22T17:01:21.964000-07:00  1.469232e+09 -122.259983  37.869906
2         2016-07-22T17:01:53-07:00  1.469232e+09 -122.259550  37.870440
3  2016-07-22T17:01:57.839000-07:00  1.469232e+09 -122.259550  37.870440
4         2016-07-22T17:02:34-07:00  1.469232e+09 -122.259573  37.870521 of size 14

During the second run:

2017-07-03 14:21:31,298:DEBUG:140735683756992
Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('595ab55bf6858fb93922a689'),
 'start_ts': 1469232081.964,
 'start_place': ObjectId('595ab55bf6858fb93922a687'),
 'duration': 270.85400009155273,
 'distance': 82.88829997716516,
 'end_ts': 1469232352.818,
 'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 'source': 'DwellSegmentationTimeFilter',
 'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
 '_id': ObjectId('595ab55bf6858fb93922a688'),
 'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
 'key': 'segmentation/raw_trip'})}) into timeseries
2017-07-03 14:21:31,823:DEBUG:140735683756992:Considering trip 595ab55bf6858fb93922a688: 2016-07-22T17:01:21.964000-07:00 -> 2016-07-22T17:05:52.818000-07:00 

only one section:


2017-07-03 14:21:31,867:DEBUG:140735683756992
Starting with element of type trip,
 id 595ab55bf6858fb93922a692,
 details Entry({u'data': {u'sensed_mode': 2,
 u'end_ts': 1469232352.818,
 u'start_ts': 1469232081.964,
 u'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 u'source': u'SmoothedHighConfidenceMotion',
 u'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
 u'duration': 270.85400009155273,
 u'trip_id': ObjectId('595ab55bf6858fb93922a688'),
 u'_id': ObjectId('595ab55bf6858fb93922a692'),
 u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
 u'key': u'segmentation/raw_section'}})
2017-07-03 14:21:31,873:DEBUG:140735683756992:deleting 0 points from section points
2017-07-03 14:21:31,877:DEBUG:140735683756992:Found 13 results
2017-07-03 14:21:31,879:DEBUG:140735683756992:After de-duping, converted 13 points to 13 
2017-07-03 14:21:31,881:DEBUG:140735683756992:Found first section, may need to extrapolate start point
2017-07-03 14:21:31,882:DEBUG:140735683756992:At index 0, loc is 5792bb92abbf384cadf84cd7
2017-07-03 14:21:31,884:DEBUG:140735683756992:Raw place is 595ab55bf6858fb93922a687 and corresponding location is 595ab55bf6858fb93922a69b
2017-07-03 14:21:31,900:DEBUG:140735683756992:Adding distance 0.0 to original 631.078945882 to extend section start from [-122.2599829, 37.8699062] to [-122.2599829, 37.8699062]

Ok so this is because the place and the start of the section are the same.

And I think that is because during the raw segmentation, we weren't able to link to the prior chain, e.g.

2017-07-03 14:21:31,295:DEBUG:140735683756992:last_place.enter_ts = None
2017-07-03 14:21:31,295:DEBUG:140735683756992:start of a chain, unable to check for restart from previous trip end, assuming not restarted

Let's look at the appropriate context. Here's the code.

    ts = esta.TimeSeries.get_time_series(user_id)
    last_place_entry = esdp.get_last_place_entry(esda.RAW_PLACE_KEY, user_id)
    if last_place_entry is None:
        last_place = start_new_chain(user_id)

    ....

    def start_new_chain(uuid):
    """
    Can't find the place that is the end of an existing chain, so we need to
    create a new one.  This might correspond to the start of tracking, or to an
    improperly terminated chain.
    """
    start_place = ecwrp.Rawplace()
    logging.debug("Starting tracking, created new start of chain %s" % start_place)
    return start_place

And sure enough, that's what we see

2017-07-03 14:21:31,289:DEBUG:140735683756992:Iterated over all points, just_ended = True, len(transition_df) = 10
2017-07-03 14:21:31,290:DEBUG:140735683756992:last place doc = None
2017-07-03 14:21:31,290:DEBUG:140735683756992:Starting tracking, created new start of chain Rawplace({})
2017-07-03 14:21:31,292:DEBUG:140735683756992:segmentation_point_list has length 4

Why is last place doc = None?

Note that from the first run, the start raw place was 595ab551f6858fb93922a51f.

We should NOT delete it, simply resetting it instead.

Here are the raw places that were reset.

2017-07-03 14:21:29,568:DEBUG:140735683756992
last_raw_place = Entry({u'data': 
{u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
 u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 u'ending_trip': ObjectId('595ab551f6858fb93922a51e'),
 u'starting_trip': ObjectId('595ab551f6858fb93922a520'),
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469231473,
 u'duration': 608.9639999866486,
 u'exit_ts': 1469232081.964},
 u'_id': ObjectId('595ab551f6858fb93922a51f'),
 u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
 u'key': u'segmentation/raw_place'}})

2017-07-03 14:21:29,568:DEBUG:140735683756992
first_raw_place = Entry({u'data': 
{u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
 u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 u'ending_trip': ObjectId('595ab551f6858fb93922a51e'),
 u'starting_trip': ObjectId('595ab551f6858fb93922a520'),
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469231473,
 u'duration': 608.9639999866486,
 u'exit_ts': 1469232081.964},
 u'_id': ObjectId('595ab551f6858fb93922a51f'),
 u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
 u'key': u'segmentation/raw_place'}})

Note that they are both the same, and both have _id = 595ab551f6858fb93922a51f.

Then, we reset the place

2017-07-03 14:21:29,696:DEBUG:140735683756992:match query = {'_id': ObjectId('595ab551f6858fb93922a51f')}
2017-07-03 14:21:29,696:DEBUG:140735683756992:reset_query = {'$unset': {'exit_local_dt': '', 'exit_fmt_time': '', 'starting_time': '', 'duration': '', 'raw_places': '', 'exit_ts': ''}}

But then it is None. The reset is not working for some reason. WTF!

2017-07-03 14:21:29,737:DEBUG:140735683756992:after update, entry is None
shankari commented 7 years ago

And the really weird thing is that the place actually still exists.

2017-07-03 14:21:21,600:DEBUG:140735683756992
Inserting entry Entry({'data': Rawtrip({'end_place': ObjectId('595ab551f6858fb93922a521'),
 'start_ts': 1469232081.964,
 'start_place': ObjectId('595ab551f6858fb93922a51f'),
 'duration': 270.85400009155273,
 'distance': 82.88829997716516,
 'end_ts': 1469232352.818,
 'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 'source': 'DwellSegmentationTimeFilter',
 'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
 '_id': ObjectId('595ab551f6858fb93922a520'),
 'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a'),
 'key': 'segmentation/raw_trip'})}) into timeseries

and the ids for the trip's start_place and place's starting trip and start/end times match up just fine.

{u'_id': ObjectId('595ab551f6858fb93922a51f'),
 u'data': {u'duration': 608.9639999866486,
  u'ending_trip': ObjectId('595ab551f6858fb93922a51e'),
  u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
  u'enter_ts': 1469231473,
  u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
  u'exit_ts': 1469232081.964,
  u'location': {u'coordinates': [-122.2593545, 37.8688199], u'type': u'Point'},
  u'source': u'DwellSegmentationTimeFilter',
  u'starting_trip': ObjectId('595ab551f6858fb93922a520')},
  u'write_ts': 1499116881.553992},
 u'user_id': UUID('1fa90c82-e6b1-4a00-b1ef-edc9c20b428a')}

So that must mean that the initial query is not correct.

shankari commented 7 years ago

Ok after fixing the unset, we now have a last_place_doc with blank enter information.

2017-07-03 16:03:45,558:DEBUG:140735683756992
last place doc = {u'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'),
 u'_id': ObjectId('595acd45f6858fbd4e669f10'),
 u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00',
 u'ending_trip': ObjectId('595acd45f6858fbd4e669f0f'),
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469231473},
 u'key': u'segmentation/raw_place'}}
2017-07-03 16:03:45,560:DEBUG:140735683756992
start_loc = u'fmt_time': u'2016-07-22T17:01:22-07:00', u'ts': 1469232081.964
end_loc = u'fmt_time': u'2016-07-22T17:05:52-07:00', u'ts': 1469232352.818
2017-07-03 16:03:45,583:DEBUG:140735683756992
Inserting entry Entry(
{'data': Rawtrip({'end_place': ObjectId('595acd51f6858fbd4e66a079'),
 'start_ts': 1469232081.964,
 'start_place': ObjectId('595acd45f6858fbd4e669f10'),
 'duration': 270.85400009155273,
 'distance': 82.88829997716516,
 'end_ts': 1469232352.818,
 'start_fmt_time': u'2016-07-22T17:01:21.964000-07:00',
 'source': 'DwellSegmentationTimeFilter',
 'end_fmt_time': u'2016-07-22T17:05:52.818000-07:00',
 '_id': ObjectId('595acd51f6858fbd4e66a078'),
 'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'),
 'key': 'segmentation/raw_trip'})}) into timeseries
2017-07-03 16:03:46,192:DEBUG:140735683756992:Considering trip 595acd51f6858fbd4e66a078: 2016-07-22T17:01:21.964000-07:00 -> 2016-07-22T17:05:52.818000-07:00 
2017-07-03 16:03:46,248:DEBUG:140735683756992:Saving entries into cleaned section Cleanedsection({u'source': u'SmoothedHighConfidenceMotion', 'trip_id': ObjectId('595acd52f6858fbd4e66a08a')})
2017-07-03 16:03:46,253:DEBUG:140735683756992:deleting 0 points from section points
2017-07-03 16:03:46,258:DEBUG:140735683756992:Found 13 results
2017-07-03 16:03:46,261:DEBUG:140735683756992:After de-duping, converted 13 points to 13 
2017-07-03 16:03:46,265:DEBUG:140735683756992:After filtering, rows went from 13 -> 13, cols from 22 -> 21
2017-07-03 16:03:46,265:DEBUG:140735683756992:Found first section, may need to extrapolate start point
2017-07-03 16:03:46,267:DEBUG:140735683756992:At index 0, loc is 5792bb92abbf384cadf84cd7
2017-07-03 16:03:46,267:DEBUG:140735683756992:get_entry_at_ts query = {'metadata.key': 'background/filtered_location', 'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), 'data.ts': 1469231473}
2017-07-03 16:03:46,268:DEBUG:140735683756992:Raw place is 595acd45f6858fbd4e669f10 and corresponding location is 5792bb92abbf384cadf84cbf
2017-07-03 16:03:46,285:DEBUG:140735683756992:Adding distance 132.789740146 to original 631.078945882 to extend section start from [-122.2599829, 37.8699062] to [-122.2593545, 37.8688199]
2017-07-03 16:03:46,288:DEBUG:140735683756992:After subtracting time 94.9652820442 from original 270.854000092 to cover additional distance 132.789740146 at speed 1.39829774932, new_start_ts = 1469231987.0

Corresponding cleaned trip is

2017-07-03 16:03:46,489:DEBUG:140735683756992:For raw trip 595acd51f6858fbd4e66a078, found filtered trip 595acd52f6858fbd4e66a08a

Failed because of no raw_places array. Need to set that to empty instead of unsetting it

2017-07-03 16:03:47,290:DEBUG:140735683756992:for trip Entry({'data': Cleanedtrip({'distance': 562.52852529365282, 'raw_trip': ObjectId('595acd51f6858fbd4e66a078'), 'start_loc': {u'type': u'Point', u'coordinates': [-122.2593545, 37.8688199]}, u'end_ts': 1469232352.8180001, 'start_ts': 1469231986.998718, 'start_fmt_time': '2016-07-22T16:59:46.998718-07:00', u'end_loc': {u'type': u'Point', u'coordinates': [-122.2602491, 37.8706214]}, u'source': u'DwellSegmentationTimeFilter', u'end_fmt_time': '2016-07-22T17:05:52.818000-07:00', u'end_local_dt': LocalDate({'hour': 17, 'month': 7, 'second': 52, 'weekday': 4, 'year': 2016, 'timezone': u'America/Los_Angeles', 'day': 22, 'minute': 5}), 'duration': 365.81928205490112, 'start_local_dt': LocalDate({'hour': 16, 'month': 7, 'second': 46, 'weekday': 4, 'year': 2016, 'timezone': u'America/Los_Angeles', 'day': 22, 'minute': 59})}), '_id': ObjectId('595acd52f6858fbd4e66a08a'), 'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), 'metadata': Metadata({'write_fmt_time': '2017-07-03T16:03:46.246494-07:00', 'write_ts': 1499123026.246494, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 16, 'month': 7, 'second': 46, 'weekday': 0, 'year': 2017, 'timezone': 'America/Los_Angeles', 'day': 3, 'minute': 3}), 'key': 'analysis/cleaned_trip'})}) start, converting Entry({u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00', u'display_name': u'Southside, Berkeley', u'enter_local_dt': {u'hour': 16, u'month': 7, u'second': 13, u'weekday': 4, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 51}, u'ending_trip': ObjectId('595acd48f6858fbd4e669ffb'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.2593545, 37.8688199]}, u'enter_ts': 1469231473}, u'_id': ObjectId('595acd49f6858fbd4e66a02d'), u'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), u'metadata': {u'write_fmt_time': u'2017-07-03T16:03:37.654076-07:00', u'write_ts': 1499123017.654076, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 16, u'month': 7, u'second': 37, u'weekday': 0, u'year': 2017, u'timezone': u'America/Los_Angeles', u'day': 3, u'minute': 3}, u'key': u'analysis/cleaned_place'}}) to Entry({u'data': {u'enter_fmt_time': u'2016-07-22T16:51:13-07:00', u'exit_local_dt': {u'hour': 17, u'month': 7, u'second': 21, u'weekday': 4, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 1}, u'exit_fmt_time': u'2016-07-22T17:01:21.964000-07:00', u'enter_local_dt': {u'hour': 16, u'month': 7, u'second': 13, u'weekday': 4, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 22, u'minute': 51}, u'ending_trip': ObjectId('595acd45f6858fbd4e669f0f'), u'starting_trip': ObjectId('595acd51f6858fbd4e66a078'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.2593545, 37.8688199]}, u'enter_ts': 1469231473, u'duration': 608.9639999866486, u'exit_ts': 1469232081.964}, u'_id': ObjectId('595acd45f6858fbd4e669f10'), u'user_id': UUID('89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee'), u'metadata': {u'write_fmt_time': u'2017-07-03T16:03:33.441360-07:00', u'write_ts': 1499123013.44136, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 16, u'month': 7, u'second': 33, u'weekday': 0, u'year': 2017, u'timezone': u'America/Los_Angeles', u'day': 3, u'minute': 3}, u'key': u'segmentation/raw_place'}})
2017-07-03 16:03:47,290:ERROR:140735683756992:Cleaning and resampling failed for user 89e0955f-0e3e-4f8d-91e5-e5c84fa0c2ee
Traceback (most recent call last):
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 72, in clean_and_resample
    last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 98, in save_cleaned_segments_for_ts
    return save_cleaned_segments_for_timeline(user_id, tl)
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 124, in save_cleaned_segments_for_timeline
    (last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 886, in create_and_link_timeline
    link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 957, in link_trip_start
    cleaned_start_place_data.append_raw_place(raw_start_place.get_id())
  File "/Users/shankari/e-mission/e-mission-server/emission/core/wrapper/cleanedplace.py", line 15, in append_raw_place
    self["raw_places"].append(place_id)
KeyError: 'raw_places'
shankari commented 7 years ago

After these fixes, the trip reset function broke. In particular, this trip (which is the untracked trip)

2017-07-03 17:17:58,134:ERROR:140735683756992:
Found error 'NoneType' object has no attribute 'get_id' while processing trip Entry({u'data': {u'distance': 55389.54955322857,
 u'end_place': ObjectId('595adeb2f6858fc0673d740e'),
 u'end_ts': 1469491947.853,
 u'start_ts': 1469233301,
 u'start_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'source': u'DwellSegmentationTimeFilter',
 u'start_place': ObjectId('595adeb2f6858fc0673d740d'),
 u'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
 u'duration': 258646.8529999256,
 u'_id': ObjectId('595adeb1f6858fc0673d73ed'),
 u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 u'key': u'analysis/cleaned_untracked'}})

has no end place.

In [102]: edb.get_analysis_timeseries_db().find({"_id": boi.ObjectId("595adeb2f6858fc0673d740e")}).count()
Out[102]: 0

Does it have a start place?

In [103]: edb.get_analysis_timeseries_db().find({"_id": boi.ObjectId("595adeb2f6858fc0673d740d")}).count()
Out[103]: 1

Yes. And it was

{u'_id': ObjectId('595adeb2f6858fc0673d740d'),
 u'data': {u'display_name': u'Shattuck Avenue, Berkeley',
  u'duration': 258651.8529999256,
  u'ending_trip': ObjectId('595adeb1f6858fc0673d73d6'),
  u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
  u'enter_ts': 1469233296,
  u'exit_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
  u'exit_ts': 1469491947.853,
  u'location': {u'coordinates': [-122.26827, 37.8700362], u'type': u'Point'},
  u'raw_places': [ObjectId('595adeb4f6858fc0673d7414')],
  u'source': u'DwellSegmentationTimeFilter',
  u'starting_trip': ObjectId('595adeb4f6858fc0673d741b')},
 u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e')}

That is almost certainly the place that overlaps the reset. But why do we have that if we have the untracked time too?

We have

 u'start_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
 u'key': u'analysis/cleaned_untracked'}})

 u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'exit_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
 u'key': u'analysis/cleaned_place',

How can we have both untracked time and cleaned place that overlap with each other?

shankari commented 7 years ago

timeline appears to be:

trip (595adeb1f6858fc0673d73d6)
    2016-07-22T17:21:36-07:00
place (595adeb2f6858fc0673d740d)
    2016-07-25T17:12:27.853000-07:00
trip (595adeb4f6858fc0673d741b)
    2016-07-25T17:18:14.841000-07:00

So somehow, the untracked_time trip got replaced by a place after the reset

shankari commented 7 years ago

Before

2017-07-03 17:17:54,168:DEBUG:140735683756992:
for trip Entry({'data': Untrackedtime({u'distance': 55389.54955322857,
 u'end_ts': 1469491947.853,
 u'start_ts': 1469233301,
 u'start_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'source': u'DwellSegmentationTimeFilter',
 u'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
 u'duration': 258646.8529999256,
 '_id': ObjectId('595adeb1f6858fc0673d73ed'),
 'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 'key': 'analysis/cleaned_untracked'})})

converting Entry({'data': Cleanedplace(
{u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'source': u'DwellSegmentationTimeFilter',
 'raw_places': [ObjectId('595adeb0f6858fc0673d73c9')],
 'display_name': u'Shattuck Avenue, Berkeley',
 u'enter_ts': 1469233296,
 'ending_trip': ObjectId('595adeb1f6858fc0673d73d6')}),
 '_id': ObjectId('595adeb2f6858fc0673d740d'),
 'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 'key': 'analysis/cleaned_place'})})
2017-07-03 17:17:54,790:DEBUG:140735683756992:Considering place 595adeb2f6858fc0673d740d: 2016-07-22T17:21:36-07:00 -> 2016-07-22T17:21:36-07:00 
...
2017-07-03 17:17:54,794:DEBUG:140735683756992:for trip = 595adeb1f6858fc0673d73ed, adding start_place 595adeb2f6858fc0673d740d and end_place 595adeb2f6858fc0673d740e

During compare

        "enter_fmt_time": "2016-07-22T17:21:36-07:00", 
        "exit_fmt_time": "2016-07-22T17:21:36-07:00", 

During reset

{u'_id': ObjectId('595adeb2f6858fc0673d740d'), u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00', u'exit_fmt_time': u'2016-07-22T17:21:36-07:00'}},
2017-07-03 17:17:55,359:DEBUG:140735683756992
last_cleaned_place = Entry({u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 u'_id': ObjectId('595adeb2f6858fc0673d740d'),
 u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'display_name': u'Shattuck Avenue, Berkeley',
 u'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'ending_trip': ObjectId('595adeb1f6858fc0673d73d6'),
 u'starting_trip': ObjectId('595adeb1f6858fc0673d73ed'),
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469233296,
 u'duration': 5,
 u'raw_places': [ObjectId('595adeb0f6858fc0673d73c9'),
 ObjectId('595adeb0f6858fc0673d73c9')],
 u'exit_ts': 1469233301},
 u'key': u'analysis/cleaned_place'}})

2017-07-03 17:17:55,359
DEBUG:140735683756992:last_raw_place = Entry({u'data': 
{u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'ending_trip': ObjectId('595adeb0f6858fc0673d73c8'),
 u'starting_trip': ObjectId('595adeb0f6858fc0673d73cc'),
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469233296,
 u'duration': 5,
 u'exit_ts': 1469233301},
 u'_id': ObjectId('595adeb0f6858fc0673d73c9'),
 u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 u'key': u'segmentation/raw_place'}})

2017-07-03 17:17:55,
360:DEBUG:140735683756992:first_raw_place = Entry(
{u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'ending_trip': ObjectId('595adeb0f6858fc0673d73c8'),
 u'starting_trip': ObjectId('595adeb0f6858fc0673d73cc'),
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469233296,
 u'duration': 5,
 u'exit_ts': 1469233301},
 u'_id': ObjectId('595adeb0f6858fc0673d73c9'),
 u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 u'key': u'segmentation/raw_place'}})

Perform the reset

Now we reset to 2016-07-22T17:21:41-07:00

2017-07-03 17:17:55,360:DEBUG:140735683756992:last_place_enter_ts = 1469233296
2017-07-03 17:17:55,360:DEBUG:140735683756992:reset_ts = 1469233296

And as expected, the exit entries are removed and raw_places is set to []

2017-07-03 17:17:55,477:DEBUG:140735683756992:after update,
 entry is {u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 u'_id': ObjectId('595adeb2f6858fc0673d740d'),
 u'data': {u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'display_name': u'Shattuck Avenue, Berkeley',
 u'ending_trip': ObjectId('595adeb1f6858fc0673d73d6'),
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469233296,
 u'raw_places': []},
 u'key': u'analysis/cleaned_place'}}

Note that because this is a weird place (enter almost == exit, exit - enter = 5 secs), and the reset_ts has a fuzz factor associated with it, searching for entries that are greater than the reset_ts will not delete the untracked time trip.

We should fix that, but it still doesn't explain why the the place expanded after the reset.

After the reset

Bunch of transitions, including the all-important BOOTEDs

0  2016-07-22T17:33:54.634000-07:00           4 (BOOTED)
1  2016-07-25T15:35:33.679000-07:00           4 (BOOTED)
2  2016-07-25T15:36:34.447000-07:00           0 (INITIALIZE)
3  2016-07-25T17:12:31.952000-07:00           1 (EXITED_GEOFENCE)
4  2016-07-25T17:21:04.399000-07:00           2 (STOPPED_MOVING)
5  2016-07-25T17:25:06.900000-07:00           1 (EXITED_GEOFENCE)
6  2016-07-25T17:33:42.862000-07:00           2 (STOPPED_MOVING)

Found the restarts

2017-07-03 17:17:56,367:DEBUG:140735683756992:In range 1469233296 -> 1469491947.85 found transitions                            fmt_time  curr_state  transition
0  2016-07-22T17:33:54.634000-07:00           3           4
1  2016-07-25T15:35:33.679000-07:00           3           4
2  2016-07-25T15:36:34.447000-07:00           1           0

2017-07-03 17:17:56,379:DEBUG:140735683756992:On android, found restart events                         _id  curr_state                          fmt_time  \
0  5793f338abbf384cadf90ba9           3  2016-07-22T17:33:54.634000-07:00   
1  5796b24fabbf384cadfa37d9           3  2016-07-25T15:35:33.679000-07:00   

2017-07-03 17:17:56,379:DEBUG:140735683756992:tracking has been restarted, returning True

And we re-insert raw untracked time.

2017-07-03 17:17:56,386:DEBUG:140735683756992
Inserting entry Entry({'data': Untrackedtime({'end_place': ObjectId('595adeb4f6858fc0673d7414'),
 'start_ts': 1469233301,
 'start_place': ObjectId('595adeb0f6858fc0673d73c9'),
 'duration': 258646.8529999256,
 'distance': 55389.54955322857,
 'end_ts': 1469491947.853,
 'start_fmt_time': u'2016-07-22T17:21:36-07:00',
 'source': 'DwellSegmentationTimeFilter',
 'end_fmt_time': u'2016-07-25T17:12:27.853000-07:00',
 '_id': ObjectId('595adeb4f6858fc0673d7413'),
 'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 'metadata': Metadata({'write_fmt_time': '2017-07-03T17:17:56.380142-07:00',
 'write_ts': 1499127476.380142,
 'time_zone': 'America/Los_Angeles',
 'platform': 'server',
 'key': 'segmentation/raw_untracked'})}) into timeseries

And the exit is filled in correctly.

2017-07-03 17:17:56,390:DEBUG:140735683756992
Resetting last_place_entry from Entry({u'data': Rawplace(
{u'enter_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'ending_trip': ObjectId('595adeb0f6858fc0673d73c8'),
 'starting_trip': ObjectId('595adeb4f6858fc0673d7413'),
 'duration': 5,
 'exit_fmt_time': u'2016-07-22T17:21:36-07:00',
 u'source': u'DwellSegmentationTimeFilter',
 u'enter_ts': 1469233296,
 'exit_ts': 1469233301}),
 u'_id': ObjectId('595adeb0f6858fc0673d73c9'),
 u'user_id': UUID('2c808701-345c-4aea-92ac-aee03bc53e5e'),
 u'key': u'segmentation/raw_place'}}) 

But when we get the CLEAN_AND_RESAMPLE stage, we don't consider that untracked time trip or the reset place for that matter. It is fine to not consider the place (since it will be the last place to link to), but it is a problem that we don't consider the untracked time, and it is probably the reason that we expand the place to much, to link to the next trip that we do have.

2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering place 595adeb4f6858fc0673d7414: 2016-07-25T17:12:27.853000-07:00 -> 2016-07-25T17:12:27.853000-07:00 
2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering place 595adeb4f6858fc0673d7412: 2016-07-25T17:18:14.841000-07:00 -> 2016-07-25T17:25:06.307000-07:00 
2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering place 595adeb4f6858fc0673d7416: 2016-07-25T17:30:31-07:00 -> None 
2017-07-03 17:17:56,666:DEBUG:140735683756992:Considering trip 595adeb4f6858fc0673d7411: 2016-07-25T17:12:27.853000-07:00 -> 2016-07-25T17:18:14.841000-07:00 
2017-07-03 17:17:56,667:DEBUG:140735683756992:Considering trip 595adeb4f6858fc0673d7415: 2016-07-25T17:25:06.307000-07:00 -> 2016-07-25T17:30:31-07:00 
shankari commented 7 years ago

This indicates that adding a blanket fuzz factor was a bad idea. We need a fuzz factor only for the TRIP_SEGMENTATION stage, not for the other stages - otherwise we may miss some entries. Note that this is only really relevant to the untracked time trip - otherwise place duration should be at least 5 mins (we should have merged the trips on the two sides if we didn't have a long enough place).

So we can do two fixes:

  1. apply the fuzz factor only for the TRIP_SEGMENTATION stage
  2. set the fuzz factor to min(5 mins, duration at last place - 1)

I am tempted to go with (1) because really the fuzz is a hack to get around the fact that the end time for the trip segmentation does not reflect the actual set of objects processed. When we have processed t + 5mins, we set the end time to t. This is not true for the other stages.

Trying that change now...

shankari commented 7 years ago

We should probably not reset entries that are before the reset_ts - e.g. from people who have uninstalled the app a long time ago so reset_ts. It's not too bad - once these people reinstall, their entries will be after the reset_ts, but will fail in case they have not been able to push for a really long time, so have been holding on to entries on the phone for months, and then finally are able to push again.

shankari commented 6 years ago

Ran into one more issue while resetting. The user has only one place and it is the first place, so it has no enter and no exit.

INFO:root:resetting user 174fdc0a-9caa-4b45-a28d-d2662e2bf5a8 to ts 1506816000
DEBUG:root:Looking for last place before 1506816000
DEBUG:root:all places for this user = [{'_id': ObjectId('57f8034088f6630fbe895454'), 'data': {}}]
DEBUG:root:last place doc for user 174fdc0a-9caa-4b45-a28d-d2662e2bf5a8 = None
DEBUG:root:last trip doc for user 174fdc0a-9caa-4b45-a28d-d2662e2bf5a8 = None
DEBUG:root:last place doc = {'_id': ObjectId('57f8034088f6630fbe895454'), 'data': {'source': 'DwellSegmentationDistFilter', 'raw_places': [ObjectId('57f8033f88f6630fbe895446'), ObjectId('57f8033f88f6630fbe895446')], 'display_name': 'Haviland Road, Berkeley', 'location': {'type': 'Point', 'coordinates': [-122.2581048834238, 37.87522555192434]}}, 'user_id': UUID('174fdc0a-9caa-4b45-a28d-d2662e2bf5a8'), 'metadata': {'write_fmt_time': '2016-10-07T13:19:12.842688-07:00', 'write_ts': 1475871552.842688, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': {'hour': 13, 'month': 10, 'second': 12, 'weekday': 4, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 7, 'minute': 19}, 'key': 'analysis/cleaned_place'}}
DEBUG:root:last_place = Entry({'_id': ObjectId('57f8034088f6630fbe895454'), 'data': {'source': 'DwellSegmentationDistFilter', 'raw_places': [ObjectId('57f8033f88f6630fbe895446'), ObjectId('57f8033f88f6630fbe895446')], 'display_name': 'Haviland Road, Berkeley', 'location': {'type': 'Point', 'coordinates': [-122.2581048834238, 37.87522555192434]}}, 'user_id': UUID('174fdc0a-9caa-4b45-a28d-d2662e2bf5a8'), 'metadata': {'write_fmt_time': '2016-10-07T13:19:12.842688-07:00', 'write_ts': 1475871552.842688, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': {'hour': 13, 'month': 10, 'second': 12, 'weekday': 4, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 7, 'minute': 19}, 'key': 'analysis/cleaned_place'}}), reset_ts = 1506816000

So when we look for the enter_ts, it is None

        if last_place is None:
            return None
        elif last_place.data.enter_ts < reset_ts:
            return last_place
        else:
            raise ValueError("No trip or place straddling time %s for user %s" %
                (reset_ts, user_id))
Traceback (most recent call last):
  File "bin/reset_pipeline.py", line 99, in <module>
    epr.reset_user_to_ts(user_id, day_ts, args.dry_run)
  File "/code/e-mission-server/emission/pipeline/reset.py", line 54, in reset_user_to_ts
    last_cleaned_place = esdp.get_last_place_before(esda.CLEANED_PLACE_KEY, ts, user_id)
  File "/code/e-mission-server/emission/storage/decorations/place_queries.py", line 107, in get_last_place_before
    elif last_place.data.enter_ts < reset_ts:
TypeError: '<' not supported between instances of 'NoneType' and 'int'

This may have happened before, but is more clear in python3, where comparison between None and int is not allowed.

shankari commented 6 years ago

Let's handle this by dealing with it like a None last place, since there's no point to resetting it anyway.