Closed shankari closed 5 years ago
The good news is that this is 100% reproducible. Now I just need to add additional logging and figure out what's going on.
Rerunning locally with additional logging.
This is the raw trip that was lost.
2016-08-14 11:48:03,387:DEBUG:Filtering trip Entry({u'data': {u'end_place': ObjectId('57b0bce2f6858f96f5e38e71'), u'start_loc': {u'type': u'Point', u'coordinates': [-122.0860081, 37.3918007]}, u'end_ts': 1470939591.759, u'start_ts': 1470932414.817, u'start_fmt_time': u'2016-08-11T09:20:14.817000-07:00', u'end_loc': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8023388]}, u'source': u'DwellSegmentationTimeFilter', u'start_place': ObjectId('57b0bcbaf6858f96a6abc975'), u'end_fmt_time': u'2016-08-11T11:19:51.759000-07:00', u'end_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 19}, u'duration': 7176.9420001506805, u'start_local_dt': {u'hour': 9, u'month': 8, u'second': 14, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 20}}, u'_id': ObjectId('57b0bce2f6858f96f5e38e70'), u'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), u'metadata': {u'write_fmt_time': u'2016-08-14T11:48:02.261679-07:00', u'write_ts': 1471200482.261679, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 11, u'month': 8, u'second': 2, u'weekday': 6, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 14, u'minute': 48}, u'key': u'segmentation/raw_trip'}})
We got through the get_filtered_trip
process and have a filtered_trip, so far, so good
2016-08-14 11:48:04,719:DEBUG:For raw trip 57b0bce2f6858f96f5e38e70, found filtered trip 57b0bce3f6858f96f5e38e83
We talk about how to link the raw place to the cleaned place for this cleaned trip. Note that we did this in the initial code as well, so the
2016-08-14 11:48:05,590:DEBUG:for trip Entry({'data': Cleanedtrip({'distance': 53909.743497049596, 'raw_trip': ObjectId('57b0bce2f6858f96f5e38e70'), u'start_loc': {u'type': u'Point', u'coordinates': [-122.0860081, 37.3918007]}, u'end_ts': 1470939591.759, u'start_ts': 1470932414.817, u'start_fmt_time': u'2016-08-11T09:20:14.817000-07:00', u'end_loc': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8023388]}, u'source': u'DwellSegmentationTimeFilter', u'end_fmt_time': u'2016-08-11T11:19:51.759000-07:00', u'end_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 19}, u'duration': 7176.9420001506805, u'start_local_dt': {u'hour': 9, u'month': 8, u'second': 14, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 20}}), '_id': ObjectId('57b0bce3f6858f96f5e38e83'), 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'metadata': Metadata({'write_fmt_time': '2016-08-14T11:48:03.424020-07:00', 'write_ts': 1471200483.42402, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 11, 'month': 8, 'second': 3, 'weekday': 6, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 14, 'minute': 48}), 'key': 'analysis/cleaned_trip'})}) start, converting Entry({u'data': {u'enter_fmt_time': u'2016-08-10T19:48:43.490000-07:00', u'display_name': u'South Shoreline Boulevard, Mountain View', u'enter_local_dt': {u'hour': 19, u'month': 8, u'second': 43, u'weekday': 2, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 10, u'minute': 48}, u'ending_trip': ObjectId('57b0bcbcf6858f96a6abc9c0'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.0864577, 37.3906569]}, u'enter_ts': 1470883723.49, u'raw_places': [ObjectId('57b0bcbaf6858f96a6abc975')]}, u'_id': ObjectId('57b0bcbef6858f96a6abc9e7'), u'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), u'metadata': {u'write_fmt_time': u'2016-08-14T11:47:26.038447-07:00', u'write_ts': 1471200446.038447, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 11, u'month': 8, u'second': 26, u'weekday': 6, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 14, u'minute': 47}, u'key': u'analysis/cleaned_place'}}) to Entry({u'data': {u'enter_fmt_time': u'2016-08-10T19:48:43.490000-07:00', u'exit_local_dt': {u'hour': 9, u'month': 8, u'second': 14, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 20}, u'exit_fmt_time': u'2016-08-11T09:20:14.817000-07:00', u'enter_local_dt': {u'hour': 19, u'month': 8, u'second': 43, u'weekday': 2, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 10, u'minute': 48}, u'ending_trip': ObjectId('57b0bcbaf6858f96a6abc974'), u'starting_trip': ObjectId('57b0bce2f6858f96f5e38e70'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.0864577, 37.3906569]}, u'enter_ts': 1470883723.49, u'duration': 48691.326999902725, u'exit_ts': 1470932414.817}, u'_id': ObjectId('57b0bcbaf6858f96a6abc975'), u'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), u'metadata': {u'write_fmt_time': u'2016-08-14T11:47:22.520776-07:00', u'write_ts': 1471200442.520776, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 11, u'month': 8, u'second': 22, u'weekday': 6, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 14, u'minute': 47}, u'key': u'segmentation/raw_place'}})
2016-08-14 11:48:05,940:DEBUG:parsed_response = {u'display_name': u'1111 Broadway Building, 1111, Broadway, Downtown, Oakland, Alameda County, California, 94612, United States of America', u'place_id': u'69307564', u'lon': u'-122.272845402369', u'boundingbox': [u'37.8025602', u'37.8031434', u'-122.2731911', u'-122.2725764'], u'osm_type': u'way', u'licence': u'Data \xa9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright', u'osm_id': u'36977642', u'lat': u'37.8029298', u'address': {u'building': u'1111 Broadway Building', u'city': u'Oakland', u'house_number': u'1111', u'country': u'United States of America', u'county': u'Alameda County', u'state': u'California', u'road': u'Broadway', u'country_code': u'us', u'neighbourhood': u'Downtown', u'postcode': u'94612'}}
We are still not squished
2016-08-14 11:48:05,942:DEBUG:Found mapping 57b0bce2f6858f96f5e38e70 -> 57b0bce3f6858f96f5e38e83, added links
Ok, so we finished creating and linking the timeline, and we return 3 places and 3 trips (in the original, this was 2, but that was because it was run midway through the day).
2016-08-14 11:48:06,654:DEBUG:Finished creating and linking timeline, returning 3 places and 3 trips
2016-08-14 11:48:06,654:DEBUG:keys = (analysis/cleaned_place, analysis/cleaned_trip), len(places) = 3, len(trips) = 3
And now we start from the place again.
2016-08-14 11:48:06,655:DEBUG:Starting with element of type place, id 57b0bce5f6858f96f5e38fc3, details Entry({'data': Cleanedplace({u'enter_fmt_time': u'2016-08-11T11:19:51.759000-07:00', u'exit_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 48}, 'display_name': u'Broadway, Oakland', u'exit_fmt_time': u'2016-08-11T11:48:51-07:00', u'enter_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 19}, 'ending_trip': ObjectId('57b0bce3f6858f96f5e38e83'), 'starting_trip': ObjectId('57b0bce4f6858f96f5e38f6f'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8023388]}, u'enter_ts': 1470939591.759, 'duration': 1739.2409999370575, 'raw_places': [ObjectId('57b0bce2f6858f96f5e38e71'), ObjectId('57b0bce2f6858f96f5e38e71'), ObjectId('57b0bce2f6858f96f5e38e73')], u'exit_ts': 1470941331}), '_id': ObjectId('57b0bce5f6858f96f5e38fc3'), 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'metadata': Metadata({'write_fmt_time': '2016-08-14T11:48:05.941356-07:00', 'write_ts': 1471200485.941356, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 11, 'month': 8, 'second': 5, 'weekday': 6, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 14, 'minute': 48}), 'key': 'analysis/cleaned_place'})})
So something went wrong in create_and_link_timeline
Ok this turned out to be so stupid that I can't believe it. It explains how, even in the real instance, the timeline had the correct number of trips (2)
2016-08-11 20:08:17,231:DEBUG:Finished creating and linking timeline, returning 2 places and 2 trips
2016-08-11 20:08:17,231:DEBUG:keys = (analysis/cleaned_place, analysis/cleaned_trip), len(places) = 2, len(trips) = 2
2016-08-11 20:08:17,231:DEBUG:Finished creating and linking timeline, returning 2 places and 2 trips
although only one was saved.
2016-08-11 20:08:17,231:DEBUG:Finished creating and linking timeline, returning 2 places and 2 trips
2016-08-11 20:08:17,291:DEBUG:Inserting entry Entry({'data': Cleanedplace({u'enter_fmt_time': u'2016-08-11T11:19:51.759000-07:00', u'exit_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 48}, 'display_name': u'Broadway, Oakland', u'exit_fmt_time': u'2016-08-11T11:48:51-07:00', u'enter_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 19}, 'ending_trip': ObjectId('57acdb2d88f6634f22600214'), 'starting_trip': ObjectId('57acdb3088f6634f22600300'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8023388]}, u'enter_ts': 1470939591.759, 'duration': 1739.2409999370575, 'raw_places': [ObjectId('57acdb2a88f6634f22600209'), ObjectId('57acdb2a88f6634f22600209'), ObjectId('57acdb2b88f6634f2260020b')], u'exit_ts': 1470941331}), '_id': ObjectId('57acdb3188f6634f2260030b'), 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'metadata': Metadata({'write_fmt_time': '2016-08-11T13:08:17.044527-07:00', 'write_ts': 1470946097.044527, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 13, 'month': 8, 'second': 17, 'weekday': 3, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 11, 'minute': 8}), 'key': 'analysis/cleaned_place'})}) into timeseries
2016-08-11 20:08:17,302:DEBUG:Inserting entry Entry({'data': Cleanedtrip({'distance': 184.41528363173799, 'end_place': ObjectId('57acdb3188f6634f2260030c'), 'raw_trip': ObjectId('57acdb2b88f6634f2260020e'), u'start_loc': {u'type': u'Point', u'coordinates': [-122.2724498, 37.805523]}, u'end_ts': 1470941549.534, u'start_ts': 1470941331, u'start_fmt_time': u'2016-08-11T11:48:51-07:00', u'end_loc': {u'type': u'Point', u'coordinates': [-122.272921, 37.8062295]}, u'source': u'DwellSegmentationTimeFilter', 'start_place': ObjectId('57acdb3188f6634f2260030b'), u'end_fmt_time': u'2016-08-11T11:52:29.534000-07:00', u'end_local_dt': {u'hour': 11, u'month': 8, u'second': 29, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 52}, u'duration': 218.53399991989136, u'start_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 48}}), '_id': ObjectId('57acdb3088f6634f22600300'), 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'metadata': Metadata({'write_fmt_time': '2016-08-11T13:08:16.455504-07:00', 'write_ts': 1470946096.455504, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 13, 'month': 8, 'second': 16, 'weekday': 3, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 11, 'minute': 8}), 'key': 'analysis/cleaned_trip'})}) into timeseries
2016-08-11 20:08:17,305:DEBUG:Inserting entry Entry({'data': Cleanedplace({u'enter_fmt_time': u'2016-08-11T11:52:29.534000-07:00', u'source': u'DwellSegmentationTimeFilter', 'raw_places': [ObjectId('57acdb2b88f6634f2260020f')], 'display_name': u'Downtown, Oakland', u'location': {u'type': u'Point', u'coordinates': [-122.272921, 37.8062295]}, u'enter_ts': 1470941549.534, u'enter_local_dt': {u'hour': 11, u'month': 8, u'second': 29, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 52}, 'ending_trip': ObjectId('57acdb3088f6634f22600300')}), '_id': ObjectId('57acdb3188f6634f2260030c'), 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'metadata': Metadata({'write_fmt_time': '2016-08-11T13:08:17.230292-07:00', 'write_ts': 1470946097.230292, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 13, 'month': 8, 'second': 17, 'weekday': 3, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 11, 'minute': 8}), 'key': 'analysis/cleaned_place'})}) into timeseries
It also explains why we haven't found it before. Although it is reproducible, it is only reproducible for certain cases.
The basic issue is with this section of code:
logging.debug("Finished creating and linking timeline, returning %d places and %d trips" % (len(cleaned_places), len(trip_map.values())))
return (last_cleaned_place, esdtl.Timeline(esda.CLEANED_PLACE_KEY,
esda.CLEANED_TRIP_KEY,
cleaned_places,
trip_map.values()))
Looks pretty benign, but note that map values are not guaranteed to be ordered!! So when we create the timeline, the first trip is not the first in the list, we think that the timeline starts with a place, and we skip the trip.
We can see this from the additional logging that I added.
Here's the mapping from the raw trip to the filtered trip
2016-08-14 12:45:01,819:DEBUG:For raw trip 57b0ca39f6858fae5cd98ad6, found filtered trip 57b0ca3bf6858fae5cd98ae9
we can confirm that the trip starts at 9:20 and causes the previous place to be exited at 9:20.
2016-08-14 12:45:03,302:DEBUG:for trip Entry({'data': Cleanedtrip({'distance': 53909.743497049596, 'raw_trip': ObjectId('57b0ca39f6858fae5cd98ad6'), u'start_loc': {u'type': u'Point', u'coordinates': [-122.0860081, 37.3918007]}, u'end_ts': 1470939591.759, u'start_ts': 1470932414.817, u'start_fmt_time': u'2016-08-11T09:20:14.817000-07:00', u'end_loc': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8023388]}, u'source': u'DwellSegmentationTimeFilter', u'end_fmt_time': u'2016-08-11T11:19:51.759000-07:00', u'end_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 19}, u'duration': 7176.9420001506805, u'start_local_dt': {u'hour': 9, u'month': 8, u'second': 14, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 20}}), '_id': ObjectId('57b0ca3bf6858fae5cd98ae9'), 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'metadata': Metadata({'write_fmt_time': '2016-08-14T12:44:59.385382-07:00', 'write_ts': 1471203899.385382, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 12, 'month': 8, 'second': 59, 'weekday': 6, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 14, 'minute': 44}), 'key': 'analysis/cleaned_trip'})}) start,
converting Entry({u'data': {u'enter_fmt_time': u'2016-08-10T19:48:43.490000-07:00', u'display_name': u'South Shoreline Boulevard, Mountain View', u'enter_local_dt': {u'hour': 19, u'month': 8, u'second': 43, u'weekday': 2, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 10, u'minute': 48}, u'ending_trip': ObjectId('57b0c98bf6858fad31e55b7b'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.0864577, 37.3906569]}, u'enter_ts': 1470883723.49, u'raw_places': [ObjectId('57b0c989f6858fad31e55b30')]}, u'_id': ObjectId('57b0c98cf6858fad31e55ba2'), u'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), u'metadata': {u'write_fmt_time': u'2016-08-14T12:42:04.828773-07:00', u'write_ts': 1471203724.828773, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 12, u'month': 8, u'second': 4, u'weekday': 6, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 14, u'minute': 42}, u'key': u'analysis/cleaned_place'}})
to Entry({u'data': {u'enter_fmt_time': u'2016-08-10T19:48:43.490000-07:00', u'exit_local_dt': {u'hour': 9, u'month': 8, u'second': 14, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 20}, u'exit_fmt_time': u'2016-08-11T09:20:14.817000-07:00', u'enter_local_dt': {u'hour': 19, u'month': 8, u'second': 43, u'weekday': 2, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 10, u'minute': 48}, u'ending_trip': ObjectId('57b0c989f6858fad31e55b2f'), u'starting_trip': ObjectId('57b0ca39f6858fae5cd98ad6'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.0864577, 37.3906569]}, u'enter_ts': 1470883723.49, u'duration': 48691.326999902725, u'exit_ts': 1470932414.817}, u'_id': ObjectId('57b0c989f6858fad31e55b30'), u'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), u'metadata': {u'write_fmt_time': u'2016-08-14T12:42:01.146805-07:00', u'write_ts': 1471203721.146805, u'time_zone': u'America/Los_Angeles', u'platform': u'server', u'write_local_dt': {u'hour': 12, u'month': 8, u'second': 1, u'weekday': 6, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 14, u'minute': 42}, u'key': u'segmentation/raw_place'}})
We've added links for it
2016-08-14 12:45:03,669:DEBUG:Found mapping 57b0ca39f6858fae5cd98ad6 -> 57b0ca3bf6858fae5cd98ae9, added links
We finished creating and linking timeline, and it has 3 places and 3 trips
2016-08-14 12:45:04,396:DEBUG:Finished creating and linking timeline, returning 3 places and 3 trips
2016-08-14 12:45:04,396:DEBUG:keys = (analysis/cleaned_place, analysis/cleaned_trip), len(places) = 3, len(trips) = 3
But this trip (57b0ca3bf6858fae5cd98ae9
) is the last one.
2016-08-14 12:45:04,396:DEBUG:places_or_stops = [ObjectId('57b0ca3ff6858fae5cd98c29'), ObjectId('57b0ca40f6858fae5cd98c2a'), ObjectId('57b0ca40f6858fae5cd98c2b')]
2016-08-14 12:45:04,396:DEBUG:trips_or_sections = [ObjectId('57b0ca3ef6858fae5cd98be0'), ObjectId('57b0ca3df6858fae5cd98bd5'), ObjectId('57b0ca3bf6858fae5cd98ae9')]
So to find the start, we compare
2016-08-14 12:45:04,396:DEBUG:to find start, comparing place 57b0ca3ff6858fae5cd98c29 with trip 57b0ca3ef6858fae5cd98be0
And we determine that the timeline starts with a place
And after that, we just step through the timeline and insert values
2016-08-14 12:45:04,453:DEBUG:entry was fine, no need to fix it
2016-08-14 12:45:04,454:DEBUG:Inserting entry Entry({'data': Cleanedplace({u'enter_fmt_time': u'2016-08-11T11:19:51.759000-07:00', u'exit_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 48}, 'display_name': u'Broadway, Oakland', u'exit_fmt_time': u'2016-08-11T11:48:51-07:00', u'enter_local_dt': {u'hour': 11, u'month': 8, u'second': 51, u'weekday': 3, u'year': 2016, u'timezone': u'America/Los_Angeles', u'day': 11, u'minute': 19}, 'ending_trip': ObjectId('57b0ca3bf6858fae5cd98ae9'), 'starting_trip': ObjectId('57b0ca3df6858fae5cd98bd5'), u'source': u'DwellSegmentationTimeFilter', u'location': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8023388]}, u'enter_ts': 1470939591.759, 'duration': 1739.2409999370575, 'raw_places': [ObjectId('57b0ca39f6858fae5cd98ad7'), ObjectId('57b0ca39f6858fae5cd98ad7'), ObjectId('57b0ca39f6858fae5cd98ad9')], u'exit_ts': 1470941331}), '_id': ObjectId('57b0ca3ff6858fae5cd98c29'), 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), 'metadata': Metadata({'write_fmt_time': '2016-08-14T12:45:03.667744-07:00', 'write_ts': 1471203903.667744, 'time_zone': 'America/Los_Angeles', 'platform': 'server', 'write_local_dt': LocalDate({'hour': 12, 'month': 8, 'second': 3, 'weekday': 6, 'year': 2016, 'timezone': 'America/Los_Angeles', 'day': 14, 'minute': 45}), 'key': 'analysis/cleaned_place'})}) into timeseries
And we only insert two of the three trips.
bash-3.2$ grep -r "Inserting entry Entry({'data': Cleanedtrip" /tmp/intake.local.2 | wc -l
2
Note also that we can see this from the view generation as well. The place is exited at
2016-08-14 12:45:05,781:DEBUG:Considering place 57b0c98cf6858fad31e55ba2: 2016-08-10T19:48:43.490000-07:00 -> 2016-08-11T09:20:14.817000-07:00
although the trip is missing.
2016-08-14 12:45:05,783:DEBUG:Considering trip 57b0c98af6858fad31e55b6f: 2016-08-10T18:11:04.989000-07:00 -> 2016-08-10T18:13:49.871000-07:00
2016-08-14 12:45:05,784:DEBUG:Considering trip 57b0ca3df6858fae5cd98bd5: 2016-08-11T11:48:51-07:00 -> 2016-08-11T11:52:29.534000-07:00
I just confirmed that this is the exact same behavior on the real server by looking at the view generation.
2016-08-11 20:10:27,136:DEBUG:Considering place 57abf9c888f6633e5ec6186d: 2016-08-10T19:48:43.490000-07:00 -> 2016-08-11T09:20:14.817000-07:00
...
2016-08-11 20:10:27,139:DEBUG:Considering trip 57abf9c788f6633e5ec61849: 2016-08-10T19:32:39.328000-07:00 -> 2016-08-10T19:48:43.490000-07:00
2016-08-11 20:10:27,139:DEBUG:Considering trip 57acdb3088f6634f22600300: 2016-08-11T11:48:51-07:00 -> 2016-08-11T11:52:29.534000-07:00
Should be a fairly straightforward fix.
Should be interesting to see how many of the to_investigate
issues this fixes.
This fixes the problem.
Note that there is a straight line from San Mateo to Oakland.
This corresponds to the following observation. In the first comment.
Big gap here for some reason. This is not just BART, I think. Probably bad quality points - need to check.
It turns out that it is indeed due to bad quality points.
If we look at the raw points, we were still getting them all through the trip.
>> data_df = esda.get_data_df("background/location", UUID("0763de67-f61e-3f5d-90e7-518e69793954"), time_query=None)
>> during_trip_df = data_df[(data_df.local_dt_day == 11) & (data_df.local_dt_hour == 10)]
>> during_trip_df.fmt_time.tail()
499 2016-08-11T10:57:42.751000-07:00
500 2016-08-11T10:58:12.799000-07:00
501 2016-08-11T10:58:42.845000-07:00
502 2016-08-11T10:59:13.004000-07:00
503 2016-08-11T10:59:43.019000-07:00
but they were low quality
>>> during_trip_df.accuracy.tail()
499 1700
500 1700
501 1700
502 1700
503 1700
and the last good points were at around 10:13.
>>> during_trip_df[during_trip_df.accuracy < 200].fmt_time
385 2016-08-11T10:00:06.258000-07:00
387 2016-08-11T10:01:06.397000-07:00
388 2016-08-11T10:01:36.370000-07:00
389 2016-08-11T10:02:36.488000-07:00
390 2016-08-11T10:03:06.553000-07:00
391 2016-08-11T10:03:36.642000-07:00
392 2016-08-11T10:04:06.611000-07:00
394 2016-08-11T10:05:06.801000-07:00
395 2016-08-11T10:05:36.841000-07:00
396 2016-08-11T10:06:07.026000-07:00
397 2016-08-11T10:06:36.988000-07:00
398 2016-08-11T10:07:07.052000-07:00
399 2016-08-11T10:07:37.073000-07:00
400 2016-08-11T10:08:07.139000-07:00
401 2016-08-11T10:08:37.154000-07:00
407 2016-08-11T10:11:33.204000-07:00
408 2016-08-11T10:12:05.011000-07:00
409 2016-08-11T10:12:36.838000-07:00
410 2016-08-11T10:13:00.585000-07:00
I think that this is what happens if you keep your phone in your armpit on the train :)
On zooming in further, we see that the trip doesn't have an end walking part, and doesn't end at the Oakland DOT office. We do have points on the way there, we just seem to ignore them.
I am fairly sure that this is a raw trip segmentation problem, but it would be nice to fix it if there is an easy fix.
After a big gap, we started getting points again.
2016-08-11 20:08:10,282:DEBUG:------------------------------2016-08-11T10:13:00.585000-07:00------------------------------
2016-08-11 20:08:10,287:DEBUG:last5MinsDistances = [ 4084.14912455 3451.52647937 1504.04296114 1337.49816784 446.65066228] with length 5
2016-08-11 20:08:10,292:DEBUG:last10PointsDistances = [ 4583.49305378 4549.46266567 4540.59988907 4573.67651691 4084.14912455
2016-08-11 20:08:10,292:DEBUG:len(last10PointsDistances) = 10, len(last5MinsDistances) = 5
2016-08-11 20:08:10,292:DEBUG:last5MinsDistances.max() = 4084.14912455, last10PointsDistance.max() = 4583.49305378
2016-08-11 20:08:10,293:DEBUG:------------------------------2016-08-11T11:19:10.269000-07:00------------------------------
2016-08-11 20:08:10,295:DEBUG:last5MinsDistances = [] with length 0
2016-08-11 20:08:10,300:DEBUG:last10PointsDistances = [ 31277.26212486 31250.32917744 31289.01878554 30906.9231772
2016-08-11 20:08:10,300:DEBUG:len(last10PointsDistances) = 10, len(last5MinsDistances) = 0
2016-08-11 20:08:10,300:DEBUG:Too few points to make a decision, continuing
And at this point, we got a cluster of points that are only 5-10 secs apart, not 30 secs. This is probably because I was using the navigation on the phone to see where I was going.
2016-08-11 20:08:10,293:DEBUG:------------------------------2016-08-11T11:19:10.269000-07:00------------------------------
2016-08-11 20:08:10,301:DEBUG:------------------------------2016-08-11T11:19:20.301000-07:00------------------------------
2016-08-11 20:08:10,310:DEBUG:------------------------------2016-08-11T11:19:31.641000-07:00------------------------------
2016-08-11 20:08:10,320:DEBUG:------------------------------2016-08-11T11:19:41.803000-07:00------------------------------
2016-08-11 20:08:10,330:DEBUG:------------------------------2016-08-11T11:19:51.759000-07:00------------------------------
2016-08-11 20:08:10,340:DEBUG:------------------------------2016-08-11T11:20:01.837000-07:00------------------------------
2016-08-11 20:08:10,351:DEBUG:------------------------------2016-08-11T11:20:10.890000-07:00------------------------------
2016-08-11 20:08:10,363:DEBUG:------------------------------2016-08-11T11:20:21.018000-07:00------------------------------
2016-08-11 20:08:10,375:DEBUG:------------------------------2016-08-11T11:20:26.062000-07:00------------------------------
2016-08-11 20:08:10,387:DEBUG:------------------------------2016-08-11T11:20:31.066000-07:00------------------------------
So at this point, we have 10 points since the restart. They clearly don't represent a full 5 mins though, but we still detect that the trip ended.
2016-08-11 20:08:10,393:DEBUG:last5MinsDistances = [ 13.7195477 16.56804407 8.34715835 6.85977309 8.28402203
2016-08-11 20:08:10,398:DEBUG:last10PointsDistances = [ 13.7195477 16.56804407 8.34715835 6.85977309 8.28402203
2016-08-11 20:08:10,399:DEBUG:len(last10PointsDistances) = 10, len(last5MinsDistances) = 9
2016-08-11 20:08:10,399:DEBUG:last5MinsDistances.max() = 16.5680440701, last10PointsDistance.max() = 16.5680440701
2016-08-11 20:08:10,400:DEBUG:Appending last_trip_end_point AttrDict({u'local_dt_hour': 11, u'local_dt_timezone': u'America/Los_Angeles', u'local_dt_second': 51, u'local_dt_year': 2016, u'local_dt_month': 8, u'local_dt_weekday': 3, u'elapsedRealtimeNanos': 7616956000000, u'loc': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8023388]}, 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), u'altitude': 0.0, u'ts': 1470939591.7590001, u'latitude': 37.802338800000001, u'sensed_speed': 0.0, u'accuracy': 19.847000000000001, u'fmt_time': u'2016-08-11T11:19:51.759000-07:00', u'local_dt_minute': 19, 'metadata_write_ts': 1470939628.72, u'longitude': -122.2726109, u'filter': u'time', '_id': ObjectId('57accff9f311e474bfda658a'), u'heading': 0, u'local_dt_day': 11}) with index 76
And then we have to restart, but then we detect an end again and all kinds of bad stuff happens.
It looks like the real problem is that we haven't actually been stationary for 5 mins. This is a general problem with the trip end detection. When we search for points for the last 5 mins, and find their max, we may only find points that are actually for the past 2 mins because we haven't been getting points for enough time.
The last 10 points were supposed to check for that, but it looks like they can also be fooled. I think that we should add a check to ensure that the points actually cover at least 5 mins before making the decision.
If we do that, we can remove the "last 10 points" check. Let's add the new check first and then remove it once we have the evaluation pipeline in place.
The trip from the City Center to the Radio Shack is similar.
Trip started at 11:44:36
2016-08-11 20:08:10,519:DEBUG:Setting new trip start point AttrDict({u'local_dt_hour': 11, u'local_dt_timezone': u'America/Los_Angeles', u'local_dt_second': 36, u'local_dt_year': 2016, u'local_dt_month': 8, u'local_dt_weekday': 3, u'elapsedRealtimeNanos': 9101219000000, u'loc': {u'type': u'Point', u'coordinates': [-122.2726109, 37.8046482]}, 'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954'), u'altitude': 0.0, u'ts': 1470941076.022, u'latitude': 37.8046482, u'sensed_speed': 0.0, u'accuracy': 19.593, u'fmt_time': u'2016-08-11T11:44:36.022000-07:00', u'local_dt_minute': 44, 'metadata_write_ts': 1470941078.54, 'idx': 120, u'longitude': -122.2726109, u'filter': u'time', '_id': ObjectId('57acdb2988f6634f22600207'), u'heading': 0, u'local_dt_day': 11}) with idx 120
We got some points that were < 30 secs apart
2016-08-11 20:08:10,517:DEBUG:------------------------------2016-08-11T11:44:36.022000-07:00------------------------------
2016-08-11 20:08:10,523:DEBUG:------------------------------2016-08-11T11:45:20.424000-07:00------------------------------
2016-08-11 20:08:10,528:DEBUG:------------------------------2016-08-11T11:45:28-07:00------------------------------
2016-08-11 20:08:10,534:DEBUG:------------------------------2016-08-11T11:45:52.502000-07:00------------------------------
2016-08-11 20:08:10,541:DEBUG:------------------------------2016-08-11T11:46:30.100000-07:00------------------------------
2016-08-11 20:08:10,549:DEBUG:------------------------------2016-08-11T11:47:00.053000-07:00------------------------------
2016-08-11 20:08:10,558:DEBUG:------------------------------2016-08-11T11:47:08-07:00------------------------------
2016-08-11 20:08:10,567:DEBUG:------------------------------2016-08-11T11:47:30.180000-07:00------------------------------
And in 3 mins, we marked the trip as done.
Moving to long-term
Bunch of battery data
Got transition to start at 9:14
Got transition to end at 11:20
So why didn't we convert this into a trip?
Just for the record, we then had the trip from City Hall to Radio Shack 11:22 to 11:31
Spurious trip from 11:44 to 11:55
Ok so now we get to the trip start
Big gap here for some reason. This is not just BART, I think. Probably bad quality points - need to check.
but we haven't ended the trip yet, so this is still fine We finally end the trip at 11:19, which seems fine too.
New trip start at 11:22
Trip end at 11:23:30. Points from 11:24:46 - 11:44:36 ignored because "they are part of the same trip".
Unsure why the trip ended at 11:23:30, so soon after starting at 11:22. Note that the max distance is 56. So basically, the 11th point is still within 100m, so we think that the trip should end. And once it has ended, we are in the "within distance filter and 1 min apart" use case, so we continue the existing trip.
Then, we get a point that was 10 mins apart (at 11:44), so we start a new trip which ends at 11:45
Then, we get another trip start at 11:48, which ends at 11:52. This is the first trip on the phone. Everything seems to work well after that.
Ok... so far so good.
Next step, actual conversion into trips.
This is our house.
Ok so we got a raw trip from 9:20 to 11:19.
And then other short trips
Now, I'm going to focus on the first (long) raw trip (57acdb2a88f6634f22600208) and see what happened to it as it moved through the pipeline.
We found two sections - from 9:20 - 9:36 and 9:41 to 11:19
Jump smoothing did a NOP, just like before.
So it's all down to CLEAN_RESAMPLING now.
Here's the trip in the list.
We have two sections
We get a bunch of recreated locations for the first section.
And the second.
Inserted cleaned sections
Ok so we even have cleaned sections.
Now, we link up the timeline. The last place was home, entered at 7:48.
We have a cleaned trip,
And link the existing cleaned place to the raw place.
which is apparently "Broadway"
Ok so it looks like this is the problem
We have supposedly created and linked the timeline and there are two trips.
2016-08-11 20:08:17,231:DEBUG:Finished creating and linking timeline, returning 2 places and 2 trips
But we say that the timeline starts with a place, and in particular, the Broadway place. Note that when we created the timeline from the raw data, it started with a trip.
So we update the last cleaned place to exit at 9:40
But then we only insert the Broadway place, the next trip, and the DOT place, so 2 places and one trip.
So the trip doesn't show up in the list of cleaned trips, and so it doesn't show up in the timeline.
But wait! Although we had cleaned sections for the trip from home, we apparently didn't really insert a cleaned trip.
Yep, only one trip until the cleaning stage ended.