ok so we do have confirmed places, but only two
2023-04-26 00:09:34,213:DEBUG:140704418879104:Found {'curr_confirmed_place_count': 2, 'curr_composite_trip_count': 0} existing entries before this run
That's because we had some unprocessed data and created confirmed objects just before this step
2023-04-26 00:09:33,891:DEBUG:140704418879104:cleaned timeline has 2 places and 1 trips
2023-04-26 00:09:33,922:DEBUG:140704418879104:Iterating through 1 cleaned trips to create confirmed trips
2023-04-26 00:09:34,024:DEBUG:140704418879104:keys = (analysis/confirmed_place, analysis/confirmed_trip), len(places) = 2, len(trips) = 1
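(For context, the "Found {...} existing entries before this run" line above is presumably produced by a status check along these lines; a hedged sketch, not the actual pipeline helper:)

# Hedged sketch (not the actual pipeline code) of how the pre-run
# status check in the log above could be computed for a given user.
import emission.core.get_database as edb

def get_status_check(user_id):
    analysis_db = edb.get_analysis_timeseries_db()
    return {
        "curr_confirmed_place_count": analysis_db.count_documents(
            {"user_id": user_id, "metadata.key": "analysis/confirmed_place"}),
        "curr_composite_trip_count": analysis_db.count_documents(
            {"user_id": user_id, "metadata.key": "analysis/composite_trip"}),
    }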
And fortunately, the start confirmed place does have both enter and exit times
2023-04-26 00:09:34,024:DEBUG:140704418879104:to find start, comparing place 6448ce2dd226a3fcb64bcdbf with trip 6448ce2dd226a3fcb64bcdc0
2023-04-26 00:09:34,026:DEBUG:140704418879104:Starting with element of type place, id 6448ce2dd226a3fcb64bcdbf, details
Entry({'_id': ObjectId('6448ce2dd226a3fcb64bcdbf'),
'metadata': Metadata({'key': 'analysis/confirmed_place',
'enter_fmt_time': '2023-04-22T10:19:20.999886-04:00',
'exit_fmt_time': '2023-04-23T13:31:44.541360-04:00'})
So, in the code below, we cannot assume that there are no confirmed places; we have to check for each trip individually.
assert "additions" not in ct["data"]
logging.info("old-style format, converting from cleaned to confirmed")
needs_hack = True
assert statuscheck["curr_confirmed_place_count"] == 0
convert_cleaned_to_confirmed(ts, ct, keys)
ct["data"]["additions"] = []
Added the per-trip check:
diff --git a/emission/analysis/plotting/composite_trip_creation.py b/emission/analysis/plotting/composite_trip_creation.py
index d9f7d575..7b5541a2 100644
--- a/emission/analysis/plotting/composite_trip_creation.py
+++ b/emission/analysis/plotting/composite_trip_creation.py
@@ -34,10 +34,14 @@ def create_composite_trip(ts, ct):
convert_cleaned_to_confirmed(ts, ct, keys)
del ct["data"]["trip_addition"]
else:
+ import emission.core.get_database as edb
assert "additions" not in ct["data"]
logging.info("old-style format, converting from cleaned to confirmed")
needs_hack = True
- assert statuscheck["curr_confirmed_place_count"] == 0
+ existing_start_confirmed_place = edb.get_analysis_timeseries_db().find_one({"data.cleaned_place": ct["data"]["start_place"]})
+ existing_end_confirmed_place = edb.get_analysis_timeseries_db().find_one({"data.cleaned_place": ct["data"]["end_place"]})
+ assert existing_start_confirmed_place is None and existing_end_confirmed_place is None,\
+ f"for {ct['_id']} {existing_start_confirmed_place=} and {existing_end_confirmed_place=}"
convert_cleaned_to_confirmed(ts, ct, keys)
ct["data"]["additions"] = []
But now I'm getting:
Error while creating composite objects, timestamp is unchanged
Traceback (most recent call last):
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/plotting/composite_trip_creation.py", line 119, in create_composite_objects
last_done_ts = create_composite_trip(ts, t)
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/plotting/composite_trip_creation.py", line 43, in create_composite_trip
assert existing_start_confirmed_place is None and existing_end_confirmed_place is None,\
I'm still getting this after clearing, re-loading data and re-running the pipeline, so it is not just an artifact of the previous failed runs
Looking more carefully at place creation...
2023-04-26 11:18:59,762:INFO:140704418879104:For stage PipelineStages.CREATE_CONFIRMED_OBJECTS, start_ts = 2023-04-22T14:19:25.999886
2023-04-26 11:18:59,865:DEBUG:140704418879104:cleaned timeline has 2 places and 1 trips
2023-04-26 11:18:59,858:DEBUG:140704418879104:for trip = 64496b135ab8a40aadd1002e, adding start_place 6443ef9de109f0a38c98df2d and end_place 64496b135ab8a40aadd10027
2023-04-26 11:19:00,095:INFO:140704418879104:For stage PipelineStages.CREATE_CONFIRMED_OBJECTS, last_ts_processed = 2023-04-23T18:38:43.000002
existing_start_confirmed_place={'_id': ObjectId('64496b155ab8a40aadd10036')
'metadata': {'key': 'analysis/confirmed_place',
'cleaned_place': ObjectId('643f27a8665ea5d30f22644f'),
But 643f27a8665ea5d30f22644f is not one of the cleaned places that we processed in CREATE_CONFIRMED_OBJECTS.
I bet we created it as the end place of a previous trip
Bingo!
2023-04-26 11:19:00,672:INFO:140704418879104:old-style format, converting from cleaned to confirmed
2023-04-26 11:19:01,018:DEBUG:140704418879104:Setting the start_place key to the newly created id 64496b155ab8a40aadd10035
2023-04-26 11:19:01,175:DEBUG:140704418879104:Setting the end_place key to the newly created id 64496b155ab8a40aadd10036
While investigating this further, I ran into a weird issue: there was an existing confirmed place
AssertionError: For 6443ef9ee109f0a38c98df38 found existing_start_confirmed_place={'_id': ObjectId('64497553eb8b9db3c666b50b'),
'metadata': {'key': 'analysis/confirmed_place', ...},
'data': {'enter_fmt_time': '2023-04-22T10:19:20.999886-04:00',
'exit_fmt_time': '2023-04-23T13:31:44.541360-04:00',
'cleaned_place': ObjectId('6443ef9de109f0a38c98df2d'),
'ending_trip': ObjectId('6443ef9ce109f0a38c98df0b'),
'starting_trip': ObjectId('64497553eb8b9db3c666b50c'), ...}} but no associated composite trip
On looking up the related objects, it looks like the confirmed place has a cleaned trip on one side
>>> edb.get_analysis_timeseries_db().find_one({"_id": boi.ObjectId("6443ef9ce109f0a38c98df0b")})
{'_id': ObjectId('6443ef9ce109f0a38c98df0b'),
'metadata': {'key': 'analysis/cleaned_trip',
'start_place': ObjectId('6443473cf01764a706cdfffd'),
'end_place': ObjectId('6443ef9de109f0a38c98df2d')}}
The end_place of the cleaned trip is the same as the cleaned place of the confirmed place. Why are we setting this to cleaned place and not confirmed place?
ah, because when we were creating confirmed objects, we read the first cleaned place in the incremental timeline
2023-04-26 14:30:39,988:DEBUG:140704418879104:place id 6443ef9de109f0a38c98df2d is not in the map, searching i
2023-04-26 14:30:39,990:DEBUG:140704418879104:retrieved object
{'_id': ObjectId('6443ef9de109f0a38c98df2d'),
'metadata': {'key': 'analysis/cleaned_place',
'ending_trip': ObjectId('6443ef9ce109f0a38c98df0b'), 'starting_trip': ObjectId('644997fec78091492d...')}) and added to id_map
2023-04-26 14:30:39,991:DEBUG:140704418879104:cleaned timeline has 2 places and 1 trips
Since this was incremental, it had both an ending and a starting trip. There was no last confirmed place found, so we created a new one
2023-04-26 14:30:40,023:DEBUG:140704418879104:no last confirmed place found, created place with id 64499800c78091492d035265
and we linked the trip to the end of this confirmed place
2023-04-26 14:30:40,051:DEBUG:140704418879104:for trip Entry({'_id': ObjectId('64499800c78091492d035266'),
'metadata': Metadata({'key': 'analysis/confirmed_trip'}), ...}) start, linking to Entry({'_id': ObjectId('64499800c78091492d035265'),
'metadata': Metadata({'key': 'analysis/confirmed_place',
But we don't link the previous confirmed trip (created in the previous round) as the ending trip of this confirmed place.
I will fix this, but it might have been faster to just reset the pipelines instead of enduring this death by a thousand cuts.
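The fix presumably needs to re-point the newly created confirmed place's ending_trip from the cleaned trip id to the corresponding confirmed trip. A hedged sketch, consistent with the "Found matching_confirmed_trip=... for curr_confirmed_start_place.data.ending_trip=..." log lines that show up later in this thread:

import logging
import emission.core.get_database as edb

# Hedged sketch of the linking fix: if the newly created confirmed place
# still points at a *cleaned* trip, find the confirmed trip that wraps it
# (via data.cleaned_trip) and re-link ending_trip to the confirmed trip.
def relink_ending_trip(curr_confirmed_start_place):
    ending_trip_id = curr_confirmed_start_place["data"]["ending_trip"]
    matching_confirmed_trip = edb.get_analysis_timeseries_db().find_one(
        {"metadata.key": "analysis/confirmed_trip",
         "data.cleaned_trip": ending_trip_id})
    if matching_confirmed_trip is not None:
        logging.debug("Found matching_confirmed_trip=%s for ending_trip=%s" %
                      (matching_confirmed_trip["_id"], ending_trip_id))
        curr_confirmed_start_place["data"]["ending_trip"] = matching_confirmed_trip["_id"]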
Yay! Finally ran without any errors
2023-04-26T15:09:16.482916-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: populating expectations**********
2023-04-26T15:09:16.503540-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: creating confirmed objects **********
2023-04-26T15:09:16.718601-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: creating composite objects **********
While trying to re-run from scratch, got another error
2023-04-26 16:32:31,572:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
Traceback (most recent call last):
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/plotting/composite_trip_creation.py", line 134, in create_composite_objects
last_done_ts = create_composite_trip(ts, t)
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/plotting/composite_trip_creation.py", line 48, in create_composite_trip
assert previous_trip is not None,\
AssertionError: For 6443ef9ee109f0a38c98df38 found existing_start_confirmed_place={'_id': ObjectId('6449b47e8cb54312cd3e8f3e'),
The place was a first place and the trip was the trip that ended at it
2023-04-26 16:32:14,578:DEBUG:140704418879104:no last confirmed place found, created place with id 6449b47e8cb54312cd3e8f3e
2023-04-26 16:32:14,706:DEBUG:140704418879104:Found matching_confirmed_trip={
'_id': ObjectId('6443ef9ee109f0a38c98df38'),
'metadata': {'key': 'analysis/confirmed_trip'
'start_place': ObjectId('6443473cf01764a706cdfffd'),
'end_place': ObjectId('6443ef9de109f0a38c98df2d'),
'cleaned_trip': ObjectId('6443ef9ce109f0a38c98df0b'),
for curr_confirmed_start_place.data.ending_trip=ObjectId('6443ef9ce109f0a38c98df0b')
What is surprising is that this is the trip that is ending at 6449b47e8cb54312cd3e8f3e.
But our assertion claims that this is the starting_place.
For 6443ef9ee109f0a38c98df38 found existing_start_confirmed_place={'_id': ObjectId('6449b47e8cb54312cd3e8f3e'),
'cleaned_place': ObjectId('6443ef9de109f0a38c98df2d'),
Even though querying the database directly clearly shows it is not:
>>> edb.get_analysis_timeseries_db().find_one({"_id": boi.ObjectId("6443ef9ee109f0a38c98df38")})
'_id': ObjectId('6443ef9ee109f0a38c98df38'),
'start_place': ObjectId('6443473cf01764a706cdfffd'),
'end_place': ObjectId('6443ef9de109f0a38c98df2d'),
'cleaned_trip': ObjectId('6443ef9ce109f0a38c98df0b'),
Wait, the end_place does seem to match up with the cleaned_place of the confirmed_place though. Is this some stupid start/end lookup error?
Yes, and there was another stupid start/end lookup error. But they are all resolved now, and running the pipeline on a freshly loaded instance works without errors
2023-04-26T23:16:05.187683-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: inferring labels**********
2023-04-26T23:16:05.255804-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: populating expectations**********
2023-04-26T23:16:05.341649-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: creating confirmed objects **********
2023-04-26T23:16:05.814577-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: creating composite objects **********
2023-04-26T23:16:25.543376-07:00**********UUID 1858a5d7-d255-4dbe-8f82-df5e03c25f9c: storing views to cache**********
Ran this on the full Durham dataset and found 5 errors
2023-04-27 12:19:08,054:ERROR:140704418879104:Error while creating confirmed objects, timestamp is unchanged
2023-04-27 13:35:37,082:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
2023-04-27 15:00:17,573:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
2023-04-27 15:11:17,732:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
2023-04-27 15:22:50,331:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
Investigating further...
All of the composite object creation errors are due to database cursor timeouts
2023-04-27 13:35:37,082:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
pymongo.errors.CursorNotFound: cursor id 111869280301504251 not found, full error: {'ok': 0.0, 'errmsg': 'cursor id 111869280301504251 not found', 'code': 43, 'codeName': 'CursorNotFound'}
2023-04-27 15:00:17,573:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
pymongo.errors.CursorNotFound: cursor id 4860483662849025561 not found, full error: {'ok': 0.0, 'errmsg': 'cursor id 4860483662849025561 not found', 'code': 43, 'codeName': 'CursorNotFound'}
2023-04-27 15:11:17,732:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
pymongo.errors.CursorNotFound: cursor id 1637737902225172962 not found, full error: {'ok': 0.0, 'errmsg': 'cursor id 1637737902225172962 not found', 'code': 43, 'codeName': 'CursorNotFound'}
2023-04-27 15:22:50,331:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
pymongo.errors.CursorNotFound: cursor id 5764353970537108054 not found, full error: {'ok': 0.0, 'errmsg': 'cursor id 5764353970537108054 not found', 'code': 43, 'codeName': 'CursorNotFound'}
From the backtrace:
Traceback (most recent call last):
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/plotting/composite_trip_creation.py", line 133, in create_composite_objects
for t in triplikeEntries:
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/cursor.py", line 1248, in next
if len(self.__data) or self._refresh():
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/cursor.py", line 1188, in _refresh
self.__send_message(g)
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/cursor.py", line 1052, in __send_message
response = client._run_operation(
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/_csot.py", line 105, in csot_wrapper
return func(self, *args, **kwargs)
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1330, in _run_operation
return self._retryable_read(
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/_csot.py", line 105, in csot_wrapper
return func(self, *args, **kwargs)
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1448, in _retryable_read
return func(session, server, sock_info, read_pref)
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1326, in _cmd
return server.run_operation(
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/server.py", line 134, in run_operation
_check_command_response(first, sock_info.max_wire_version)
File "/Users/kshankar/miniconda-23.1.0/envs/emission/lib/python3.9/site-packages/pymongo/helpers.py", line 179, in _check_command_response
raise CursorNotFound(errmsg, code, response, max_wire_version)
The related code is:
triplikeEntries = ts.find_entries([esda.CONFIRMED_TRIP_KEY, esda.CONFIRMED_UNTRACKED_KEY], time_query=time_query)
last_done_ts = None
count_created = 0
for t in triplikeEntries:
    last_done_ts = create_composite_trip(ts, t)
    count_created += 1
logging.debug("Created %d composite trips" % count_created if count_created > 0
              else "No new triplike entries to process, no composite trips created")
epq.mark_composite_object_creation_done(user_id, last_done_ts)
I guess if we have a lot of confirmed trips, then, since we create each composite trip inside the loop over the cursor, it is possible for the cursor to time out between fetches.
A simple fix (at the expense of additional memory consumption) would be to just read all the trips into a list up front.
This seems like it would not result in too much memory consumption, since we already read all the trips into memory to build models and to generate the public dashboard outputs.
Let's just double-check the time taken in each of the cases before making the change. Yup! The timeout seems to be ~10 minutes:
2023-04-27 13:25:19,995:INFO:140704418879104:**********UUID ...: creating composite objects **********
2023-04-27 13:35:37,082:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
2023-04-27 14:50:04,823:INFO:140704418879104:**********UUID ...: creating composite objects **********
2023-04-27 15:00:17,573:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
2023-04-27 15:00:55,132:INFO:140704418879104:**********UUID ...: creating composite objects **********
2023-04-27 15:11:17,732:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
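Given that, a minimal sketch of the fix, reusing the ts.find_entries call from the snippet above: materialize the cursor into a list before the slow per-trip work, so it cannot time out mid-iteration.

# Minimal sketch of the cursor-timeout fix: drain the cursor up front
# (trading memory for robustness) instead of holding it open across
# ~10 minutes of per-trip composite creation.
triplikeEntries = list(ts.find_entries(
    [esda.CONFIRMED_TRIP_KEY, esda.CONFIRMED_UNTRACKED_KEY],
    time_query=time_query))
last_done_ts = None
count_created = 0
for t in triplikeEntries:
    last_done_ts = create_composite_trip(ts, t)
    count_created += 1

An alternative would be pymongo's no_cursor_timeout=True on the underlying find (assuming ts.find_entries exposes it), but that holds server-side resources open for the whole run; since we already read all trips into memory elsewhere, the list is the simpler choice.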
The confirmed trip creation failure is more complex. It turns out that the cause is a timeline with only a single place, which is the last place of the timeline (it has an enter time but no exit time):
2023-04-27 12:19:08,034:DEBUG:140704418879104:keys = (analysis/cleaned_place, analysis/expected_trip), len(places) = 1, len(trips) = 0
2023-04-27 12:19:08,034:DEBUG:140704418879104:places_or_stops = [ObjectId('633fd8c5beb878b8f772434f')]
2023-04-27 12:19:08,035:DEBUG:140704418879104:trips_or_sections = []
2023-04-27 12:19:08,035:DEBUG:140704418879104:Starting with element of type place, id 633fd8c5beb878b8f772434f, details Entry({'_id': ObjectId('633fd8c5beb878b8f772434f'), 'metadata': {'key': 'analysis/cleaned_place',
'enter_fmt_time': '...',
'raw_places': [], 'ending_trip': ObjectId('633fd8b1beb878b8f772432a')}})
2023-04-27 12:19:08,035:DEBUG:140704418879104:cleaned timeline has 1 places and 0 trips
But then
2023-04-27 12:19:08,054:DEBUG:140704418879104:last place doc = None
so we try to create the confirmed place from the cleaned place
curr_confirmed_start_place = create_confirmed_entry(ts,
    timeline.first_place(), esda.CONFIRMED_PLACE_KEY, keys)
but the timeline.first_place() is None, which leads to
2023-04-27 12:19:08,054:ERROR:140704418879104:Error while creating confirmed objects, timestamp is unchanged
Traceback (most recent call last):
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 94, in create_confirmed_objects
confirmed_tl = create_and_link_timeline(ts, timeline, last_confirmed_place)
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 121, in create_and_link_timeline
curr_confirmed_start_place = create_confirmed_entry(ts,
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 194, in create_confirmed_entry
confirmed_object_data = copy.copy(tce["data"])
TypeError: 'NoneType' object is not subscriptable
And the timeline first place is None because there are no trips, only places
def first_place(self):
    if self.first_trip() is not None and \
        self.first_trip().data.start_place is not None:
        return self.get_object(self.first_trip().data.start_place)
    else:
        return None
There are two possible solutions for https://github.com/e-mission/e-mission-docs/issues/898#issuecomment-1526862427:

1. fix first_place so that it works even when there is only one place in the timeline
2. skip the confirmed place creation if the cleaned place is None
   a. in this case, we will fill in the confirmed place while creating composite trips, as part of the backwards-compat codebase

(2) seems a bit easier, but also hackier. We will remove the backwards-compat logic after a while; who will create the corresponding confirmed place then?
   a. arguably, the next run of the pipeline will include trips, so we will create it then
   b. but what if the pipeline does not run again? What if this is the final run? If we were matching inputs to places, we would miss the inputs to this final place.

(1) seems more principled. If I call first_place, I should get the first place of the timeline, if it exists. That will also ensure that we process all cleaned -> confirmed conversions.

Let's try (1) first and fall back to (2) if (1) doesn't work.
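A hedged sketch of option (1); self.places is an assumption about the Timeline internals, not confirmed from the source:

def first_place(self):
    # original logic: derive the first place from the first trip
    if self.first_trip() is not None and \
        self.first_trip().data.start_place is not None:
        return self.get_object(self.first_trip().data.start_place)
    elif len(self.places) > 0:
        # no trips in this (incremental) timeline, but at least one place:
        # return the chronologically first place directly
        # (self.places is an assumed attribute name)
        return self.places[0]
    else:
        return None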
With these fixes, a new error:
2023-04-27 20:01:33,450:DEBUG:140704418879104:cleaned timeline has 1 places and 0 trips
2023-04-27 20:01:33,459:DEBUG:140704418879104:last place doc = {'_id': ObjectId('644ad15c03ce06258b305ebd'), 'metadata': {'key': 'analysis/confirmed_place',
'enter_fmt_time': '...',
'raw_places': [ObjectId('6362a8e45aa370f03d71be8d')],
'ending_trip': ObjectId('6362a97d5aa370f03d71be93'),
'cleaned_place': ObjectId('6362a9835aa370f03d71bec0'),
'user_input': {}, 'additions': []}}
2023-04-27 20:01:33,460:ERROR:140704418879104:Error while creating confirmed objects, timestamp is unchanged
Traceback (most recent call last):
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 94, in create_confirmed_objects
confirmed_tl = create_and_link_timeline(ts, timeline, last_confirmed_place)
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 150, in create_and_link_timeline
curr_confirmed_start_place["data"]["exit_local_dt"] = matching_cleaned_place.data.exit_local_dt
File "/Users/kshankar/e-mission/gis_branch_tests/emission/core/wrapper/wrapperbase.py", line 71, in __getattr__
raise AttributeError(
AttributeError: 'Cleanedplace' instance has no attribute 'exit_local_dt'
So the issue here is that the last cleaned place doesn't have an exit timestamp. That should be pretty common: unless users take trips every hour, they will be at their final place for a while, and if the pipeline runs multiple times during that window, the last place will be the one without an exit timestamp.
How do we deal with this in our sample clean code?
In the sample clean code, we are creating cleaned places from scratch (because of the squishing) and don't just copy over the raw information. So the first place code looks like
if curr_cleaned_start_place is None:
    # If it is not present - maybe this user is getting started for the first
    # time, we create an entry based on the first trip from the timeline
    curr_cleaned_start_place = get_filtered_place(tl.first_place())
    logging.debug("no last cleaned place found, created place with id %s" % curr_cleaned_start_place.get_id())
    # We just created this place here, so lets add it to the created places
    # and insert rather than update it
    cleaned_places.append(curr_cleaned_start_place)
else:
    logging.debug("Cleaned place %s found, using it" % curr_cleaned_start_place.get_id())
There's not much in the else
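So the cleaning path builds places from scratch and never copies exit fields that may not exist. For the confirmed-place path, the copy of the exit fields presumably needs to be guarded; a hedged sketch (field names other than exit_local_dt, which appears in the traceback above, are assumptions):

# Hedged sketch: copy the exit-side fields to the confirmed place only when
# the cleaned place actually has them; the last place in the timeline has an
# enter time but no exit time yet.
if "exit_local_dt" in matching_cleaned_place["data"]:
    for exit_field in ["exit_ts", "exit_local_dt", "exit_fmt_time"]:
        curr_confirmed_start_place["data"][exit_field] = \
            matching_cleaned_place["data"][exit_field]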
After making all the changes in 🔙 ⚡ 🐛 🧠 ✅ Backwards compat, robustness and bug-fixes for migration and incremental pipeline runs, I re-loaded the old data and re-ran the new pipeline
New error
Error while creating composite objects, timestamp is unchanged
Traceback (most recent call last):
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/plotting/composite_trip_creation.py", line 136, in create_composite_objects
last_done_ts = create_composite_trip(ts, t)
File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/plotting/composite_trip_creation.py", line 57, in create_composite_trip
assert next_trip is not None and next_trip["metadata"]["key"] == "analysis/confirmed_trip" \
AssertionError: for 63594365a720fd515389d185 found
existing_end_confirmed_place={'_id': ObjectId('644d9601682e00944e4ecd5a'),
'metadata': {'key': 'analysis/confirmed_place',
'data': {'enter_fmt_time': '2022-10-26T10:04:37-04:00',
'ending_trip': ObjectId('63594365a720fd515389d185'),
'cleaned_place': ObjectId('63594364a720fd515389d178'),
'user_input': {}, 'additions': []}} but next_trip=None
It seems like this confirmed place does not have a starting trip (so it is the end of the chain), and so the next trip should in fact be None.
We can add a fix for this, but it is not clear why it is not failing for other users.
I guess these are the users that had unprocessed data, so these terminating confirmed places were created in CREATE_CONFIRMED_OBJECTS
Let's verify before fixing.
/var/tmp/intake_0-errors.log:2023-04-29 15:32:26,106:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
/var/tmp/intake_0.log:2023-04-29 15:32:26,106:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
/var/tmp/intake_1-errors.log:2023-04-29 15:30:01,268:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
/var/tmp/intake_1-errors.log:2023-04-29 15:46:39,880:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
/var/tmp/intake_1.log:2023-04-29 15:30:01,268:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
/var/tmp/intake_1.log:2023-04-29 15:46:39,880:ERROR:140704418879104:Error while creating composite objects, timestamp is unchanged
Let's start with the first of these:
AssertionError: for 633fda30beb878b8f7724399 found existing_end_confirmed_place={'_id': ObjectId('644d98e55f92e1613f9f4926'),
'metadata': {'key': 'analysis/confirmed_place'},
'data': {'enter_fmt_time': '2022-10-05T14:00:05.628000-04:00'
'ending_trip': ObjectId('633fda30beb878b8f7724399'),
'cleaned_place': ObjectId('633fd8c5beb878b8f772434f')
'user_input': {}, 'additions': []}} but next_trip=None
Confirmed
2023-04-29 15:23:33,090:INFO:140704418879104:For stage PipelineStages.CREATE_CONFIRMED_OBJECTS, start_ts = 2022-10-05T17:58:29.613000
2023-04-29 15:23:33,246:DEBUG:140704418879104:Considering place 633fd8c5beb878b8f772434f: 2022-10-05T14:00:05.628000-04:00 -> None
2023-04-29 15:23:33,246:DEBUG:140704418879104:keys = (analysis/cleaned_place, analysis/expected_trip), len(places) = 1, len(trips) = 0
2023-04-29 15:23:33,246:DEBUG:140704418879104:cleaned timeline has 1 places and 0 trips
2023-04-29 15:23:33,268:DEBUG:140704418879104:last place doc = None
2023-04-29 15:23:33,391:DEBUG:140704418879104:no last confirmed place found, created place with id 644d98e55f92e1613f9f4926
2023-04-29 15:23:33,450:DEBUG:140704418879104:Found matching_confirmed_trip={'_id': ObjectId('633fda30beb878b8f7724399'),
'metadata': {'key': 'analysis/confirmed_trip'
'data': {
'end_fmt_time': '2022-10-05T13:58:24.613000-04:00'
'start_fmt_time': '2022-10-05T13:52:12.304000-04:00'
'start_place': ObjectId('633fd8c5beb878b8f772434e'), 'end_place': ObjectId('633fd8c5beb878b8f772434f'),
'cleaned_trip': ObjectId('633fd8b1beb878b8f772432a'),
'user_input': {}}} for curr_confirmed_start_place.data.ending_trip=ObjectId('633fd8b1beb878b8f772432a')
2023-04-29 15:23:33,450:DEBUG:140704418879104:Creating confirmed timeline with 1 places and 0 trips
2023-04-29 15:23:33,452:DEBUG:140704418879104:Returning multi_result.inserted_ids = [ObjectId('644d98e55f92e1613f9f4926')]... of length 1
Checking the second
AssertionError: for 63594365a720fd515389d185 found existing_end_confirmed_place={'_id': ObjectId('644d9601682e00944e4ecd5a'),
'metadata': {'key': 'analysis/confirmed_place'
'data': {'enter_fmt_time': '2022-10-26T10:04:37-04:00'
'cleaned_place': ObjectId('63594364a720fd515389d178'),
'user_input': {}, 'additions': []}} but next_trip=None
Confirmed
2023-04-29 15:11:13,107:DEBUG:140704418879104:no last confirmed place found, created place with id 644d9601682e00944e4ecd5a
2023-04-29 15:11:13,170:DEBUG:140704418879104:Found matching_confirmed_trip={'_id': ObjectId('63594365a720fd515389d185')} for curr_confirmed_start_place.data.ending_trip=ObjectId('63594362a720fd515389d165')
2023-04-29 15:11:13,180:DEBUG:140704418879104:Creating confirmed timeline with 1 places and 0 trips
2023-04-29 15:11:13,182:DEBUG:140704418879104:Returning multi_result.inserted_ids = [ObjectId('644d9601682e00944e4ecd5a')]... of length 1
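So both assertion failures involve terminating confirmed places that were created in CREATE_CONFIRMED_OBJECTS and have no starting_trip yet. A hedged sketch of the corresponding relaxation of the assertion:

import logging

# Hedged sketch: a confirmed place at the end of the chain legitimately has
# no starting_trip, so next_trip is allowed to be None in exactly that case.
if "starting_trip" not in existing_end_confirmed_place["data"]:
    logging.debug("place %s terminates the chain, next_trip=None is expected" %
                  existing_end_confirmed_place["_id"])
else:
    assert next_trip is not None and \
        next_trip["metadata"]["key"] == "analysis/confirmed_trip", \
        f"for {ct['_id']} found {existing_end_confirmed_place=} but {next_trip=}"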
Re-ran the pipeline multiple times on the Durham data.
After restore, the first Durham run completed without any unexpected errors. Without restore, the second Durham run completed without any unexpected errors. Without restore, the third Durham run also completed without any unexpected errors.
Ran the pipeline against the mm_masscec dataset as well and there were no errors. I finally declare this issue resolved.
To test https://github.com/e-mission/e-mission-phone/pull/957, I loaded some "old format" data.
To see it in the new client, I had to re-run the pipeline to create composite trips. Note that this is what will happen when we upgrade the existing production systems to the new master
While running the pipeline, I get the following error