shankari opened this issue 1 year ago
Ah ok, so we deployed from master instead of the GIS branch. That still doesn't help us figure out whether the previous entry was incorrect, but let's recreate it for now.
2023-06-26 02:14:49,934:DEBUG:140505714460480:orig_ts_db_matches = 0, analysis_ts_db_matches = 5
2023-06-26 02:14:49,959:DEBUG:140505714460480:Returning entry with length 5 result
2023-06-26 02:14:49,960:ERROR:140505714460480:Error while inferring modes, timestamp is unchanged
Traceback (most recent call last):
File "/usr/src/app/emission/analysis/classification/inference/mode/pipeline.py", line 41, in predict_mode
mip.runPredictionPipeline(user_id, time_query)
File "/usr/src/app/emission/analysis/classification/inference/mode/pipeline.py", line 139, in runPredictionPipeline
self.loadModelStage()
File "/usr/src/app/emission/analysis/classification/inference/mode/pipeline.py", line 156, in loadModelStage
self.model = seedp.ModeInferencePipelineMovesFormat.loadModel()
File "/usr/src/app/emission/analysis/classification/inference/mode/seed/pipeline.py", line 93, in loadModel
fd = open(SAVED_MODEL_FILENAME, "r")
FileNotFoundError: [Errno 2] No such file or directory: 'seed_model.json'
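The crash above happens because `loadModel` calls `open(SAVED_MODEL_FILENAME, "r")` unconditionally. One possible fix (a sketch only, not the project's actual implementation; the helper name `load_model_or_skip` is hypothetical) is to treat a missing seed model as "skip mode inference" rather than failing the whole stage:

```python
import json
import logging

SAVED_MODEL_FILENAME = "seed_model.json"

def load_model_or_skip(filename=SAVED_MODEL_FILENAME):
    """Return the parsed seed model, or None if the file is missing.

    A missing model file means the mode inference stage should be
    skipped (with a warning) rather than crashing the pipeline run.
    """
    try:
        with open(filename, "r") as fd:
            return json.load(fd)
    except FileNotFoundError:
        logging.warning("%s not found; skipping mode inference", filename)
        return None
```

The caller would then check for `None` and mark the stage as skipped instead of letting the `FileNotFoundError` propagate.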
I still don't understand the "807" matches for that one trip, but let's investigate that later
This also brings up another issue. In the pipeline in general, we don't create objects in later stages if an earlier stage fails. The confirmed objects are an exception: we try to use the inferred sections, but fall back to cleaned sections if those exist. But once the pipeline is fixed and the inferred sections are actually filled in, the current implementation will not go back and fill in the confirmed trips.
Need to figure out how to fix that.
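One possible shape for that fix (a sketch under assumptions; `needs_backfill` is a hypothetical helper, and the field names are taken from the summaries shown in this thread) is a predicate that identifies confirmed trips whose inferred summary is empty even though inferred sections now exist, so a repair pass can revisit them:

```python
def needs_backfill(confirmed_trip, has_inferred_sections):
    """Decide whether a confirmed trip should be revisited after the
    mode inference stage has been fixed and re-run.

    A trip needs backfill when its inferred_section_summary is empty
    (i.e. the earlier fallback to cleaned sections was used) but
    inferred sections now exist for it.
    """
    summary = confirmed_trip.get("data", {}).get("inferred_section_summary", {})
    summary_empty = not any(summary.get(k) for k in ("distance", "duration", "count"))
    return summary_empty and has_inferred_sections
```

A repair script could run this over all confirmed trips for a user and regenerate the summaries for the flagged ones.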
For now, reset the pipeline for this single user. After redeploying from the GIS branch, everything works
'inferred_section_summary': {'distance': {'WALKING': 403.4288117669756}, 'duration': {'WALKING': 641.4911608695984}, 'count': {'WALKING': 1}},
'cleaned_section_summary': {'distance': {'ON_FOOT': 403.4288117669756}, 'duration': {'ON_FOOT': 641.4911608695984}, 'count': {'ON_FOOT': 1}}
Launching the migration script and calling it a day for now
I have stashed changes and moved the following files
bin/debug/delete_composite_objects_and_state.py
emission/storage/json_wrappers.py
Ran into several errors during the migration
Found error ObjectId('62d16908048ceaa2b4ee5769') while processing pipeline for user 0951d756-b17a-4a33-aa90-a44830928a9e, check log files for details
Found error ObjectId('634854f22c03517c520aaff0') while processing pipeline for user af046be2-3593-42b3-b2e5-ea4ba8287bbf, check log files for details
Found error ObjectId('62e8876c9f3797eb9d1c0b31') while processing pipeline for user b74def1e-a134-4c81-bffc-616c8a72639d, check log files for details
The error is apparently that there is no matching composite trip for a particular confirmed trip.
ERROR:root:Found error ObjectId('62daf6770fd38d61e7e9a884') while processing pipeline for user fdcb2d34-c8e8-4d5a-8d98-e0e2d397e490, skipping
Traceback (most recent call last):
File "/Users/kshankar/e-mission/nrel-db-connect/bin/historical/migrations/add_sections_and_summaries_to_trips.py", line 27, in add_sections_to_trips
add_sections_to_trips_for_user(uuid)
File "/Users/kshankar/e-mission/nrel-db-connect/bin/historical/migrations/add_sections_and_summaries_to_trips.py", line 45, in add_sections_to_trips_for_user
matching_composite_trip = composite_trips_map[t["_id"]]
KeyError: ObjectId('62daf6770fd38d61e7e9a884')
Need to see where we broke those links, likely during an earlier migration.
Double-checking this before I start the giant back-to-back meetings. All these errors occur because there were confirmed trips but no composite trips.
DEBUG:root:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('af046be2-3593-42b3-b2e5-ea4ba8287bbf'), '$or': [{'metadata.key': 'analysis/cleaned_trip'}]}, sort_key = metadata.write_ts
DEBUG:root:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/cleaned_trip']
DEBUG:root:finished querying values for ['analysis/cleaned_trip'], count = 2
DEBUG:root:orig_ts_db_matches = 0, analysis_ts_db_matches = 2
DEBUG:root:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('af046be2-3593-42b3-b2e5-ea4ba8287bbf'), '$or': [{'metadata.key': 'analysis/confirmed_trip'}]}, sort_key = metadata.write_ts
DEBUG:root:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/confirmed_trip']
DEBUG:root:finished querying values for ['analysis/confirmed_trip'], count = 2
DEBUG:root:orig_ts_db_matches = 0, analysis_ts_db_matches = 2
DEBUG:root:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('af046be2-3593-42b3-b2e5-ea4ba8287bbf'), '$or': [{'metadata.key': 'analysis/composite_trip'}]}, sort_key = metadata.write_ts
DEBUG:root:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/composite_trip']
DEBUG:root:finished querying values for ['analysis/composite_trip'], count = 0
DEBUG:root:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
ERROR:root:Found error ObjectId('634854f22c03517c520aaff0') while processing pipeline for user af046be2-3593-42b3-b2e5-ea4ba8287bbf, skipping
Traceback (most recent call last):
File "/Users/kshankar/e-mission/nrel-db-connect/bin/historical/migrations/add_sections_and_summaries_to_trips.py", line 27, in add_sections_to_trips
add_sections_to_trips_for_user(uuid)
File "/Users/kshankar/e-mission/nrel-db-connect/bin/historical/migrations/add_sections_and_summaries_to_trips.py", line 45, in add_sections_to_trips_for_user
matching_composite_trip = composite_trips_map[t["_id"]]
KeyError: ObjectId('634854f22c03517c520aaff0')
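Instead of indexing directly into `composite_trips_map` and dying on the `KeyError`, the migration could identify and log the orphaned confirmed trips up front. A minimal sketch (the helper name `find_orphan_confirmed_trips` is hypothetical; it only assumes each trip document has an `_id` field, as in the logs above):

```python
def find_orphan_confirmed_trips(confirmed_trips, composite_trips):
    """Return the confirmed trips that have no composite trip with a
    matching _id -- exactly the condition that raises the KeyError in
    add_sections_to_trips_for_user above."""
    composite_ids = {t["_id"] for t in composite_trips}
    return [t for t in confirmed_trips if t["_id"] not in composite_ids]
```

The migration could then skip (and report) the orphans and continue processing the remaining users.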
Ok, so this is happening because of an error in the backwards-compat migration code for the composite trips. I noticed this error even before the migration. I'm pulling the stage data so we can investigate and fix this as part of the pipeline fixes.
2023-06-26 17:10:31,050:ERROR:140368867940160:Error while creating composite objects, timestamp is unchanged
Traceback (most recent call last):
File "/usr/src/app/emission/analysis/plotting/composite_trip_creation.py", line 138, in create_composite_objects
last_done_ts = create_composite_trip(ts, t)
File "/usr/src/app/emission/analysis/plotting/composite_trip_creation.py", line 62, in create_composite_trip
assert next_trip is not None and next_trip["metadata"]["key"] == "analysis/confirmed_trip" \
AssertionError: for 634854f22c03517c520aaff0 found existing_end_confirmed_place={'_id': ObjectId('641ba62b34dda48794b6be65'), 'user_id': UUID('af046be2-3593-42b3-b2e5-ea4ba8287bbf'), 'metadata': {'key': 'analysis/confirmed_place', 'platform': 'server', 'write_ts': 1665684710.845848, 'time_zone': 'America/Los_Angeles', 'write_local_dt': {'year': 2022, 'month': 10, 'day': 13, 'hour': 11, 'minute': 11, 'second': 50, 'weekday': 3, 'timezone': 'America/Los_Angeles'}, 'write_fmt_time': '2022-10-13T11:11:50.845848-07:00'}, 'data': {'source': 'DwellSegmentationTimeFilter', 'enter_ts': 1665173762.129, 'enter_local_dt': {'year': 2022, 'month': 10, 'day': 7, 'hour': 13, 'minute': 16, 'second': 2, 'weekday': 4, 'timezone': 'America/Los_Angeles'}, 'enter_fmt_time': '2022-10-07T13:16:02.129000-07:00', 'location': {'type': 'Point', 'coordinates': [-104.9618674, 39.6717046]}, 'raw_places': [ObjectId('634854df2c03517c520aafad'), ObjectId('634854df2c03517c520aafad')], 'ending_trip': ObjectId('634854e62c03517c520aafb9'), 'starting_trip': ObjectId('634854e62c03517c520aafcb'), 'exit_ts': 1665173767.129, 'exit_fmt_time': '2022-10-07T13:16:02.129000-07:00', 'exit_local_dt': {'year': 2022, 'month': 10, 'day': 7, 'hour': 13, 'minute': 16, 'second': 2, 'weekday': 4, 'timezone': 'America/Los_Angeles'}, 'duration': 5.0, 'cleaned_place': ObjectId('634854e62c03517c520aafdd'), 'user_input': {}, 'additions': []}} but next_trip=None
While experimenting with trip time calculation, I ran into this again, and it again had exactly 807 matches. https://github.com/e-mission/op-admin-dashboard/pull/61#issuecomment-1784145680
So this means that either I did not reset the pipeline for this user on staging, or there is something super weird about this user that ends up with 807 matches. https://github.com/e-mission/e-mission-docs/issues/927#issuecomment-1606674206 It seems likely to be an overpass error, because the odds that it would fail again with exactly 807 errors are incredibly small.
>>> all_807_matches = list(edb.get_analysis_timeseries_db().find({'metadata.key': 'analysis/inferred_section', 'user_id': UUID('d83a43a1-df6b-42ed-986f-f5b5f6150221'), 'data.cleaned_section': ObjectId('644df8edea199f1d0473e301')}))
>>> all_807_matches_df = pd.json_normalize(all_807_matches)
>>> len(all_807_matches_df)
807
>>> all_807_matches_df["data.cleaned_section"].unique()
array([ObjectId('644df8edea199f1d0473e301')], dtype=object)
>>> all_807_matches_df["data.sensed_mode"].unique()
array([2])
When were these entries created?
>>> all_807_matches_df["metadata.write_fmt_time"].head()
0 2023-04-29T22:13:17.690103-07:00
1 2023-04-29T22:13:17.690103-07:00
2 2023-04-29T22:13:17.690103-07:00
3 2023-04-29T22:13:17.690103-07:00
4 2023-04-29T22:13:17.690103-07:00
...
>>> all_807_matches_df["metadata.write_fmt_time"].tail()
802 2023-04-29T22:13:17.690103-07:00
803 2023-04-29T22:13:17.690103-07:00
804 2023-04-29T22:13:17.690103-07:00
805 2023-04-29T22:13:17.690103-07:00
806 2023-04-29T22:13:17.690103-07:00
At exactly the same time (with the exact same timestamp), well before this was supposedly reset (on Jun 25).
Resetting and re-running the pipeline for this user in the copied staging.
We may also want to replace that assertion by checking to see whether all the entries are the same - e.g. seeing if unique has only one set of values, at least on production.
>>> all_807_matches_df["metadata.write_ts"].unique()
array([1.6828316e+09])
>>> all_807_matches[0]["metadata"]["write_ts"]
1682831597.6901028
>>> all_807_matches[-1]["metadata"]["write_ts"]
1682831597.6901028
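The uniqueness check proposed above could look something like this sketch (the function name `all_rows_identical` is hypothetical; it assumes the entries have already been flattened with `pd.json_normalize` as in the REPL session, and that the compared columns hold hashable values):

```python
import pandas as pd

def all_rows_identical(df, ignore_cols=("_id",)):
    """True when every row of df is identical apart from ignore_cols.

    If the N "duplicate" inferred sections differ only in their
    ObjectIds, they can safely be collapsed to a single entry instead
    of tripping an assertion on the count."""
    cols = [c for c in df.columns if c not in ignore_cols]
    return all(df[c].nunique(dropna=False) <= 1 for c in cols)
```

On production this could replace the hard assertion: if the rows are identical, deduplicate and continue; if not, log an error for investigation.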
Ran into this for another user
>>> all_4_entries = pd.json_normalize(list(edb.get_analysis_timeseries_db().find({'metadata.key': 'analysis/inferred_section', 'user_id': UUID('fdcb2d34-c8e8-4d5a-8d98-e0e2d397e490'), 'data.cleaned_section': ObjectId('6313b50f0b4e353e781a95a4')})))
>>> all_4_entries["data.cleaned_section"]
0 6313b50f0b4e353e781a95a4
1 6313b50f0b4e353e781a95a4
2 6313b50f0b4e353e781a95a4
3 6313b50f0b4e353e781a95a4
Name: data.cleaned_section, dtype: object
>>> all_4_entries["data.cleaned_section"].unique()
array([ObjectId('6313b50f0b4e353e781a95a4')], dtype=object)
>>> all_4_entries["metadata.write_fmt_time"]
0 2022-09-03T13:11:59.935477-07:00
1 2022-09-03T13:11:59.935477-07:00
2 2022-09-03T13:11:59.935477-07:00
3 2022-09-03T13:11:59.935477-07:00
Name: metadata.write_fmt_time, dtype: object
>>> all_4_entries["data.start_fmt_time"]
0 2022-09-03T10:33:00.996521-07:00
1 2022-09-03T10:33:00.996521-07:00
2 2022-09-03T10:33:00.996521-07:00
3 2022-09-03T10:33:00.996521-07:00
Name: data.start_fmt_time, dtype: object
>>> all_4_entries["data.sensed_mode"]
0 5
1 5
2 5
3 5
Name: data.sensed_mode, dtype: int64
>>>
While experimenting with trip time calculation, ran into this again, and it again had exactly 807 matches. e-mission/op-admin-dashboard#61 (comment)
Resetting and re-running the pipeline for this user in the copied staging. We may also want to replace that assertion by checking to see if all the entries are the same - e.g. seeing if unique has only one set of values, at least on production.
>>> all_807_matches_df["metadata.write_ts"].unique()
array([1.6828316e+09])
>>> all_807_matches[0]["metadata"]["write_ts"]
1682831597.6901028
>>> all_807_matches[-1]["metadata"]["write_ts"]
1682831597.6901028
There are a lot of 807 matches for this user:
I am 99% sure this is not correct. When I poked through the data, the 807-entry case was definitely an outlier. Please double-check and document this with the script used for checking it....
And regardless, there is a workaround for getting rid of the 807 entries, so I am not sure why this is relevant.
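The workaround mentioned here amounts to keeping one entry per cleaned section and deleting the rest. A sketch (the helper name `pick_duplicate_ids_to_delete` is hypothetical; it assumes the duplicates have already been verified to be identical apart from `_id`, as the REPL sessions above suggest):

```python
def pick_duplicate_ids_to_delete(entries):
    """Given the duplicate inferred_section entries for a single
    cleaned_section, return the _ids of all but the first, so the
    remaining ids can be deleted in one pass."""
    return [e["_id"] for e in entries[1:]]
```

The returned ids could then be passed to a single `delete_many({"_id": {"$in": ids}})` against the analysis timeseries collection.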
I was using this code
>>> df = pd.json_normalize(list(edb.get_analysis_timeseries_db().find({'metadata.key': 'analysis/inferred_section'})))
>>> counts = df.groupby(['data.cleaned_section', 'user_id']).size().reset_index(name='count')
>>> dup_counts = counts[counts['count'] > 1]
>>> print('\n'.join([f"Data.cleaned_section {row['data.cleaned_section']}, user_id {row['user_id']}: Count {row['count']}" for _, row in dup_counts.iterrows()]))
>>> all_807_entries = pd.json_normalize(list(edb.get_analysis_timeseries_db().find({'metadata.key': 'analysis/inferred_section', 'user_id': UUID('d83a43a1-df6b-42ed-986f-f5b5f6150221'), 'data.cleaned_section': ObjectId('644dec55ea199f1d0473a31d')})))
>>> len(all_807_entries)
807
>>> all_807_entries["data.cleaned_section"].unique()
array([ObjectId('644dec55ea199f1d0473a31d')], dtype=object)
>>> all_807_entries["metadata.write_fmt_time"].head()
0 2023-04-29T21:19:33.405336-07:00
1 2023-04-29T21:19:33.405336-07:00
2 2023-04-29T21:19:33.405336-07:00
3 2023-04-29T21:19:33.405336-07:00
4 2023-04-29T21:19:33.405336-07:00
Name: metadata.write_fmt_time, dtype: object
>>> all_807_entries["metadata.write_fmt_time"].tail()
802 2023-04-29T21:19:33.405336-07:00
803 2023-04-29T21:19:33.405336-07:00
804 2023-04-29T21:19:33.405336-07:00
805 2023-04-29T21:19:33.405336-07:00
806 2023-04-29T21:19:33.405336-07:00
Name: metadata.write_fmt_time, dtype: object
>>> all_807_entries = pd.json_normalize(list(edb.get_analysis_timeseries_db().find({'metadata.key': 'analysis/inferred_section', 'user_id': UUID('d83a43a1-df6b-42ed-986f-f5b5f6150221'), 'data.cleaned_section': ObjectId('644dec45ea199f1d0473a2aa')})))
>>> all_807_entries["data.cleaned_section"].unique()
array([ObjectId('644dec45ea199f1d0473a2aa')], dtype=object)
>>> all_807_entries["data.cleaned_section"].unique()
array([ObjectId('644dec45ea199f1d0473a2aa')], dtype=object)
>>> all_807_entries["metadata.write_fmt_time"].head()
0 2023-04-29T21:19:17.707970-07:00
1 2023-04-29T21:19:17.707970-07:00
2 2023-04-29T21:19:17.707970-07:00
3 2023-04-29T21:19:17.707970-07:00
4 2023-04-29T21:19:17.707970-07:00
Name: metadata.write_fmt_time, dtype: object
>>> all_807_entries["metadata.write_fmt_time"].tail()
802 2023-04-29T21:19:17.707970-07:00
803 2023-04-29T21:19:17.707970-07:00
804 2023-04-29T21:19:17.707970-07:00
805 2023-04-29T21:19:17.707970-07:00
806 2023-04-29T21:19:17.707970-07:00
Name: metadata.write_fmt_time, dtype: object
While experimenting with trip time calculation, ran into this again, and it again had exactly 807 matches.
I was getting the same error while checking trip time calculation, that is why I was looking into it.
Correct, but as I said in https://github.com/e-mission/e-mission-docs/issues/927#issuecomment-1804395333:
And regardless, there is a workaround for getting rid of the 807 entries, so I am not sure why this is relevant.
You should use the workaround and move on.
While testing the deployment of https://github.com/e-mission/e-mission-server/pull/917 to stage, I found that the inferred section summaries were filled in but the cleaned section summaries were not.
'cleaned_trip': ObjectId('6498f494a9956a43810c2caf'), 'inferred_section_summary': {'distance': {}, 'duration': {}, 'count': {}}, 'cleaned_section_summary': {'distance': {'ON_FOOT': 407.50089074389757}, 'duration': {'ON_FOOT': 714.4223742485046}, 'count': {'ON_FOOT': 1}}
That seemed weird, since there did not appear to be any errors while generating the mode inference.
I then looked for the matching sections and there is indeed one cleaned section and no inferred sections.
However, if I try to find one inferred section and then find the set of sections for that trip, the result looks wrong.
We may be messing up the inferred section to trip mappings. Need to investigate further.
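One way to investigate would be to cross-check the links directly. A sketch (the helper name `check_section_trip_mapping` is hypothetical; the `data.cleaned_section` field name is taken from the queries earlier in this thread):

```python
def check_section_trip_mapping(inferred_sections, cleaned_sections_for_trip):
    """Flag inferred sections whose data.cleaned_section does not point
    at one of the cleaned sections of the trip under investigation --
    the suspected 'messed up mapping'. Returns the offending _ids."""
    valid_ids = {s["_id"] for s in cleaned_sections_for_trip}
    return [s["_id"] for s in inferred_sections
            if s["data"]["cleaned_section"] not in valid_ids]
```

Running this per trip for the affected user should show whether the inferred sections are being attached to the wrong trip.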