Closed shankari closed 9 months ago
Got the working code: Updated filtering code to identify the right confirmed trip item. RESULT: The charts are identical with this approach too.
def remove_confirmed_trip_labels():
    TS_LIMIT = 1703042805
    print("Inside test_function")
    # first, remove all trips written after a cutoff time
    for t in list(edb.get_timeseries_db().find({"metadata.write_ts": {"$gt": TS_LIMIT}, "metadata.key": {"$regex": '^manual/(mode_confirm|purpose_confirm|replaced_mode)$'}})):
        # Below code is referenced from function get_confirmed_obj_for_user_input_obj(ts, ui_obj): in
        # emission/storage/decorations/trip_queries.py
        ts = esta.TimeSeries.get_aggregate_time_series()
        ONE_DAY = 24 * 60 * 60
        if 'data' in t and 'start_ts' in t['data']:
            start_ts_value = t['data']['start_ts']
        else:
            # no start_ts to build the time query from, so skip this input
            continue
        tq = estt.TimeQuery("data.start_ts", start_ts_value - ONE_DAY,
                            start_ts_value + ONE_DAY)
        # iff the input's key is one of these, the input belongs on a place
        # all other keys are only used for trip inputs
        place_keys = ["manual/place_user_input", "manual/place_addition_input"]
        if t['metadata']['key'] in place_keys:
            # if place, we'll query the same time range, but with 'enter_ts'
            tq.timeType = "data.enter_ts"
            confirmed_trip_list = list(ts.find_entries(["analysis/confirmed_place"], tq))
            # print("Hi")
        else:
            confirmed_trip_list = list(ts.find_entries(["analysis/confirmed_trip"], tq))
        # list(...) is never None, so check for an empty result instead
        if not confirmed_trip_list:
            print("No matching confirmed trip for %s" % t["data"]["start_fmt_time"])
            continue
        # Assuming t is your current iteration from the loop
        current_user_id = t["user_id"]
        current_start_ts = t["data"]["start_ts"]
        # Filter the confirmed_list based on the specified conditions
        matching_items = [item for item in confirmed_trip_list if
                          item.get("user_id") == current_user_id
                          and item.get("metadata", {}).get("key") == "analysis/confirmed_trip"
                          and item.get("data", {}).get("start_ts") == current_start_ts]
        # Check if any matching items were found
        if matching_items:
            # Access the first matching item
            confirmed_trip = matching_items[0]
            print("Matching item found:")
            if confirmed_trip["data"]["user_input"] == {}:
                print("Found confirmed trip with matching inferred trip, without user labels")
            else:
                print("Getting here")
                print(confirmed_trip['data']['user_input'])
                update_results = edb.get_analysis_timeseries_db().update_one({"user_id": t["user_id"],
                    "metadata.key": "analysis/confirmed_trip",
                    "data.start_ts": t["data"]["start_ts"]}, { "$set": { 'data.user_input': {} } })
                print("Update results")
                print(update_results)
                confirmed_trip["data"]["user_input"] = {}
        else:
            print("No matching item found for the specified conditions.")
    print("delete after timestamp")
    edb.get_timeseries_db().delete_many({ "metadata.write_ts": { "$gt": TS_LIMIT } })
    edb.get_analysis_timeseries_db().delete_many({ "metadata.write_ts": { "$gt": TS_LIMIT } })
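As a rough, database-free illustration of which entries the Mongo filter in the loop above is meant to select, the same two conditions (write timestamp after the cutoff, key matching the regex) can be written in plain Python. The sample entries and the helper name `is_label_after_cutoff` are made up for this sketch and are not part of the dashboard or server code.

```python
import re

TS_LIMIT = 1703042805
# Same pattern as the "$regex" in the query above
LABEL_KEY_RE = re.compile(r"^manual/(mode_confirm|purpose_confirm|replaced_mode)$")

def is_label_after_cutoff(entry, ts_limit=TS_LIMIT):
    """Mirror of the query: write_ts > cutoff AND key is one of the three label keys."""
    md = entry.get("metadata", {})
    key_matches = LABEL_KEY_RE.match(md.get("key", "")) is not None
    return md.get("write_ts", 0) > ts_limit and key_matches

# Hypothetical entries, for illustration only
entries = [
    {"metadata": {"key": "manual/mode_confirm", "write_ts": TS_LIMIT + 10}},
    {"metadata": {"key": "manual/mode_confirm", "write_ts": TS_LIMIT - 10}},
    {"metadata": {"key": "manual/place_user_input", "write_ts": TS_LIMIT + 10}},
    {"metadata": {"key": "manual/replaced_mode", "write_ts": TS_LIMIT + 1}},
]
selected = [e for e in entries if is_label_after_cutoff(e)]
selected_keys = [e["metadata"]["key"] for e in selected]
```

One thing this makes visible: the regex only admits the three trip-label keys, so an entry with a place key such as `manual/place_user_input` never enters the loop, and the `place_keys` branch would not be reached with this query.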
Steps of execution:
1. Dropped the existing database `openpath_prod_usaid_laos_ev`
2. Re-loaded the database from the original snapshot
3. Executed the following commands:
docker-compose -f docker-compose.yml build
docker-compose -f docker-compose.yml up
4. Executed the Generic Metrics notebook loading script:
ashrest2-35384s:em-public-dashboard ashrest2$ docker exec -it em-public-dashboard-notebook-server-1 /bin/bash
root@a053cc45b642:/usr/src/app# source setup/activate.sh
(emission) root@a053cc45b642:/usr/src/app# cd saved-notebooks
(emission) root@a053cc45b642:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/update_mappings.py mapping_dictionaries.ipynb
(emission) root@a053cc45b642:/usr/src/app/saved-notebooks# PYTHONPATH=.. python bin/generate_plots.py generic_metrics.ipynb default
/usr/src/app/saved-notebooks/bin/generate_plots.py:30: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if r.status_code is not 200:
About to download config from https://raw.githubusercontent.com/e-mission/nrel-openpath-deploy-configs/main/configs/usaid-laos-ev.nrel-op.json
Successfully downloaded config with version 1 for USAID-NREL Support for Electric Vehicle Readiness and data collection URL https://USAID-laos-EV-openpath.nrel.gov/api/
Dynamic labels download was successful for nrel-openpath-deploy-configs: usaid-laos-ev
Running at 2024-01-12T15:30:32.264590+00:00 with args Namespace(plot_notebook='generic_metrics.ipynb', program='default', date=None) for range (<Arrow [2023-05-01T00:00:00+00:00]>, <Arrow [2024-01-01T00:00:00+00:00]>)
Running at 2024-01-12T15:30:32.308029+00:00 with params [Parameter('year', int), Parameter('month', int), Parameter('program', str, value='default'), Parameter('study_type', str, value='study'), Parameter('include_test_users', bool, value=True), Parameter('dynamic_labels', dict, value={'MODE': [{'value': 'walk', 'baseMode': 'WALKING', 'met_equivalent': 'WALKING', 'kgCo2PerKm': 0}, {'value': 'e-auto_rickshaw', 'baseMode': 'MOPED', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.085416859}, {'value': 'auto_rickshaw', 'baseMode': 'MOPED', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.231943784}, {'value': 'motorcycle', 'baseMode': 'MOPED', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.113143309}, {'value': 'e-bike', 'baseMode': 'E_BIKE', 'met': {'ALL': {'range': [0, -1], 'mets': 4.9}}, 'kgCo2PerKm': 0.00728}, {'value': 'bike', 'baseMode': 'BICYCLING', 'met_equivalent': 'BICYCLING', 'kgCo2PerKm': 0}, {'value': 'drove_alone', 'baseMode': 'CAR', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.22031}, {'value': 'shared_ride', 'baseMode': 'CAR', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.11015}, {'value': 'e_car_drove_alone', 'baseMode': 'E_CAR', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.08216}, {'value': 'e_car_shared_ride', 'baseMode': 'E_CAR', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.04108}, {'value': 'taxi', 'baseMode': 'TAXI', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.30741}, {'value': 'bus', 'baseMode': 'BUS', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.20727}, {'value': 'train', 'baseMode': 'TRAIN', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.12256}, {'value': 'free_shuttle', 'baseMode': 'BUS', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.20727}, {'value': 'air', 'baseMode': 'AIR', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.09975}, {'value': 'not_a_trip', 'baseMode': 'UNKNOWN', 'met_equivalent': 'UNKNOWN', 'kgCo2PerKm': 0}, {'value': 'other', 'baseMode': 'OTHER', 'met_equivalent': 'UNKNOWN', 'kgCo2PerKm': 0}], 'PURPOSE': 
[{'value': 'home'}, {'value': 'work'}, {'value': 'at_work'}, {'value': 'school'}, {'value': 'transit_transfer'}, {'value': 'shopping'}, {'value': 'meal'}, {'value': 'pick_drop_person'}, {'value': 'pick_drop_item'}, {'value': 'personal_med'}, {'value': 'access_recreation'}, {'value': 'exercise'}, {'value': 'entertainment'}, {'value': 'religious'}, {'value': 'other'}], 'translations': {'en': {'walk': 'Walk', 'e-auto_rickshaw': 'E-tuk tuk', 'auto_rickshaw': 'Tuk Tuk', 'motorcycle': 'Motorcycle', 'e-bike': 'E-bike', 'bike': 'Bicycle', 'drove_alone': 'Car Drove Alone', 'shared_ride': 'Car Shared Ride', 'e_car_drove_alone': 'E-Car Drove Alone', 'e_car_shared_ride': 'E-Car Shared Ride', 'taxi': 'Taxi/Loca/inDrive', 'bus': 'Bus', 'train': 'Train', 'free_shuttle': 'Free Shuttle', 'air': 'Airplane', 'not_a_trip': 'Not a trip', 'home': 'Home', 'work': 'To Work', 'at_work': 'At Work', 'school': 'School', 'transit_transfer': 'Transit transfer', 'shopping': 'Shopping', 'meal': 'Meal', 'pick_drop_person': 'Pick-up/ Drop off Person', 'pick_drop_item': 'Pick-up/ Drop off Item', 'personal_med': 'Personal/ Medical', 'access_recreation': 'Access Recreation', 'exercise': 'Recreation/ Exercise', 'entertainment': 'Entertainment/ Social', 'religious': 'Religious', 'other': 'Other'}, 'lo': {'walk': 'ດ້ວຍການຍ່າງ', 'e-auto_rickshaw': 'ລົດ 3 ລໍ້ໄຟຟ້າ ຫລື ຕຸກຕຸກໄຟຟ້າ', 'auto_rickshaw': 'ເດີນທາດ້ວຍ ລົດຕຸກຕຸກ ຫລື ລົດສາມລໍ້', 'motorcycle': 'ລົດຈັກ', 'e-bike': 'ວຍລົດຈັກໄຟຟ້າ', 'bike': 'ລົດຖີບ', 'drove_alone': 'ເດີນທາງ ດ້ວຍລົດໃຫ່ຍ ເຊີ່ງເປັນລົດທີ່ຂັບເອງ', 'shared_ride': 'ເດີນທາງດ້ວຍລົດໃຫ່ຍ ຮ່ວມກັບລົດຄົນອຶ່ນ', 'e_car_drove_alone': 'ດ້ວຍການຂັບລົດໄຟຟ້າໄປເອງ', 'e_car_shared_ride': 'ດ້ວຍການຈ້າງລົດໄຟຟ້າໄປ', 'taxi': 'ແທັກຊີ', 'bus': 'ລົດເມ', 'train': 'ລົດໄຟ', 'free_shuttle': 'ລົດຮັບສົ່ງຟຣີ', 'air': 'ຍົນ', 'not_a_trip': 'ບໍ່ແມ່ນການເດີນທາງ', 'home': 'ບ້ານ', 'work': 'ໄປເຮັດວຽກ', 'at_work': 'ຢູ່ບ່ອນເຮັດວຽກ', 'school': 'ໄປໂຮງຮຽນ', 'transit_transfer': 'ການຖ່າຍໂອນການເດີນທາງ', 'shopping': 'ຊອບປິ້ງ', 'meal': 
'ອາຫານ', 'pick_drop_person': 'ໄປຮັບ ຫລື ສົນ ຄົນ', 'pick_drop_item': 'ໄປຮັບ ຫລື ສົ່ງສິນຄ້າ', 'personal_med': 'ໄປຫາໝໍ', 'access_recreation': 'ເຂົ້າເຖິງການພັກຜ່ອນ', 'exercise': 'ພັກຜ່ອນ/ອອກກຳລັງກາຍ', 'entertainment': 'ບັນເທີງ/ສັງຄົມ', 'religious': 'ຈຸດປະສົງທາງສາດສະໜາ', 'other': 'ອື່ນໆ'}}})]
Result: After clearing the cache and opening http://localhost:3274/?study_config=usaid-laos-ev, I validated that the charts are identical.
Production Snapshot Dec 20th | Docker compose prod |
---|---|
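A side note on the `SyntaxWarning` in the execution log above: it is raised because line 30 of `generate_plots.py` compares the status code using `is not`, which tests object identity rather than value. A minimal sketch of the value-based comparison follows; `status_ok` is a hypothetical helper used only for illustration, not code from the repository.

```python
def status_ok(status_code: int) -> bool:
    """Compare by value; `is` / `is not` compare object identity instead."""
    return status_code == 200

# Equality holds regardless of how the integer object was created.
parsed = int("1000")
same_value = (parsed == 1000)
```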
@iantei
Got the working code: Updated filtering code to identify the right confirmed trip item.
This is not using the correct code to identify the right confirmed trip item. I don't see you using `match_incoming_inputs`. Instead, you continue to filter based on the `start_ts`:
# Filter the confirmed_list based on the specified conditions
matching_items = [item for item in confirmed_trip_list if
                  item.get("user_id") == current_user_id
                  and item.get("metadata", {}).get("key") == "analysis/confirmed_trip"
                  and item.get("data", {}).get("start_ts") == current_start_ts]
I am not sure where you found that "the specified conditions" include matching by `start_ts`.
Also, is there a reason why you are copy-pasting code instead of just calling the server function?
@shankari
UPDATE: FIXED - I encountered a challenge while trying to call the server function in the following way:
import emission.analysis.userinput.matcher as eum
...
def remove_confirmed_trip_labels():
    TS_LIMIT = 1703042805
    print("Inside test_function")
    for t in list(edb.get_timeseries_db().find({"metadata.write_ts": {"$gt": TS_LIMIT}, "metadata.key": {"$regex": '^manual/(mode_confirm|purpose_confirm|replaced_mode)$'}})):
        current_user_id = t["user_id"]
        time_query = epq.get_time_range_for_incoming_userinput_match(current_user_id)
        confirmed_trip = eum.match_incoming_inputs(current_user_id, time_query)
        if confirmed_trip["data"]["user_input"] == {}:
            print("Found confirmed trip with matching inferred trip, without user labels")
        else:
            print("Getting here")
            print(confirmed_trip['data']['user_input'])
            update_results = edb.get_analysis_timeseries_db().update_one({"user_id": t["user_id"],
                "metadata.key": "analysis/confirmed_trip",
                "data.start_ts": t["data"]["start_ts"]}, { "$set": { 'data.user_input': {} } })
            print("Update results")
            print(update_results)
            confirmed_trip["data"]["user_input"] = {}
    print("delete after timestamp")
    edb.get_timeseries_db().delete_many({ "metadata.write_ts": { "$gt": TS_LIMIT } })
    edb.get_analysis_timeseries_db().delete_many({ "metadata.write_ts": { "$gt": TS_LIMIT } })
This resulted in the following error:
File /usr/src/app/emission/analysis/config.py:8, in get_config_data()
6 except:
7 print("analysis.debug.conf.json not configured, falling back to sample, default configuration")
----> 8 config_file = open('conf/analysis/debug.conf.json.sample')
9 ret_val = json.load(config_file)
10 config_file.close()
FileNotFoundError: [Errno 2] No such file or directory: 'conf/analysis/debug.conf.json.sample'
On further investigation, I found that `conf/` under `saved-notebooks` currently contains only `storage`:
(emission) root@92cf7066cfd9:/usr/src/app/saved-notebooks# ls conf/
storage
(emission) root@92cf7066cfd9:/usr/src/app/saved-notebooks#
The app root (`/usr/src/app`) still has both `conf/storage/` and `conf/analysis/`:
root@92cf7066cfd9:/usr/src/app# ls conf/storage/
db.conf db.conf.docker.sample db.conf.sample
root@92cf7066cfd9:/usr/src/app# ls conf/analysis/
debug.conf.json.sample trip_model.conf.json.sample
root@92cf7066cfd9:/usr/src/app#
We don't have `conf/analysis/` under `saved-notebooks`, unlike `conf/storage/`. Therefore, when I `import emission.analysis.userinput.matcher as eum` from `scaffolding.py`, the relative path `conf/analysis/debug.conf.json.sample` cannot be found and the import fails.
I will try to see if I can fix this.
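The failure comes down to `open()` resolving a relative path against the current working directory. The sketch below reproduces that in a temporary directory: the directory layout is a stand-in that mimics the listings above, and `can_open_sample` is a hypothetical helper, not server code.

```python
import os
import tempfile

REL_PATH = "conf/analysis/debug.conf.json.sample"

def can_open_sample(cwd):
    """Try the same relative open() as in the traceback, from a given working directory."""
    old_cwd = os.getcwd()
    os.chdir(cwd)
    try:
        with open(REL_PATH):
            return True
    except FileNotFoundError:
        return False
    finally:
        os.chdir(old_cwd)

with tempfile.TemporaryDirectory() as app_root:
    # Stand-in layout: <app_root>/conf/analysis/ exists,
    # but <app_root>/saved-notebooks/conf/ only has storage/
    os.makedirs(os.path.join(app_root, "conf", "analysis"))
    with open(os.path.join(app_root, REL_PATH), "w") as f:
        f.write("{}")
    notebooks_dir = os.path.join(app_root, "saved-notebooks")
    os.makedirs(os.path.join(notebooks_dir, "conf", "storage"))

    from_app_root = can_open_sample(app_root)
    from_notebooks = can_open_sample(notebooks_dir)
```

Here the open succeeds from the app root and fails from `saved-notebooks`, which matches the `FileNotFoundError` seen when the notebook code runs with `saved-notebooks` as its working directory.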
I added the code below to `start_notebook.sh`, then ran `docker-compose -f docker-compose.yml build` followed by `up`:
mkdir -p saved-notebooks/conf/analysis
cp conf/analysis/debug.conf.json.sample saved-notebooks/conf/analysis/debug.conf.json.sample
cat saved-notebooks/conf/analysis/debug.conf.json.sample
Re-executed the Generic Metrics notebook. There are some issues with the function `remove_confirmed_trip_labels()`; investigating to fix them.
Update: Added a case to handle when `confirmed_trip` is `None`.
Getting an assertion error for `epq.get_time_range_for_incoming_userinput_match(current_user_id)`:
340 assert curr_state.curr_run_ts is None, "curr_state.curr_run_ts = %s" % curr_state.curr_run_ts
AssertionError: curr_state.curr_run_ts = 1705224557.5451016
I don't have much insight into `pipeline_queries.py`.
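One plausible reading of this assertion is that the time-range call records a "currently running" timestamp for the user's stage and expects it to be cleared before the next call. The toy model below illustrates that reading only; `ToyStageState`, `ToyPipeline`, and `mark_done` are hypothetical names, and this is an assumption about the design rather than a description of what `pipeline_queries.py` actually does.

```python
import time

class ToyStageState:
    """Hypothetical stand-in for a per-user pipeline stage record."""
    def __init__(self):
        self.curr_run_ts = None
        self.last_processed_ts = None

class ToyPipeline:
    def __init__(self):
        self._states = {}

    def get_time_range(self, user_id):
        state = self._states.setdefault(user_id, ToyStageState())
        # Same shape as the assertion in the traceback above
        assert state.curr_run_ts is None, "curr_state.curr_run_ts = %s" % state.curr_run_ts
        state.curr_run_ts = time.time()
        return (state.last_processed_ts, state.curr_run_ts)

    def mark_done(self, user_id, last_processed_ts):
        # Clearing curr_run_ts is what would allow the next call to succeed
        state = self._states[user_id]
        state.last_processed_ts = last_processed_ts
        state.curr_run_ts = None

pipeline = ToyPipeline()
first = pipeline.get_time_range("user-a")
try:
    pipeline.get_time_range("user-a")  # second call without marking the stage done
    second_call_failed = False
except AssertionError:
    second_call_failed = True

pipeline.mark_done("user-a", last_processed_ts=first[1])
third = pipeline.get_time_range("user-a")  # succeeds once the state is cleared
```

Under this assumption, a second call for the same user fails until the stage is marked complete, so calling the time-range function directly from a notebook, outside the pipeline's normal start/finish bookkeeping, would trip the assertion.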
@iantei
Got the working code: Updated filtering code to identify the right confirmed trip item.
This is not using the correct code to identify the right confirmed trip item. I don't see you using `match_incoming_inputs`. Instead, you continue to filter based on the `start_ts`
# Filter the confirmed_list based on the specified conditions
matching_items = [item for item in confirmed_trip_list if
                  item.get("user_id") == current_user_id
                  and item.get("metadata", {}).get("key") == "analysis/confirmed_trip"
                  and item.get("data", {}).get("start_ts") == current_start_ts]
I am not sure where you found that "the specified conditions" include matching by `start_ts`
I inferred the above condition from `label_stats.py`, where `confirmed_trip` is computed as follows:
https://github.com/e-mission/e-mission-server/blob/master/bin/debug/label_stats.py
for t in list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/inferred_trip", "user_id": sel_uuid})):
    if t["data"]["inferred_labels"] != []:
        confirmed_trip = edb.get_analysis_timeseries_db().find_one({"user_id": t["user_id"],
            "metadata.key": "analysis/confirmed_trip",
            "data.start_ts": t["data"]["start_ts"]})
        if confirmed_trip is None:
            print("No matching confirmed trip for %s" % t["data"]["start_fmt_time"])
            continue
        if confirmed_trip["data"]["user_input"] == {}:
            print("Found confirmed trip with matching inferred trip, without user labels")
Getting assertion error for:
epq.get_time_range_for_incoming_userinput_match(current_user_id)
340 assert curr_state.curr_run_ts is None, "curr_state.curr_run_ts = %s" % curr_state.curr_run_ts
AssertionError: curr_state.curr_run_ts = 1705224557.5451016
Debugging update: The error occurs when `epq.get_time_range_for_incoming_userinput_match()` is called more than once with the same `current_user_id`. For example:
def identify_unique_uuids():
    TS_LIMIT = 1703042805
    unique_uuids = set(item["user_id"] for item in list(edb.get_timeseries_db().find({"metadata.write_ts": {"$gt": TS_LIMIT}, "metadata.key": {"$regex": '^manual/(mode_confirm|purpose_confirm|replaced_mode)$'}})))
    for unique_uuid in unique_uuids:
        print("The uuid is: " + str(unique_uuid))
        time_query = epq.get_time_range_for_incoming_userinput_match(unique_uuid)
        print ("Time Query is: " + str(time_query))
Re-executing the above block leads to the same assertion error again.
To recover, clear the database, load the dataset, and re-execute.
Since `epq.get_time_range_for_incoming_userinput_match()` throws an assertion error when it is called again with the same UUID (`user_id`), I created a map from each UUID to its corresponding time range.
def identify_unique_uuids():
    TS_LIMIT = 1703042805
    unique_uuids = set(item["user_id"] for item in list(edb.get_timeseries_db().find({"metadata.write_ts": {"$gt": TS_LIMIT}, "metadata.key": {"$regex": '^manual/(mode_confirm|purpose_confirm|replaced_mode)$'}})))
    uuid_time_map = {}
    for unique_uuid in unique_uuids:
        # print("The uuid is: " + str(unique_uuid))
        time_query = epq.get_time_range_for_incoming_userinput_match(unique_uuid)
        # print ("Time Query is: " + str(time_query))
        uuid_time_map[unique_uuid] = time_query
    return uuid_time_map
Now, `epq.get_time_range_for_incoming_userinput_match(unique_uuid)` is called just once per unique UUID.
def remove_confirmed_trip_labels():
    TS_LIMIT = 1703042805
    print("Inside test_function")
    # Create a set of current_user_id s
    # Map of this unique current_user_id with the timerange
    resulting_uuid_time_map = identify_unique_uuids()
    for uuid, time_query in resulting_uuid_time_map.items():
        print(f"UUID: {uuid}, Time Query: {time_query}")
    # Other alternative is that some values needs to be reset.
    # I really don't see that happening in the server code which is calling get_time_range_for_incoming_userinput_match - so I am gonna ignore that for now
    for t in list(edb.get_timeseries_db().find({"metadata.write_ts": {"$gt": TS_LIMIT}, "metadata.key": {"$regex": '^manual/(mode_confirm|purpose_confirm|replaced_mode)$'}})):
        # print("Values associated with t")
        # print(t)
        # print("\n")
        current_user_id = t["user_id"]
        print ("The current user id is: " + str(current_user_id))
        time_query = resulting_uuid_time_map[current_user_id]
        print ("Time Query is: " + str(time_query))
        confirmed_trip = eum.match_incoming_inputs(current_user_id, time_query)
        # time_query = epq.get_time_range_for_incoming_userinput_match(current_user_id)
        # print ("Time Query is: " + str(time_query))
        # confirmed_trip = eum.match_incoming_inputs(current_user_id, time_query)
        print("The value of confirmed trip")
        print(confirmed_trip)
        if confirmed_trip is None:
            # Code to be executed if confirmed_trip is None
            print("No confirmed trip found.")
        else:
            # Code to be executed if confirmed_trip is not None
            print("Confirmed trip found.")
            print("Inside here")
            if confirmed_trip["data"]["user_input"] == {}:
                print("Found confirmed trip with matching inferred trip, without user labels")
            else:
                print("Getting here")
                print(confirmed_trip['data']['user_input'])
                update_results = edb.get_analysis_timeseries_db().update_one({"user_id": t["user_id"],
                    "metadata.key": "analysis/confirmed_trip",
                    "data.start_ts": t["data"]["start_ts"]}, { "$set": { 'data.user_input': {} } })
                print("Update results")
                print(update_results)
                confirmed_trip["data"]["user_input"] = {}
Calling `scaffolding.remove_confirmed_trip_labels()` results in every `confirmed_trip` value being `None`, which likely comes from this early return inside `match_incoming_inputs(user_id, timerange)` in `matcher.py`:
if len(toMatchInputs) == 0:
    logging.debug("len(toMatchInputs) == 0, early return")
    return None
I am unsure whether reusing a single time range per UUID is a good idea, but the alternative, calling `epq.get_time_range_for_incoming_userinput_match(unique_uuid)` every time, throws an error on the second encounter of the same UUID. I am a bit stuck on what to do next.
Neither of these approaches seems to work properly.
Summary of the approach attempted:
The function `match_incoming_inputs(user_id, timerange)` in `matcher.py` takes two arguments: `user_id` and `timerange`.
I extracted `user_id` from the items in `edb.get_timeseries_db()`, filtered by write timestamp and the `manual/*` label keys (`mode_confirm`, `purpose_confirm`, `replaced_mode`).
To get the time range, I used the following:
time_query = epq.get_time_range_for_incoming_userinput_match(user_id)
which is also used in the `match_incoming_user_inputs()` function in `matcher.py` itself.
Since a second call to `epq.get_time_range_for_incoming_userinput_match(user_id)` with the same `user_id` resulted in an assertion error, I created a map from each unique `user_id` to its corresponding time range.
This way, I expected to get the `confirmed_trip`, but it just returned `None`.
Since the calculation of `timerange` only depends on the `user_id`, the approach of mapping `user_id` to `timerange` should be fine as well?
I took inference for this above condition from here, where there is computation of confirmed_trip. https://github.com/e-mission/e-mission-server/blob/master/bin/debug/label_stats.py
That finds the confirmed trip for an inferred trip, not for a user input. It is not enough to check the output, you also need to check the input.
Why are you trying to call `get_time_range_for_incoming_userinput_match`?
Here's the pseudocode that I suggested:
for me in to_be_deleted_manual_entries:
    matching_confirmed_trip = ea...find_matching_confirmed_trip(me)
    del matching_confirmed_trip['data']['user_input']
Where does that have `time_range_for_incoming_userinput`?
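To make the shape of that loop concrete, here is a minimal in-memory sketch. The pseudocode leaves the real matcher elided (`ea...find_matching_confirmed_trip`), so the `find_matching_confirmed_trip` below is a hypothetical stand-in with a made-up rule (same user, input window inside the trip window); it is not the server's matching algorithm. It also resets `user_input` to `{}` rather than deleting the key, mirroring the `$set` update used earlier in this thread.

```python
def find_matching_confirmed_trip(manual_entry, confirmed_trips):
    """HYPOTHETICAL matcher: same user, and the input's time window lies
    within the trip's window. A placeholder, not the server's rule."""
    for trip in confirmed_trips:
        if (trip["user_id"] == manual_entry["user_id"]
                and trip["data"]["start_ts"] <= manual_entry["data"]["start_ts"]
                and manual_entry["data"]["end_ts"] <= trip["data"]["end_ts"]):
            return trip
    return None

def clear_labels(to_be_deleted_manual_entries, confirmed_trips):
    """Reset user_input on the trip matched by each manual entry.
    Returns how many entries found a matching trip."""
    matched = 0
    for me in to_be_deleted_manual_entries:
        trip = find_matching_confirmed_trip(me, confirmed_trips)
        if trip is None:
            continue  # nothing to reset for this input
        trip["data"]["user_input"] = {}
        matched += 1
    return matched

# Made-up data, for illustration only
confirmed_trips = [
    {"user_id": "u1", "data": {"start_ts": 100, "end_ts": 200,
                               "user_input": {"mode_confirm": "bike"}}},
    {"user_id": "u1", "data": {"start_ts": 300, "end_ts": 400,
                               "user_input": {"mode_confirm": "walk"}}},
]
manual_entries = [
    {"user_id": "u1", "data": {"start_ts": 100, "end_ts": 200, "label": "bike"}},
]
matched = clear_labels(manual_entries, confirmed_trips)
```

The point of the structure is that the loop starts from each manual entry and asks a single matching function for its trip, with no pipeline time-range query involved.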
I am taking over now
After reverting to a previous snapshot by using the script in https://github.com/e-mission/em-public-dashboard/pull/112, which uses the standard matching algorithm, and incorporates multiple assertions to validate the reset, I still get the same values https://github.com/e-mission/em-public-dashboard/pull/112#issuecomment-1891274132
next steps:
Changes have been pushed to production, closing this now.
Confirmed using https://openpath-stage.nrel.gov/public/?study_config=smart-commute-ebike which does not have the energy impact calculation