🧹 ✈️ 👤 Pending cleanups

Abby-Wheelis commented 2 months ago

Centralizing some cleanups that need to happen pending the unification of base mode colors:

[ ] finish unifying the labels
[x] remove the csvs
[ ] filter by baseMode for "land trips" see this comment
[ ] consider displaying user-entered modes if they're common enough

At a high level, in a future fix, I think we should consider user labels if they represent a large enough proportion of the aggregate data. So if 40% of the data is "Other" and 90% of the "Other" is "Air", maybe we should just show "Air". But that is going to require a lot of judgement calls - what is "large enough proportion"? what if the text entered by users is subtly different? so let's punt on that for now.

user mode display comment

Abby-Wheelis commented 2 months ago

what is "large enough proportion"?

My initial though of what would be a "large enough" proportion of user-entered labels might be "if they're big enough to not get grouped into other anyways", which I suppose could result in a lot of bars that are just bigger than our cutoff (2%), so we would probably want to do some experimenting.

We could also use the cutoff that for the numbers to be printed in the bar - which is currently 4%

what if the text entered by users is subtly different?

We can easily take care of things like capitalization and spaces, but other factors like spelling or subtle wording (roller skate vs roller blade) might get difficult to control for

Abby-Wheelis commented 2 months ago

Additional Cleanup required - the patches to prevent both "Other" and "OTHER" are messy

iantei commented 2 months ago

We do not have INVALID as BASE_MODES in e-mission-common base_modes.py's list of BASE_MODES. We will need to update the list with INVALID into the list of BASE_MODES as move to use BASE_MODES for mapping labels to color palette.

Abby-Wheelis commented 2 months ago

We do not have INVALID as BASE_MODES in e-mission-common base_modes.py's list of BASE_MODES. We will need to update the list with INVALID into the list of BASE_MODES as move to use BASE_MODES for mapping labels to color palette.

The patch added INVALID as a sensed mode, not as a base mode, so I'm not sure we need to add it to the base modes, unless we discover a need for an INVALID base mode as well as sensed mode

sensed_values = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN", "OTHER", "INVALID"]

iantei commented 2 months ago

Yes, INVALID is added up as a sensed mode. However, we are either mapping internal labels to the base mode, or using the base mode as is with sensed_values. Therefore, we would need to add INVALID as a base mode.

colors_sensed = dict(zip(sensed_values, [BASE_MODES[x.upper()]['color'] for x in sensed_values]))

Abby-Wheelis commented 2 months ago

Additional cleanup coming out of #148 related to fleet and survey deployments:

[ ] consider combining survey_metrics and generic_metrics_sensed - there does not seem to be a reason to replace the sensed mode with BLE sensed mode in two separate notebooks - I think we can add the new charts from survey_metrics to generic_metrics_sensed, since they do display sensed modes, and then not show them in the dashboard for label-based deployments, since those charts really just replace those in generic-metrics 148#discussion_r1766009942
[ ] stop notebooks that don't need to run for survey deployments from running to avoid wasting time and resources (eg generic-metrics) 148#issuecomment-2360045768

Abby-Wheelis commented 2 months ago

More cleanup from #145

[ ] use server code to filter inferred confidence comment on #145
[ ] remove all use of "display modes" and just translate the modes when it's time to display them comment here
[ ] clean up labels as mentioned in this comment
[ ] investigate dropping NaN inferred values here

iantei commented 2 months ago

Use server code to filter inferred confidence:

I am merging this for now, but note that there is a similar function in the server when we handle metrics to include trips with confidence > 90% as "confirmed". We do not and should not need to reinvent code.

https://github.com/e-mission/e-mission-server/blob/52adee205f686d87e167bd4b1d166098938870c6/emission/analysis/result/metrics/time_grouping.py#L134

Made some code changes:

ashrest2-41625s:em-public-dashboard ashrest2$ git diff viz_scripts/scaffolding.py 
diff --git a/viz_scripts/scaffolding.py b/viz_scripts/scaffolding.py
index 880c819..a20b13c 100644
--- a/viz_scripts/scaffolding.py
+++ b/viz_scripts/scaffolding.py
@@ -7,6 +7,7 @@ from collections import OrderedDict
 import difflib

 import emission.storage.timeseries.abstract_timeseries as esta
+import emission.storage.decorations.trip_queries as esdt
 import emission.storage.timeseries.tcquery as esttc
 import emission.core.wrapper.localdate as ecwl
 import emcommon.diary.base_modes as emcdb
@@ -239,8 +240,8 @@ async def load_viz_notebook_inferred_data(year, month, program, study_type, dyna
     # Access database
     tq = get_time_query(year, month)
     participant_ct_df = load_all_participant_trips(program, tq, include_test_users)
-    inferred_ct = filter_inferred_trips(participant_ct_df)
-    expanded_it = expand_inferredlabels(inferred_ct)
+    inferred_ct = esdt.has_final_labels_df(participant_ct_df)
+    expanded_it = esdt.expand_finallabels(inferred_ct)
     expanded_it = await map_trip_data(expanded_it, study_type, dynamic_labels, dic_re, dic_pur)

Result:	Old	Server-code implementation

There is a difference in the implementation of my inferred labels trips bars and the one in which server generates it.

Firstly the filtering logic:
- Current:
  inferred_ct = mixed_trip_df[(mixed_trip_df['inferred_labels'].apply(lambda x: bool(x))) | (mixed_trip_df.user_input != {})] return inferred_ct
- Server: def has_final_labels(confirmed_trip_data): return (confirmed_trip_data["user_input"] != {} or confirmed_trip_data["expectation"]["to_label"] == False)
Secondly, the expand_finalLabels logic:

Current: Check for user_inputs: add to the list If user_input == {} Check for inferred_labels, but with max ‘p’ value greater than confidence_threshold; add to the list Else: add ‘mode_confirm’ = ‘uncertain’ Convert this list into a data frame, and append it to the original data frame. Filter out all the rows where mode_confirm = “uncertain”
Server: Filters out with user_input - adds into the list uses expectation.to_label to filter out high_confidence_no_userinput_df = labeled_ct[ (labeled_ct.user_input == {}) & (expectation_expansion.to_label == False) ]

@shankari Do you recommend following the server approach for the Labeled and Inferred bars in the Stacked Bar chart? I have used confidence_threshold to filter out after knowing there's no user_input, that's different than the approach of using expectation column for the trip.

While executing the server approach, I encountered the below issue for cortezebikes, will look more into it:

There is no ['analysis/confirmed_trip'], therefore there is no expectation column.

Details of the issue:

``` Running at 2024-09-25T04:56:02.857039+00:00 with params [Parameter('year', int, value=2023), Parameter('month', int, value=10), Parameter('program', str, value='default'), Parameter('study_type', str, value='program'), Parameter('include_test_users', bool, value=False), Parameter('dynamic_labels', dict, value={}), Parameter('use_imperial', bool, value=True), Parameter('sensed_algo_prefix', str, value='cleaned')] Running at 2024-09-25T04:56:10.199455+00:00 with params [Parameter('year', int, value=2023), Parameter('month', int, value=11), Parameter('program', str, value='default'), Parameter('study_type', str, value='program'), Parameter('include_test_users', bool, value=False), Parameter('dynamic_labels', dict, value={}), Parameter('use_imperial', bool, value=True), Parameter('sensed_algo_prefix', str, value='cleaned')] Traceback (most recent call last): File "/usr/src/app/saved-notebooks/bin/generate_plots.py", line 111, in compute_for_date(month_year.month, month_year.year) File "/usr/src/app/saved-notebooks/bin/generate_plots.py", line 104, in compute_for_date nbclient.execute(new_nb) File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/nbclient/client.py", line 1314, in execute return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute() File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/jupyter_core/utils/__init__.py", line 165, in wrapped return loop.run_until_complete(inner) File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete return future.result() File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/nbclient/client.py", line 709, in async_execute await self.async_execute_cell( File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/nbclient/client.py", line 1062, in async_execute_cell await self._check_raise_for_error(cell, cell_index, exec_reply) File "/root/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/nbclient/client.py", line 918, in _check_raise_for_error raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content) nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell: ------------------ expanded_ct_inferred, file_suffix_inferred, quality_text_inferred, debug_df_inferred = await scaffolding.load_viz_notebook_inferred_data(year, month, program, study_type, dynamic_labels, dic_re, dic_pur=dic_pur, include_test_users=include_test_users) ------------------ ----- stderr ----- DEBUG:root:In get_filter_query, returning query {'data.start_local_dt.year': {'$gte': 2023, '$lte': 2023}, 'data.start_local_dt.month': {'$gte': 11, '$lte': 11}} ----- stderr ----- DEBUG:root:curr_query = {'invalid': {'$exists': False}, '$or': [{'metadata.key': 'analysis/confirmed_trip'}], 'data.start_local_dt.year': {'$gte': 2023, '$lte': 2023}, 'data.start_local_dt.month': {'$gte': 11, '$lte': 11}}, sort_key = None ----- stderr ----- DEBUG:root:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/confirmed_trip'] ----- stderr ----- DEBUG:root:finished querying values for [], count = 0 ----- stderr ----- DEBUG:root:In get_filter_query, returning query {'data.start_local_dt.year': {'$gte': 2023, '$lte': 2023}, 'data.start_local_dt.month': {'$gte': 11, '$lte': 11}} ----- stderr ----- DEBUG:root:finished querying values for ['analysis/confirmed_trip'], count = 0 ----- stderr ----- DEBUG:root:orig_ts_db_matches = 0, analysis_ts_db_matches = 0 ----- stderr ----- DEBUG:root:Found 0 results ----- stdout ----- Loaded all confirmed trips of length 0 ------------------ --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[7], line 1 ----> 1 expanded_ct_inferred, file_suffix_inferred, quality_text_inferred, debug_df_inferred = await scaffolding.load_viz_notebook_inferred_data(year, 2 month, 3 program, 4 study_type, 5 dynamic_labels, 6 dic_re, 7 dic_pur=dic_pur, 8 include_test_users=include_test_users) File /usr/src/app/saved-notebooks/scaffolding.py:243, in load_viz_notebook_inferred_data(year, month, program, study_type, dynamic_labels, dic_re, dic_pur, include_test_users) 241 tq = get_time_query(year, month) 242 participant_ct_df = load_all_participant_trips(program, tq, include_test_users) --> 243 inferred_ct = esdt.has_final_labels_df(participant_ct_df) 244 print("Hellooooooooo") 245 expanded_it = esdt.expand_finallabels(inferred_ct) File /usr/src/app/emission/storage/decorations/trip_queries.py:343, in has_final_labels_df(df) 340 def has_final_labels_df(df): 341 # print(df.expectation) 342 # print(pd.DataFrame(df.expectation.to_list(), index=df.index)) --> 343 to_list_series = pd.DataFrame(df.expectation.to_list(), index=df.index).to_label 344 return df[(df.user_input != {}) 345 | (to_list_series == False)] File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/generic.py:5902, in NDFrame.__getattr__(self, name) 5895 if ( 5896 name not in self._internal_names_set 5897 and name not in self._metadata 5898 and name not in self._accessors 5899 and self._info_axis._can_hold_identifiers_and_holds_name(name) 5900 ): 5901 return self[name] -> 5902 return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'expectation' (emission) root@22f87f8030df:/usr/src/app/saved-notebooks# ```

iantei commented 2 months ago

Well, upon a further investigation. I found the following:

userinput/expectations.py

# This is a placeholder. TODO: implement the real algorithm
def _get_expectation_for_trip(trip):
    raw_expectation = eace.get_expectation(trip)
    # For now, expect always labeling unless the config file specifies no labeling at all
    processed_expectation = not raw_expectation["type"] == "none"
    return {"to_label": processed_expectation}

Seemingly, the algorithm for expectation is not implemented yet, therefore it makes sense why it was giving same result for both "Labeled" and "Labeled and Inferred Labels" bar.

shankari commented 2 months ago

@iantei

Seemingly, the algorithm for expectation is not implemented yet,

This is not correct. We do return a non-zero value, it is just not as sophisticated as we would like

While executing the server approach, I encountered the below issue for cortezebikes, will look more into it:

This is probably a data load error since if we have cleaned trips, we have confirmed trips

I have used confidence_threshold to filter out after knowing there's no user_input, that's different than the approach of using expectation column for the trip.

The functionality in the server code and your code is similar but not identical

As I said earlier

but note that there is a similar function in the server when we handle metrics to include trips with confidence > 90% as "confirmed"

The way in which the data is handled is similar but the filtering is different. You need to try handling the data in the same way but changing the filter at the end.

iantei commented 2 months ago

This is probably a data load error since if we have cleaned trips, we have confirmed trips

Yes, that's a data load error. I used the same has_final_labels_df(df) with an additional check of

def has_final_labels_df(df):
    if len(df) == 0:
        return df

It worked fine

iantei commented 2 months ago

@shankari

Changed the confidence threshold to 90%.
Updated the filter at the end, with the use of dropna()

I have submitted the PR #158 for review. Please let me know if you wanted anything different.

shankari commented 2 months ago

@iantei please discuss with @Abby-Wheelis when she returns on Monday. The 90% threshold is the filter. Changing the public dashboard to use the 90% threshold will not move the needle on inferred trips.

Abby-Wheelis commented 1 month ago

Observed in the dataset that all trips have expectation: {to_label - True}, this aligns with the comment in the server code where # For now, expect always labeling unless the config file specifies no labeling at all where all will be expected to label.

Observed in testExpandFinalLabelsPandasFunctionsNestedPostFilter that sample data does not expect labels if already labeled by user or if p value of a set of inferred labels is >= 0.9. However, we have not found in the code where the expectation to label is set to false if the trip is labeled or there is a p value greater than 0.9 present. Should we update the server code to change to_label to false once user input or reasonable inferred labels are present? Did we overlook the code where this already exists?

Found relevant PR for expanding the labels here: #846 The commit which added the expectation to label was 3 years ago here

It seems like using the confidence_threshold, like the phone does, or using the approach from the server seem like they should be separate approaches. It seems like ideally we would use the server approach, but the hangup with the expectation.to_label makes that complicated.

e-mission / em-public-dashboard

🧹 ✈️ 👤 Pending cleanups #149