Open Abby-Wheelis opened 1 month ago
For one of the problematic deployments, I found an error in the logs when trying to run generic-metrics:
expanded_ct_sensed, file_suffix_sensed, quality_text_sensed, debug_df_sensed = scaffolding.load_viz_notebook_sensor_inference_data(...
The same call in generic_metrics_sensed also throws an error, as does the one in generic_timeseries.
Clearly there is a bug with the load_viz_notebook_sensor_inference_data call. I think my next step is to try to reproduce it locally.
Looking at the details a bit more while I wait for the data to load, it looks like the pie chart was generated on 5/10, so is likely from just before the changes were merged. I'm guessing the two charts have the same name.
Reproduced and have the full error now! Ran out of time today but will resume tomorrow
Loaded expanded_ct with length 41449 for None
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 expanded_ct_sensed, file_suffix_sensed, quality_text_sensed, debug_df_sensed = scaffolding.load_viz_notebook_sensor_inference_data(year,
2 month,
3 program,
4 include_test_users,
5 sensed_algo_prefix)
File /usr/src/app/saved-notebooks/scaffolding.py:243, in load_viz_notebook_sensor_inference_data(year, month, program, include_test_users, sensed_algo_prefix)
241 print(f"Loaded expanded_ct with length {len(expanded_ct)} for {tq}")
242 if len(expanded_ct) > 0:
--> 243 expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
244 expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
245 valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]
File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
4661 def apply(
4662 self,
4663 func: AggFuncType,
(...)
4666 **kwargs,
4667 ) -> DataFrame | Series:
4668 """
4669 Invoke function on values of Series.
4670
(...)
4769 dtype: float64
4770 """
-> 4771 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1123, in SeriesApply.apply(self)
1120 return self.apply_str()
1122 # self.f is Callable
-> 1123 return self.apply_standard()
File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1174, in SeriesApply.apply_standard(self)
1172 else:
1173 values = obj.astype(object)._values
-> 1174 mapped = lib.map_infer(
1175 values,
1176 f,
1177 convert=self.convert_dtype,
1178 )
1180 if len(mapped) and isinstance(mapped[0], ABCSeries):
1181 # GH#43986 Need to do list(mapped) in order to get treated as nested
1182 # See also GH#25959 regarding EA support
1183 return obj._constructor_expanddim(list(mapped), index=obj.index)
File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2924, in pandas._libs.lib.map_infer()
File /usr/src/app/saved-notebooks/scaffolding.py:243, in load_viz_notebook_sensor_inference_data.<locals>.<lambda>(md)
241 print(f"Loaded expanded_ct with length {len(expanded_ct)} for {tq}")
242 if len(expanded_ct) > 0:
--> 243 expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
244 expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
245 valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]
TypeError: 'float' object is not subscriptable
The issue is that one of the rows has nan as the entry for participant_ct_df.cleaned_section_summary, and nan is a float, so it is not subscriptable.
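For anyone who wants to see the failure without loading a deployment's data, here is a minimal sketch of it. The `summaries` series is a made-up stand-in for participant_ct_df.cleaned_section_summary (an assumption, not real data), and `primary_mode` is just the lambda from scaffolding.py pulled out into a named function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for participant_ct_df.cleaned_section_summary:
# one valid summary dict and one missing entry stored as NaN.
summaries = pd.Series([
    {"distance": {"WALKING": 120.0, "IN_VEHICLE": 900.0}},
    np.nan,
])

def primary_mode(md):
    # Same logic as the lambda on line 243 of scaffolding.py
    return max(md["distance"], key=md["distance"].get)

try:
    summaries.apply(primary_mode)
    error_message = None
except TypeError as e:
    # The NaN row is a plain float, so md["distance"] fails
    error_message = str(e)

print(error_message)
```

The first row is handled fine; the exception only appears once `apply` reaches the NaN row.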
I wonder if this and https://github.com/e-mission/op-admin-dashboard/issues/120 are related.
Really curious about why we are getting nan; maybe this is a backwards compat issue that we didn't address?!
It does seem to be present mainly in older deployments, so a missed backwards compat issue would make sense to me.
I have been able to recover from the error by defaulting to UNKNOWN when the summary is nan:
expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get) if not isinstance(md, float) else "UNKNOWN")
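A slightly more defensive variant of the same idea might look like the sketch below. It is not the code that was merged: `primary_sensed_mode` is a hypothetical helper name, and the extra guard for an empty or missing "distance" dict is an assumption about what older data could contain rather than something observed in the logs.

```python
import pandas as pd

def primary_sensed_mode(summary, default="UNKNOWN"):
    """Return the mode with the largest distance, or `default` if unavailable."""
    # Missing summaries show up as NaN (a float), not as a dict
    if not isinstance(summary, dict):
        return default
    distances = summary.get("distance")
    # Assumed extra case: a summary with no (or an empty) distance breakdown
    if not isinstance(distances, dict) or len(distances) == 0:
        return default
    return max(distances, key=distances.get)

# Made-up example rows: a valid summary, a NaN, and an empty breakdown
summaries = pd.Series([
    {"distance": {"ON_FOOT": 120.0, "IN_VEHICLE": 900.0}},
    float("nan"),
    {"distance": {}},
])
modes = summaries.apply(primary_sensed_mode)
print(modes.tolist())  # ['IN_VEHICLE', 'UNKNOWN', 'UNKNOWN']
```

Checking for a dict instead of checking for a float also covers None and avoids `max()` raising a ValueError on an empty dict.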
High rate of "UNKNOWN" now
After checking a few months for the rate of UNKNOWN (and the NaN counts):

| Month | %UNKNOWN sensed | num NAN |
|---|---|---|
| 8/2022 | 33% | 250 |
| 12/2022 | 49% | 291 |
| 2/2023 | 39% | 222 |
| 6/2023 | 41% | 497 |
| 7/2023 | 19% | 184 |
| 8/2023 | 13% | 15 |
| 2/2024 | 23% | 0 |
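Numbers like the ones in the table can be computed per month with something along these lines. This is only a sketch: `unknown_and_nan_stats` is a hypothetical helper, and it assumes both columns live in one dataframe, whereas scaffolding.py works with expanded_ct and participant_ct_df separately. The toy data is invented.

```python
import pandas as pd

def unknown_and_nan_stats(ct_df):
    """Return (% of trips sensed as UNKNOWN, number of NaN section summaries)."""
    n_nan = int(ct_df["cleaned_section_summary"].isna().sum())
    pct_unknown = 100.0 * (ct_df["primary_mode_non_other"] == "UNKNOWN").mean()
    return pct_unknown, n_nan

# Toy data (assumption): four trips, one of them with a missing summary
toy_ct = pd.DataFrame({
    "cleaned_section_summary": [
        {"distance": {"WALKING": 300.0}},
        float("nan"),
        {"distance": {"UNKNOWN": 800.0}},
        {"distance": {"IN_VEHICLE": 5000.0}},
    ],
    "primary_mode_non_other": ["WALKING", "UNKNOWN", "UNKNOWN", "IN_VEHICLE"],
})

pct_unknown, n_nan = unknown_and_nan_stats(toy_ct)
print(f"{pct_unknown:.0f}% UNKNOWN, {n_nan} NaN summaries")
```

The two numbers are reported separately because a trip can be sensed as UNKNOWN without its summary being NaN, which is what the 2/2024 row (23% UNKNOWN, 0 NaN) shows.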
It looks like there might be higher rates for older data?
I think we should go ahead with this fix, but we should file a follow-up issue to investigate the older data.
I tried to check the open-access dashboard recently and noticed it is broken in a strange way; a few others I've found are broken in the same way.
This is what it looks like: [screenshot] or [screenshot]
This is very confusing because the pie charts have been retired for a while, and some have the bar charts.
I wonder what could be causing this behavior? Is it because they are all old studies/programs?