e-mission / em-public-dashboard

A simple and stupid public dashboard prototype.
BSD 3-Clause "New" or "Revised" License

Some deployments are broken? #142

Open Abby-Wheelis opened 1 month ago

Abby-Wheelis commented 1 month ago

I tried to check the open-access dashboard recently and noticed it is broken in a strange way. A few other deployments I've found are broken in the same way.

This is what it looks like: Screenshot 2024-07-31 at 2 57 38 PM

or

Screenshot 2024-07-31 at 2 58 46 PM

This is very confusing, because the pie charts were retired a while ago, and some do have the bar charts.

I wonder what could be causing this behavior? Is it because they are all old studies/programs?

Abby-Wheelis commented 1 week ago

For one of the problematic deployments, I found an error in the logs when trying to run generic-metrics:

expanded_ct_sensed, file_suffix_sensed, quality_text_sensed, debug_df_sensed = scaffolding.load_viz_notebook_sensor_inference_data(...

The same call also throws an error in generic_metrics_sensed and in generic_timeseries.

Clearly there is a bug in the load_viz_notebook_sensor_inference_data call. I think my next step is to try to reproduce it locally.

Abby-Wheelis commented 1 week ago

Looking at the details a bit more while I wait for the data to load, it looks like the pie chart was generated on 5/10, so it is likely from just before the changes were merged. I'm guessing the two charts have the same name.

Screenshot 2024-09-05 at 2 17 48 PM

Abby-Wheelis commented 1 week ago

Reproduced and have the full error now! Ran out of time today, but will resume tomorrow.

Loaded expanded_ct with length 41449 for None
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 expanded_ct_sensed, file_suffix_sensed, quality_text_sensed, debug_df_sensed = scaffolding.load_viz_notebook_sensor_inference_data(year,
      2                                                                             month,
      3                                                                             program,
      4                                                                             include_test_users,
      5                                                                             sensed_algo_prefix)

File /usr/src/app/saved-notebooks/scaffolding.py:243, in load_viz_notebook_sensor_inference_data(year, month, program, include_test_users, sensed_algo_prefix)
    241 print(f"Loaded expanded_ct with length {len(expanded_ct)} for {tq}")
    242 if len(expanded_ct) > 0:
--> 243     expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
    244     expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
    245     valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/series.py:4771, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4661 def apply(
   4662     self,
   4663     func: AggFuncType,
   (...)
   4666     **kwargs,
   4667 ) -> DataFrame | Series:
   4668     """
   4669     Invoke function on values of Series.
   4670 
   (...)
   4769     dtype: float64
   4770     """
-> 4771     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1123, in SeriesApply.apply(self)
   1120     return self.apply_str()
   1122 # self.f is Callable
-> 1123 return self.apply_standard()

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/core/apply.py:1174, in SeriesApply.apply_standard(self)
   1172     else:
   1173         values = obj.astype(object)._values
-> 1174         mapped = lib.map_infer(
   1175             values,
   1176             f,
   1177             convert=self.convert_dtype,
   1178         )
   1180 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1181     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1182     #  See also GH#25959 regarding EA support
   1183     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~/miniconda-23.5.2/envs/emission/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2924, in pandas._libs.lib.map_infer()

File /usr/src/app/saved-notebooks/scaffolding.py:243, in load_viz_notebook_sensor_inference_data.<locals>.<lambda>(md)
    241 print(f"Loaded expanded_ct with length {len(expanded_ct)} for {tq}")
    242 if len(expanded_ct) > 0:
--> 243     expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get))
    244     expanded_ct.primary_mode_non_other.replace({"ON_FOOT": "WALKING"}, inplace=True)
    245     valid_sensed_modes = ["WALKING", "BICYCLING", "IN_VEHICLE", "AIR_OR_HSR", "UNKNOWN"]

TypeError: 'float' object is not subscriptable

Abby-Wheelis commented 1 week ago

The issue is that at least one of the rows has nan as its entry for participant_ct_df.cleaned_section_summary. nan is a float, and a float is not subscriptable, so md["distance"] raises the TypeError above.
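
A minimal sketch of the failure, outside of pandas, assuming the missing summary arrives as a float nan. The sample summaries and the name primary_mode are made up for illustration; only the max(...) expression is taken from the lambda at scaffolding.py:243.

```python
import math

# Hypothetical stand-ins for cleaned_section_summary entries:
# one populated summary and one missing entry (float nan).
summaries = [
    {"distance": {"WALKING": 120.0, "IN_VEHICLE": 4300.0}},
    math.nan,
]

def primary_mode(md):
    # Same expression as the lambda in the traceback.
    return max(md["distance"], key=md["distance"].get)

results = []
for md in summaries:
    try:
        results.append(primary_mode(md))
    except TypeError as e:
        # The nan entry ends up here: a float cannot be indexed with ["distance"].
        results.append(f"TypeError: {e}")

print(results)
```

The populated summary resolves to its longest-distance mode, while the nan entry reports the same "not subscriptable" TypeError as the notebook.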

shankari commented 1 week ago

I wonder if this and https://github.com/e-mission/op-admin-dashboard/issues/120 are related. Really curious about why we are getting nan; maybe this is a backwards compat issue that we didn't address?!

Abby-Wheelis commented 1 week ago

It does seem to be present mainly in older deployments, so a missed backwards compat issue would make sense to me.

I have been able to recover from the error by defaulting to UNKNOWN when the summary is nan:

 expanded_ct["primary_mode_non_other"] = participant_ct_df.cleaned_section_summary.apply(lambda md: max(md["distance"], key=md["distance"].get) if not isinstance(md, float) else "UNKNOWN")
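
For comparison, a sketch of the same fallback written as a named helper. This is not the change that was applied above. The name primary_mode_or_unknown and the dict check are assumptions for illustration. Unlike the one-line workaround, it also falls back when the entry is None, lacks a "distance" key, or has an empty "distance" dict, which may or may not be the desired behavior.

```python
import math

def primary_mode_or_unknown(md, fallback="UNKNOWN"):
    """Return the mode with the largest distance, or `fallback` if the
    summary is missing or unusable (hypothetical helper, not in scaffolding.py)."""
    # Only dict-like summaries can carry a "distance" breakdown;
    # nan (a float) and None both land in the fallback branch.
    distances = md.get("distance") if isinstance(md, dict) else None
    if not distances:
        return fallback
    return max(distances, key=distances.get)

# Intended use with the column from the traceback (assumed, not run here):
# expanded_ct["primary_mode_non_other"] = (
#     participant_ct_df.cleaned_section_summary.apply(primary_mode_or_unknown)
# )

print(primary_mode_or_unknown(math.nan))
print(primary_mode_or_unknown({"distance": {"WALKING": 1.0, "BICYCLING": 5.0}}))
```

Checking for a dict rather than for a float keeps the fallback narrow to "anything that is not a summary", at the cost of hiding malformed summaries that the original expression would have surfaced as errors.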

Abby-Wheelis commented 1 week ago

There is a high rate of "UNKNOWN" now:

image

With a bit of checking months and rates of unknown (and NAN counts):

Month      %UNKNOWN sensed   num NAN
8/2022     33%               250
12/2022    49%               291
2/2023     39%               222
6/2023     41%               497
7/2023     19%               184
8/2023     13%               15
2/2024     23%               0

It looks like there might be higher rates for older data?
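
A per-month check like the one above could be tabulated roughly as follows. This is a library-free sketch, not the code used to produce the numbers in the table. The function name, the (month, summary) row shape, and the sample rows are all hypothetical.

```python
import math
from collections import defaultdict

def is_nan(value):
    # Missing summaries are assumed to show up as float nan.
    return isinstance(value, float) and math.isnan(value)

def unknown_stats_by_month(rows):
    """rows: iterable of (month_label, summary).
    Returns {month_label: (percent_unknown, num_nan)}."""
    totals = defaultdict(int)
    unknowns = defaultdict(int)
    nans = defaultdict(int)
    for month, md in rows:
        totals[month] += 1
        if is_nan(md):
            nans[month] += 1
            mode = "UNKNOWN"  # same fallback as the workaround
        else:
            mode = max(md["distance"], key=md["distance"].get)
        if mode == "UNKNOWN":
            unknowns[month] += 1
    return {m: (round(100 * unknowns[m] / totals[m]), nans[m]) for m in totals}

# Made-up rows for illustration only.
rows = [
    ("8/2022", math.nan),
    ("8/2022", {"distance": {"WALKING": 500.0}}),
    ("8/2022", {"distance": {"UNKNOWN": 900.0, "WALKING": 100.0}}),
    ("8/2022", {"distance": {"IN_VEHICLE": 1000.0}}),
    ("2/2024", {"distance": {"BICYCLING": 2000.0}}),
    ("2/2024", {"distance": {"UNKNOWN": 300.0}}),
]
stats = unknown_stats_by_month(rows)
print(stats)
```

Counting nan rows separately from UNKNOWN rows keeps the two sources of "UNKNOWN" distinguishable: rows that fell back because the summary was missing, and rows whose sensed mode was already UNKNOWN.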

shankari commented 1 week ago

I think we should go ahead with this fix, but we should file a follow-up issue to investigate the older data.