cmu-delphi / delphi-epidata

An open API for epidemiological data.
https://cmu-delphi.github.io/delphi-epidata/
MIT License

Fix covid_hosp state_daily #1225

Status: Open. krivard opened this issue 1 year ago.

krivard commented 1 year ago

The covid_hosp state_daily acquisition has been failing since June 17 with the following error:

Traceback (most recent call last):
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 42, in <module>
    Utils.launch_if_main(Update.run, __name__)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 38, in launch_if_main
    entrypoint()
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 38, in run
    return Utils.update_dataset(Database, network)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 220, in update_dataset
    dataset = Utils.merge_by_key_cols([network.fetch_dataset(url, logger=logger) for url, _ in revisions],
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in merge_by_key_cols
    dfs = [df.set_index(key_cols) for df in dfs
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in <listcomp>
    dfs = [df.set_index(key_cols) for df in dfs
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pandas/core/frame.py", line 4727, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['reporting_cutoff_start'] are in the columns"

This suggests the file format for state daily has changed. Indeed, a note on the state-daily healthdata.gov page says that this column has been renamed to date:

[Screenshot: notice on the healthdata.gov state-daily page about the column rename]
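The traceback comes from `df.set_index(key_cols)` inside `merge_by_key_cols`: pandas raises a `KeyError` as soon as one of the requested index columns is absent. The sketch below reproduces that failure on a frame shaped like a post-change file and shows one possible workaround, renaming `date` back to `reporting_cutoff_start` before indexing. It is a minimal illustration under stated assumptions, not the project's actual fix: `KEY_COLS` and the helper `index_by_key_cols` are hypothetical, the real key-column list for state_daily may differ, and the data values are made up.

```python
import pandas as pd

# Hypothetical key columns; the real list lives in the covid_hosp state_daily code.
KEY_COLS = ["state", "reporting_cutoff_start"]

# Old and new names of the renamed column, as described in this issue.
OLD_NAME = "reporting_cutoff_start"
NEW_NAME = "date"


def index_by_key_cols(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map the new `date` column back to the old name, then index."""
    if OLD_NAME not in df.columns and NEW_NAME in df.columns:
        df = df.rename(columns={NEW_NAME: OLD_NAME})
    return df.set_index(KEY_COLS)


# A frame shaped like a post-change file (values are made up for illustration).
new_style = pd.DataFrame({"state": ["PA"], "date": ["2023-06-15"], "x": [1]})

# Indexing it directly reproduces the KeyError from the traceback.
try:
    new_style.set_index(KEY_COLS)
except KeyError as err:
    print("set_index failed:", err)

# After normalizing the column name, indexing succeeds.
print(list(index_by_key_cols(new_style).index.names))
```

Renaming toward the old name keeps downstream code (and the database column) unchanged; renaming the other way would be equally valid if the rest of the pipeline were updated to expect `date`.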

We should:

melange396 commented 1 year ago

The last file to use reporting_cutoff_start is 6xf2-c3ie_2023-06-16T01-05-09.csv; the first file to use date is 6xf2-c3ie_2023-06-16T12-07-16.csv.

Both were published on the same day. In fact, there are two files with each version of the column names, and all four files are date-stamped 16 June.
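Because files with both schemas share the same date, any fix has to cope with a single batch of revisions that mixes old and new column names, and a date-level cutoff cannot tell them apart; the full timestamp in the filename can. The sketch below orders revisions by that timestamp, normalizes each frame to the old column name, and keeps the row from the most recent file for each key. Everything here is an assumption for illustration: `KEY_COLS`, `RENAMES`, `revision_time`, and `merge_revisions` are hypothetical names, the `<dataset id>_<timestamp>.csv` pattern is inferred only from the two filenames quoted above, the "latest file wins" rule is a simplified stand-in rather than Delphi's actual `merge_by_key_cols`, and the row values are made up.

```python
from datetime import datetime

import pandas as pd

# Hypothetical key columns and rename map: older files use
# "reporting_cutoff_start", newer files use "date".
KEY_COLS = ["state", "reporting_cutoff_start"]
RENAMES = {"date": "reporting_cutoff_start"}


def revision_time(filename: str) -> datetime:
    """Parse the timestamp from a name like '6xf2-c3ie_2023-06-16T01-05-09.csv'."""
    stamp = filename.rsplit(".", 1)[0].split("_", 1)[1]
    return datetime.strptime(stamp, "%Y-%m-%dT%H-%M-%S")


def merge_revisions(revisions: dict) -> pd.DataFrame:
    """Simplified stand-in for merging same-day revisions with mixed schemas.

    Files are ordered by their full timestamp (not just the date), each
    frame's columns are normalized to the old name, and for duplicate keys
    the row from the most recent file wins.
    """
    frames = [
        revisions[name].rename(columns=RENAMES)
        for name in sorted(revisions, key=revision_time)
    ]
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.drop_duplicates(subset=KEY_COLS, keep="last")
    return combined.set_index(KEY_COLS).sort_index()


# Made-up rows attached to the two filenames quoted in the comment above,
# deliberately listed newest-first to show that ordering comes from the timestamp.
revisions = {
    "6xf2-c3ie_2023-06-16T12-07-16.csv": pd.DataFrame(
        {"state": ["PA", "NY"], "date": ["2023-06-15", "2023-06-15"], "beds": [11, 20]}
    ),
    "6xf2-c3ie_2023-06-16T01-05-09.csv": pd.DataFrame(
        {"state": ["PA"], "reporting_cutoff_start": ["2023-06-15"], "beds": [10]}
    ),
}

merged = merge_revisions(revisions)
print(merged)
```

With this data, the PA row from the later (new-schema) file replaces the one from the earlier (old-schema) file, and the result uses only the old column name.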

The healthdata.gov site has a typo: it lists "26" where it should say "16".