CDCgov / wastewater-informed-covid-forecasting

Wastewater-informed COVID-19 forecasting models submitted to the COVID-19 Forecast Hub
https://cdcgov.github.io/wastewater-informed-covid-forecasting/
Apache License 2.0

Add real-time exclusions to eval pipeline exclusions #179

Closed kaitejohnson closed 1 month ago

kaitejohnson commented 1 month ago

We already had the infrastructure in place to manually exclude certain forecast date-locations from the evaluation pipeline due to poor or unrealistic wastewater data. This PR adds functionality to also exclude the forecast date-locations that we chose to exclude from the real-time Hub submissions.

This is possible because we produced a metadata.yaml within the forecasts folder for each forecast date, listing the locations for which we chose not to submit the wastewater model.

We can use this to create the following table in our config:

> table
$location
 [1] "MN" "OH" "IL" "OH" "CO" "TX" "OH" "PA" "FL" "NY" "FL" "OH" "PA" "VA" "IN" "IL" "ID" "OH" "UT" "VA"
[21] "KS" "NY" "UT" "OH" "MN" "MN" "MN"

$forecast_date
 [1] "2024-02-05" "2024-02-05" "2024-02-05" "2024-02-12" "2024-02-12" "2024-02-12" "2024-02-19" "2024-02-19"
 [9] "2024-02-19" "2024-02-19" "2024-02-26" "2024-02-26" "2024-02-26" "2024-02-26" "2024-02-26" "2024-02-26"
[17] "2024-03-04" "2024-03-04" "2024-03-04" "2024-03-04" "2024-03-11" "2024-03-11" "2024-03-11" "2024-03-11"
[25] "2024-01-15" "2024-01-22" "2024-01-29"
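
As a rough illustration of how a table in this shape could be applied, here is a minimal base R sketch that drops excluded forecast date-locations from a set of candidate pairs. It is not the pipeline's actual code: `eval_grid`, `make_key`, and `kept` are hypothetical names, and the exclusion list holds only the first three entries of the table above.

```r
# Hypothetical exclusion table in the same shape as the config table above:
# parallel vectors of location and forecast_date (first three entries only).
exclusions <- list(
  location = c("MN", "OH", "IL"),
  forecast_date = c("2024-02-05", "2024-02-05", "2024-02-05")
)

# Hypothetical set of forecast date-locations the pipeline would evaluate.
eval_grid <- expand.grid(
  location = c("MN", "OH", "IL", "CA"),
  forecast_date = c("2024-02-05", "2024-02-12"),
  stringsAsFactors = FALSE
)

# Build one key per pair so that matching is on the (location, forecast_date)
# pair rather than on location or date alone.
make_key <- function(location, forecast_date) {
  paste(location, forecast_date, sep = "_")
}

excluded_keys <- make_key(exclusions$location, exclusions$forecast_date)
grid_keys <- make_key(eval_grid$location, eval_grid$forecast_date)

# Keep only the rows whose pair does not appear in the exclusion table.
kept <- eval_grid[!(grid_keys %in% excluded_keys), ]
print(kept)
```

Matching on the combined key matters because the same location (for example "OH") appears under several forecast dates in the table, and only those specific dates should be dropped.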

Note: I know this is somewhat hardcoded to the current implementation. I think that's okay for now; we can DRY it up later if needed to make it more generalizable, but I see this as mostly a one-off use case, solely for gathering the info in these yaml files in this format.

kaitejohnson commented 1 month ago

@dylanhmorris I merged this into fix-targets-pipeline because that is the branch with the wwinference and other latest changes, and I wanted this incorporated there.