CDCgov / pyrenew-hew

Models and infrastructure for forecasting COVID-19 and Flu hospitalizations using wastewater data with PyRenew
Apache License 2.0

Set up NWSS ETL #15

Closed: kaitejohnson closed this issue 2 weeks ago

kaitejohnson commented 1 month ago

Goal

Last year's production pipeline relied on manual API pulls from the DCIPHER platform every Saturday and Monday. Ideally, we would set up an automated ETL pipeline that saves time-stamped, vintaged wastewater datasets for flu and COVID to Azure Blob Storage. This would let us properly evaluate model performance retrospectively across historical forecast dates. Vintaging is particularly important because the data does not contain a report-date field, and the reporting lag is not consistent across jurisdictions and wastewater treatment plants.
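A minimal sketch of why vintaging enables retrospective evaluation: if each pull is stored under a name that encodes its pull date, a backtest for a given forecast date can select only the snapshot that actually existed on that date. The blob path scheme (`nwss_vintages/<disease>/<YYYY-MM-DD>.parquet`) and the helper names `vintage_blob_name` and `latest_vintage_as_of` are hypothetical. The sketch does not call the Azure SDK; it operates on a plain list of blob names, which in practice might come from a container listing.

```python
from __future__ import annotations

import datetime as dt
import re

# Hypothetical naming scheme: one snapshot per pull, keyed by pull date.
VINTAGE_PATTERN = re.compile(
    r"^nwss_vintages/(?P<disease>[a-z]+)/(?P<date>\d{4}-\d{2}-\d{2})\.parquet$"
)


def vintage_blob_name(disease: str, pull_date: dt.date) -> str:
    """Build a (hypothetical) blob path for a snapshot pulled on pull_date."""
    return f"nwss_vintages/{disease.lower()}/{pull_date.isoformat()}.parquet"


def latest_vintage_as_of(
    blob_names: list[str], disease: str, as_of: dt.date
) -> str | None:
    """Return the most recent snapshot pulled on or before as_of, if any.

    Without a report-date field, the pull date is the only record of what
    data were available at forecast time, so later snapshots are excluded.
    """
    best: tuple[dt.date, str] | None = None
    for name in blob_names:
        m = VINTAGE_PATTERN.match(name)
        # Skip names that do not follow the scheme or belong to another disease.
        if m is None or m["disease"] != disease.lower():
            continue
        pulled = dt.date.fromisoformat(m["date"])
        if pulled <= as_of and (best is None or pulled > best[0]):
            best = (pulled, name)
    return None if best is None else best[1]


if __name__ == "__main__":
    # Illustrative Saturday/Monday pulls in October 2024.
    names = [vintage_blob_name("covid", dt.date(2024, 10, d)) for d in (5, 7, 12, 14)]
    # A backtest for a forecast made on 2024-10-13 sees the 2024-10-12 pull.
    print(latest_vintage_as_of(names, "covid", dt.date(2024, 10, 13)))
```

Keying snapshots by pull date rather than overwriting a single "latest" file is what makes the backtest honest: each historical forecast can be re-run against the data as they looked at the time, including whatever reporting lags were present then.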

Requirements

@amondal2 @kgostic @dylanhmorris @damonbayer perhaps we could kick off with a 30-minute meeting to plan how we'd like to divide up tasks.

damonbayer commented 2 weeks ago

Closing as it is out of scope for this repo and is being discussed elsewhere.