Utilities Pipeline - GitHub Issues

AFg6K7h4fhy2 commented 1 month ago

Do something akin to the following for forecasttools:

https://github.com/CDCgov/pyrenew-hew/issues/32

AFg6K7h4fhy2 commented 1 month ago

NOTE: Click "edited" above to see earlier versions of, or corrections to, the diagram below.

Possible pipeline:

%%{init: {"theme": "neutral", "themeVariables": { "fontFamily": "Iosevka", "fontSize": "25px", "lineColor": "#808b96", "arrowheadColor": "#808b96", "edgeStrokeWidth": "10px", "arrowheadLength": "20px"}}}%%
flowchart TD
    A1[COVID-19 Data _from forecasttools_] --> A4[NumPyro Model]
    A2[Influenza Data _from forecasttools_] --> A4
    A3[External Dataset] --> A4
    A4 -->|_arviz.from_numpyro_| A5[Forecast As InferenceData Object wo/ Dates]
    A5 -->|_Add Dates To InferenceData_ - done| A6[InferenceData Object w/ Dates]
    A6 -->|_Convert To Tidy-Like DataFrame_ - done| A7[Polars Forecast DataFrame w/ Draws]
    A7 -->|_Convert To Hubverse Formatted DataFrame_ - done| A8[FluSight Submission DataFrame]
    A7 -->|_Convert To ScoringUtils Formatted DataFrame_ - in progress| A9[ScoringUtils DataFrame]
    A7 -->|_Save_| A10[Parquet File]
    A8 -->|_Save_| A11[Parquet File]
    A9 -->|_Save_| A12[Parquet File]
    A8 -->|_Convert To ScoringUtils Formatted DataFrame_ - in progress| A9
    A12 -->|_Get scores in R_| A13[Forecast Scores]
    A11 -->|_Model Forecast Hypothesis Testing_| A14[Model Comparison Report]

    B1[Pulled Parquet Hubverse Submissions] -->|_Model Forecast Hypothesis Testing_| A14

    %% A single linkStyle statement; a second "linkStyle default" line replaces the first.
    linkStyle default stroke:#808b96,stroke-width:2px
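The date-adding and tidy-conversion steps in the diagram can be illustrated with a minimal sketch. It assumes, purely for illustration, that the forecast draws have already been pulled out of the InferenceData object as a nested list indexed `[chain][draw][time]` and that time points are evenly spaced (e.g. weekly). The function `draws_to_tidy_rows` and its parameters are hypothetical and are not the forecasttools API. Only the standard library is used, so each dict stands in for one row of the Polars frame.

```python
from datetime import date, timedelta


def draws_to_tidy_rows(draws, start_date, interval_days=7, variable="forecast"):
    """Flatten [chain][draw][time] draws into long ("tidy") rows with dates.

    Hypothetical helper: time index t is mapped to the calendar date
    start_date + t * interval_days, which assumes evenly spaced time points.
    """
    rows = []
    for chain_idx, chain in enumerate(draws):
        for draw_idx, trajectory in enumerate(chain):
            for t, value in enumerate(trajectory):
                rows.append(
                    {
                        "chain": chain_idx,
                        "draw": draw_idx,
                        "date": start_date + timedelta(days=t * interval_days),
                        "variable": variable,
                        "value": value,
                    }
                )
    return rows


# Two chains, two draws per chain, three weekly time points.
example_draws = [
    [[10.0, 12.0, 15.0], [11.0, 13.0, 14.0]],
    [[9.0, 12.5, 16.0], [10.5, 11.5, 15.5]],
]
tidy = draws_to_tidy_rows(example_draws, start_date=date(2024, 10, 5))
print(len(tidy))        # 12 rows = 2 chains x 2 draws x 3 dates
print(tidy[2]["date"])  # 2024-10-19
```

A list of dicts in this shape could then be passed to `polars.DataFrame` to obtain the "Polars Forecast DataFrame w/ Draws" node; the date mapping is the part that would need to match however forecasttools actually attaches dates.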

AFg6K7h4fhy2 commented 1 month ago

@dylanhmorris Would appreciate feedback on this (possibly including your own mental model of the workflow as another diagram). Also, are the arrows visible with your GitHub appearance settings? They were visible for me with the high-contrast light theme but not with another setting.

AFg6K7h4fhy2 commented 1 month ago

I can see how the ScoringUtils-formatted DataFrame could come from some intermediate step of the conversion to a FluSight submission.
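One way to picture this: both outputs share an intermediate step that summarises draws into quantiles per target date, and the FluSight/Hubverse and ScoringUtils formats are thin relabelings of that summary. The sketch below is an assumption, not existing forecasttools code; all function names are hypothetical. The quantile uses linear interpolation (R's default type 7), the Hubverse rows keep only the `target_end_date`, `output_type`, `output_type_id`, and `value` columns (other required FluSight columns are omitted), and the scoringutils rows assume the 2.x column names `quantile_level`, `predicted`, and `observed` (older versions used `quantile`, `prediction`, and `true_value`).

```python
import math


def quantile(sorted_values, level):
    """Linear-interpolation quantile (R's default type 7 definition)."""
    position = (len(sorted_values) - 1) * level
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarise_draws(draws_by_date, levels):
    """Shared intermediate step: {date: [draws]} -> {(date, level): value}."""
    summary = {}
    for target_date, draws in draws_by_date.items():
        ordered = sorted(draws)
        for level in levels:
            summary[(target_date, level)] = quantile(ordered, level)
    return summary


def to_hubverse_rows(summary):
    """Hubverse-style quantile rows (other required columns omitted)."""
    return [
        {"target_end_date": d, "output_type": "quantile",
         "output_type_id": level, "value": value}
        for (d, level), value in summary.items()
    ]


def to_scoringutils_rows(summary, observed_by_date):
    """scoringutils-style quantile rows, joined to the observed values."""
    return [
        {"target_end_date": d, "quantile_level": level,
         "predicted": value, "observed": observed_by_date[d]}
        for (d, level), value in summary.items()
    ]


draws_by_date = {"2024-10-19": [10.0, 12.0, 14.0, 16.0, 18.0]}
summary = summarise_draws(draws_by_date, levels=[0.25, 0.5, 0.75])
print(to_hubverse_rows(summary)[1]["value"])  # 14.0
print(to_scoringutils_rows(summary, {"2024-10-19": 13.0})[0]["predicted"])  # 12.0
```

Computing the quantiles once and relabeling twice keeps the two outputs consistent by construction, which is the appeal of deriving the ScoringUtils frame from the submission pipeline.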

AFg6K7h4fhy2 commented 1 month ago

@SamuelBrand1 Would appreciate a check-in on this as well, Sam.

AFg6K7h4fhy2 commented 2 weeks ago

The author will flesh out this comment during the sprint [November 11, November 22] and is adding what exists here as a placeholder so as not to lose any writing.

Both comments https://github.com/CDCgov/forecasttools-py/issues/16#issuecomment-2432415729 and https://github.com/CDCgov/forecasttools-py/issues/16#issuecomment-2432550848 remain unaddressed.

Some thoughts: I believe forecasttools-py can come to facilitate parts of the pre- and post-processing in the Real Time Monitoring (RTM) branch's pipelines. At present, the utilities offered by forecasttools-py cover only the narrow needs of the Short Term Forecasts team's workflows, such as formatting NumPyro forecast model output into the Hubverse submission format. Meanwhile, pyrenew-hew has utilities for formatting parts of an az.InferenceData object so that they are ready for tidy_draws (and spread_draws) in tidybayes, and for making use of R's scoringutils. Changes could be made in forecasttools so that users need to write as little post-processing (forecast scoring) code as possible; #36 and #9 exist in this regard.
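As an illustration of what "ready for tidy_draws (and spread_draws)" involves: tidybayes works with long-form draws carrying `.chain`, `.iteration`, and `.draw` columns, all 1-based, where `.draw` uniquely identifies a draw across chains. The sketch below shows one assumed way to relabel the 0-based (chain, draw) indices of an ArviZ-style posterior to those columns. The function `to_tidybayes_indices` is hypothetical and is not pyrenew-hew's actual implementation.

```python
def to_tidybayes_indices(rows, n_draws_per_chain):
    """Relabel 0-based chain/draw indices to tidybayes-style columns.

    Assumes every chain has the same number of draws, so the 1-based global
    draw index is chain * n_draws_per_chain + draw + 1 (chain-major order).
    """
    relabelled = []
    for row in rows:
        chain, draw = row["chain"], row["draw"]
        # Keep every other column unchanged and drop the 0-based indices.
        new_row = {k: v for k, v in row.items() if k not in ("chain", "draw")}
        new_row[".chain"] = chain + 1
        new_row[".iteration"] = draw + 1
        new_row[".draw"] = chain * n_draws_per_chain + draw + 1
        relabelled.append(new_row)
    return relabelled


rows = [
    {"chain": 0, "draw": 0, "value": 1.2},
    {"chain": 1, "draw": 2, "value": 0.7},
]
print(to_tidybayes_indices(rows, n_draws_per_chain=500)[1][".draw"])  # 503
```

If chains could have unequal lengths, the closed-form `.draw` formula would no longer hold and a running counter over chains would be needed instead.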

CDCgov / forecasttools-py

Utilities Pipeline #16