CDCgov / forecasttools-py

A Python package for common pre- and post-processing operations done by CFA Predict for short-term forecasting, nowcasting, and scenario modeling.
Apache License 2.0

Connect `scoringutils` To `forecasttools` #9

Open AFg6K7h4fhy2 opened 1 month ago

AFg6K7h4fhy2 commented 1 month ago

This depends on #30 and #28.

The scope of this PR includes converting a forecast idata with a time representation to a ScoringUtils-ingestible parquet file.
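
For concreteness, a rough sketch of what that conversion could look like, assuming an InferenceData whose posterior_predictive group holds a variable indexed by a date coordinate; the names obs and date, the quantile levels, and the file paths below are placeholder assumptions, not forecasttools' actual schema:

import arviz as az
import polars as pl

# load a forecast idata; "forecast_idata.nc" is a placeholder path
idata = az.from_netcdf("forecast_idata.nc")

# flatten posterior predictive draws (chain, draw, date) into a long table;
# "obs" and "date" are assumed names for the predictive variable and time coord
draws = pl.from_pandas(
    idata.posterior_predictive["obs"].to_dataframe().reset_index()
)

# summarize draws into quantiles roughly matching a quantile-forecast layout
quantile_levels = [0.025, 0.25, 0.5, 0.75, 0.975]
quantiles = (
    draws.group_by("date")
    .agg([pl.col("obs").quantile(q).alias(str(q)) for q in quantile_levels])
    .unpivot(index="date", variable_name="quantile_level", value_name="predicted")
    .with_columns(pl.col("quantile_level").cast(pl.Float64))
)

quantiles.write_parquet("forecasts_to_score.parquet")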

AFg6K7h4fhy2 commented 1 month ago

Is the conversion (Hubverse submission dataframe → ScoringUtils-ready dataframe) not deemed a priority?
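
If it is in scope, a rough sketch of the column mapping that conversion would involve (file paths are placeholders; the input columns follow the Hubverse model-output standard, the output columns follow scoringutils' quantile format):

import polars as pl

# placeholder paths; a Hubverse submission carries output_type / output_type_id / value,
# while scoringutils expects quantile_level / predicted / observed
hub = pl.read_parquet("hubverse_submission.parquet")
truth = pl.read_parquet("observed_data.parquet")  # location, target_end_date, observed

scoring_ready = (
    hub.filter(pl.col("output_type") == "quantile")
    .rename({"output_type_id": "quantile_level", "value": "predicted"})
    .with_columns(pl.col("quantile_level").cast(pl.Float64))
    .join(truth, on=["location", "target_end_date"], how="left")
)

scoring_ready.write_parquet("forecasts_to_score.parquet")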

seabbs commented 1 month ago

Noting you can do this via HubEval if you want

AFg6K7h4fhy2 commented 1 month ago

> Noting you can do this via HubEval if you want

Hadn't seen; thank you, Sam.

Had been thinking mostly in terms of something like the following:

import polars as pl

data = {
    "location": ["DE", "DE", "AL", "AL"],
    "forecast_date": ["2021-01-01", "2021-01-01", "2021-07-12", "2021-07-12"],
    "target_end_date": ["2021-01-02", "2021-01-02", "2021-07-24", "2021-07-24"],
    "target_type": ["Cases", "Deaths", "Deaths", "Deaths"],
    "model": [None, None, "epiforecasts-EpiNow2", "epiforecasts-EpiNow2"],
    "horizon": [None, None, 2, 2],
    "quantile_level": [None, None, 0.975, 0.990],
    "predicted": [None, None, 611, 719],
    "observed": [127300, 4534, 78, 78],
}

# convert data to a pl.DataFrame, then write it to forecasts_to_score.parquet
df = pl.DataFrame(data)
df.write_parquet("forecasts_to_score.parquet")

Then in R, something akin to

library(arrow)         # read_parquet()
library(scoringutils)  # as_forecast_quantile()

df <- read_parquet("forecasts_to_score.parquet")

forecast_quantile <- df |>
  as_forecast_quantile(
    forecast_unit = c(
      <insert col names>
    )
  )
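
The resulting forecast object could then be passed to scoringutils::score() to produce the usual quantile-based metrics (e.g. WIS and interval coverage).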

Would appreciate an examination of this workflow by @SamuelBrand1 and @dylanmorris.

AFg6K7h4fhy2 commented 1 week ago

There are likely still ScoringUtils 2.0 considerations that need to be accounted for in this PR.

AFg6K7h4fhy2 commented 1 week ago

Also, this PR partially depends on the utilities featured in #34.