@seabbs @sbfnk @Bisaloo
Decisions taken:

We have:

- `new_forecast(data, forecast_type)`
- `validate_forecast()` with methods corresponding to the different classes
- `as_forecast()` that wraps the constructor and the validator (see the sketch below)
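A minimal sketch of how these pieces could fit together, assuming `as_forecast()` simply chains the constructor and a class-specific validator (the bodies are placeholders, not the actual implementation):

```r
# Sketch only: illustrative bodies, built on data.table as in scoringutils
new_forecast <- function(data, forecast_type) {
  data <- data.table::as.data.table(data)
  class(data) <- c(paste0("forecast_", forecast_type), class(data))
  data
}

validate_forecast <- function(forecast, ...) {
  UseMethod("validate_forecast")
}

# only the binary method is sketched here
validate_forecast.forecast_binary <- function(forecast, ...) {
  # class-specific input checks would live here; error if they fail
  forecast
}

as_forecast <- function(data, forecast_type) {
  forecast <- new_forecast(data, forecast_type)
  validate_forecast(forecast)
}
```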
Illustration:
`as_forecast()` isn't fully generic, i.e. it is hard to work with for other users who would want to create new classes. But I think that's fine for now / I don't have a good idea for how to make it better in a non-crazy way.
Every class gets a print method that provides useful information on the output.
Please reopen if you disagree with anything.
This issue collects all questions surrounding the workflow to create and validate forecast objects.
Overall questions
Old behaviour (before the update)
Previously, we had a function `check_forecasts()` that returned an object of class `scoringutils_check`, which was a list with different components, e.g. the number of unique values per column and model (`data[, lapply(.SD, FUN = function(x) length(unique(x))), by = "model"]`). There was a `print.scoringutils_check()` method that made the output look nice.

There were several potential workflows:
- `data |> check_forecasts()` --> get info
- `data |> check_forecasts() |> score()` --> just run in a pipeline
- `data |> score()` --> the call to `check_forecasts()` would happen internally within `score()`
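For context, a rough sketch of the kind of object the old `check_forecasts()` returned, together with its print method (the component names here are partly guessed from this discussion, not an exact reproduction of the old code):

```r
library(data.table)

# Rough sketch: a list of class "scoringutils_check" with several components
check_forecasts_sketch <- function(data) {
  data <- as.data.table(data)
  out <- list(
    cleaned_data  = data,
    unique_values = data[, lapply(.SD, FUN = function(x) length(unique(x))), by = "model"],
    messages      = character(0),
    warnings      = character(0)
  )
  class(out) <- c("scoringutils_check", "list")
  out
}

# the print method made the raw list easier to read
print.scoringutils_check <- function(x, ...) {
  cat("Number of unique values per column (by model):\n")
  print(x$unique_values)
  invisible(x)
}
```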
New behaviour
What functions should exist?
Advanced R suggests that every class should have a constructor (`new_myclass()`), a validator (`validate_myclass()`), and a user-friendly helper (`myclass()`).
Other things that I've seen:

- `is.myclass()` and `is_myclass()` as a test function that returns `TRUE` or `FALSE`
- `as.myclass()` and `as_myclass()`, which is, I guess, the helper function?
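To make these conventions concrete, here is a small sketch using a placeholder class `myclass` (all names and bodies are generic examples, not proposals for scoringutils):

```r
# Constructor: cheap, minimal checks, just stamps the class
new_myclass <- function(x = list()) {
  structure(x, class = "myclass")
}

# Validator: errors if the object is malformed, otherwise returns it
validate_myclass <- function(x) {
  stopifnot(is.list(unclass(x)))
  x
}

# Helper: user-facing, constructs and validates in one go
myclass <- function(x = list()) {
  validate_myclass(new_myclass(x))
}

# Test function in the is.myclass() / is_myclass() style: TRUE/FALSE only
is_myclass <- function(x) {
  inherits(x, "myclass")
}

# Coercion function in the as.myclass() / as_myclass() style
as_myclass <- function(x) {
  validate_myclass(new_myclass(as.list(x)))
}
```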
Proposal:

(This now assumes that the classes are going to be called `forecast_something`, see #473.)

- Have a constructor, `new_forecast(forecast_type)`, that constructs an object of class `forecast_[target_type]`.
- Have a (validator? check function?) generic called `is_forecast()`. `is_forecast.default()` just returns `FALSE`. `is_forecast.forecast_binary()` etc. runs input checks to validate the input. If all is well, it returns `TRUE`. Otherwise it errors.
- Have a helper function `as_forecast()`. It calls `new_forecast()` and then `is_forecast()`.
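A rough sketch of this proposal; the constructor body, the exact signatures, and the column names checked are assumptions for illustration, not the agreed design:

```r
# Constructor: only assigns the class, no validation
new_forecast <- function(data, forecast_type) {
  data <- data.table::as.data.table(data)
  class(data) <- c(paste0("forecast_", forecast_type), class(data))
  data
}

# Validator / check generic
is_forecast <- function(data, ...) {
  UseMethod("is_forecast")
}

# default method: anything without a forecast class is not a forecast
is_forecast.default <- function(data, ...) {
  FALSE
}

# class-specific method: runs input checks; returns TRUE or errors
is_forecast.forecast_binary <- function(data, ...) {
  required <- c("observed", "predicted")  # illustrative column names
  missing_cols <- setdiff(required, colnames(data))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", toString(missing_cols))
  }
  TRUE
}

# Helper: construct, then validate
as_forecast <- function(data, forecast_type) {
  forecast <- new_forecast(data, forecast_type)
  is_forecast(forecast)
  forecast
}
```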
- [x] Q1: Given that Hadley et al. suggest the validator be called `validate_myclass()`, should we call it `validate_forecast()` (returning the validated object) instead of `is_forecast()` (returning `TRUE`/`FALSE`)? Should we have both? `is_forecast()` could wrap `validate_forecast()`.
- [x] Q2: Which of these functions should be exported?
- [x] Q3: Should the validator function add messages / warnings as an attribute, similar to what `check_forecasts()` did? In addition to throwing them, of course (see the sketch below).
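A small sketch of how Q1 and Q3 could combine: `validate_forecast()` returns the validated object and also stores any warnings as an attribute, while `is_forecast()` is a thin `TRUE`/`FALSE` wrapper around it. The check shown and the attribute name are purely illustrative:

```r
# Validator: emits warnings, stores them as an attribute, returns the object
validate_forecast <- function(forecast) {
  warnings <- character(0)
  if (anyNA(forecast)) {
    warnings <- c(warnings, "Data contains NA values")
  }
  for (w in warnings) warning(w, call. = FALSE)
  attr(forecast, "warnings") <- warnings
  forecast
}

# Test function: TRUE/FALSE, built on top of the validator
is_forecast <- function(forecast) {
  out <- try(suppressWarnings(validate_forecast(forecast)), silent = TRUE)
  !inherits(out, "try-error")
}
```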
Output diagnostics

The proposal above currently does not capture everything the old `check_forecasts()` did. In particular, it doesn't provide output diagnostics.

We could likely get much of the desired behaviour by creating a print method for `forecast_binary` etc. This print method could then use the stored attributes to return something along the lines of the sketch below. And then, when the forecast is summarised or other attributes are added, these could be added to the output as well.
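A minimal sketch of what such a print method might look like; the attribute names and the exact output format are illustrative only:

```r
print.forecast_binary <- function(x, ...) {
  cat("Forecast type: binary\n")
  cat("Forecast unit:", toString(attr(x, "forecast_unit")), "\n")
  warnings <- attr(x, "warnings")
  if (length(warnings) > 0) {
    cat("Validation warnings:\n")
    cat(paste0(" - ", warnings, collapse = "\n"), "\n")
  }
  # fall through to the underlying data.table/data.frame print for the data
  NextMethod()
  invisible(x)
}
```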
This doesn't give us everything the old `check_forecasts()` had. E.g. it doesn't give us a list (which is probably easier to access than the attributes). It also doesn't give us the `unique_values` component that the old `check_forecasts()` had. I did like that, but it can also just be an extra function, `diagnose()` or something like that.
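For instance, a hypothetical `diagnose()` could recreate the `unique_values` overview (and anything else that is awkward to read off attributes) as a plain list:

```r
library(data.table)

# Hypothetical helper returning diagnostics as an easy-to-access list
diagnose <- function(forecast) {
  forecast <- as.data.table(forecast)
  list(
    n_rows        = nrow(forecast),
    unique_values = forecast[, lapply(.SD, function(x) length(unique(x))), by = "model"]
  )
}
```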