Implement ways to immediately warn when a forecast object becomes invalid

epiforecasts / scoringutils

Utilities for Scoring and Assessing Predictions

https://epiforecasts.io/scoringutils/

Other

48 stars 20 forks source link

Implement ways to immediately warn when a forecast object becomes invalid #816

Closed nikosbosse closed 3 weeks ago

nikosbosse commented 4 months ago

I see it has been discussed a bit in https://github.com/epiforecasts/scoringutils/issues/507 but I am uncomfortable with the fact we re-validate the class in the print() method. Not just because it’s inefficient and non-standard, but mostly because it may be the symptom of a deeper issue. Once it has been created, we should ensure (to the best of our ability) that it remains valid no matter what. And the moment it stopps being valid, we should throw an error immediately, not wait for the print() method to be called.

This is an inherent weakness of S3 classes, which are loosely defined, but we can add safeguards. This could take the form of a custom [ method which ensures crucial columns are never dropped.

Originally posted by @Bisaloo in https://github.com/epiforecasts/scoringutils/pull/791#pullrequestreview-2003401135

nikosbosse commented 4 months ago

Some ways in which a forecast object can become invalid:

you delete one of the protected columns (model, observed, predicted, quantile_level...)
you delete a column that is part of the forecast unit in a way that now a single forecast is not uniquely identified anymore (meaning that get_duplicate_forecasts() would fail
you add additional columns (e.g. you end up with both a quantile_level and a sample_id column)
you rbind additional rows, leading to the same error
you delete rows such that none remain or that only rows remain that have an NA value somewhere (meaning they get filtered out)

I'm not sure we'd be able to catch all of these immediately. What do you think @Bisaloo @seabbs @sbfnk

Also is this something we should do before the CRAN release, or possibly afterwards?

seabbs commented 4 months ago

I am also not sure how we should do this. Suggestions @Bisaloo?

Bisaloo commented 4 months ago

As far as I can tell, all the changes mentioned here are done via [, or a function that internally uses it (to be confirmed for rbind()).

So with a custom [.forecast() method, you should be able to immediately error when the object becomes invalid:

Run NextMethod() ([.data.frame() in most/all cases)
Validate the object
Error if invalid OR return is valid

Also is this something we should do before the CRAN release, or possibly afterwards?

I'd say it's strictly speaking a breaking change but it doesn't really change anything in the API so it can possibly wait.

nikosbosse commented 4 months ago

@Bisaloo would you possibly be willing to take on this PR? I imagine you'd by far be best placed to tackle it.

Bisaloo commented 4 months ago

I'm fully booked until mid-July but could have a go after. Would this work with your timeline?

nikosbosse commented 3 months ago

Yes, that would be perfect, merci beaucoup!

nikosbosse commented 2 months ago

@Bisaloo what does your current schedule look like, would you have a chance to look at this again? Thank you!

Bisaloo commented 2 months ago

I had a quick look last week but got blocked by an infinite loop of functions calling one another.

I'll try to have another look on Friday if possible :crossed_fingers:

nikosbosse commented 2 months ago

Merci!

nikosbosse commented 2 months ago

@Bisaloo what do you think about the next CRAN release? My current understanding is that this is something we should address before the next release, is that right?

Bisaloo commented 2 months ago

I think it's probably slightly better indeed as it's likely to cause subtle breaking changes, which would fit well under 2.0.0.

Note that submission to CRAN is closed for the next two weeks anyways for the summer holiday & maintenance.

nikosbosse commented 1 month ago

A nice additional benefit of this: we can get rid of the function validate_forecast(). The function is essentially the same as assert_forecast(), but returns the input data instead of an invisible NULL.

validate_forecast <- function(forecast, forecast_type = NULL, verbose = TRUE) {
  assert_forecast(forecast, forecast_type, verbose)
  return(forecast)
}

nikosbosse commented 3 weeks ago

I think this is done now thanks to @Bisaloo! Please reopen if not fully done.