epiforecasts / EpiNow2

Estimate Realtime Case Counts and Time-varying Epidemiological Parameters
https://epiforecasts.io/EpiNow2/dev/

Model validation #368

Open sbfnk opened 1 year ago

sbfnk commented 1 year ago

At the moment, the tests only validate the model itself in a few specific ways (e.g. update_infectiousness, generate_infections). There is also the synthetic validation, but it requires manual steps such as checking figures. It might be good to have a test where the exact output of a model run (with a set random seed) is checked for equality with the expected output.
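A minimal sketch of what such an exact-output check could look like, written in Python purely for illustration. `toy_model` and `check_against_snapshot` are hypothetical names, and the seeded random walk only stands in for a real model run; in practice the snapshot would be stored in a file and regenerated deliberately when outputs are meant to change.

```python
import random


def toy_model(seed, n_days=5):
    """Hypothetical stand-in for a full model run: a seeded random walk."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    value = 100.0
    trajectory = []
    for _ in range(n_days):
        value *= 1.0 + rng.uniform(-0.1, 0.1)
        trajectory.append(round(value, 6))
    return trajectory


def check_against_snapshot(output, snapshot):
    """Return a list of mismatches; an empty list means the output is identical."""
    if len(output) != len(snapshot):
        return [("length", len(output), len(snapshot))]
    return [
        (i, got, expected)
        for i, (got, expected) in enumerate(zip(output, snapshot))
        if got != expected
    ]


# Recorded in-process here; a real test would load this from disk.
snapshot = toy_model(seed=123)

# Re-running with the same seed must reproduce the snapshot exactly.
assert check_against_snapshot(toy_model(seed=123), snapshot) == []
```

Returning the list of mismatches rather than a bare boolean makes a failing test report which entries changed.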

As an example, PR #150 introduced a bug (fixed in a1885c5) that would have had a drastic impact on outputs but passed all the tests and showed up somewhat coincidentally in the checks.

seabbs commented 1 year ago

forecast.vocs and epinowcast both have examples of how to do this that might help when designing an approach here.

Runtime constraints and stochastic variation are both things that need to be considered when testing the complete model.

One option would be to compute the CRPS in the synthetic validation and throw a warning if it changes relative to some benchmark. This would be better than what we currently have but is still not ideal.
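As a rough illustration of the CRPS idea, here is a sketch in Python using the standard sample-based estimator, CRPS = E|X - y| - 0.5 E|X - X'|. The function names, the 10% relative tolerance, and the benchmark value are all assumptions made for the example, not anything that exists in the package.

```python
import warnings


def crps_samples(samples, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term_obs = sum(abs(x - observed) for x in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


def mean_crps(forecasts, observations):
    """Average CRPS over time points; forecasts is one list of samples per point."""
    scores = [crps_samples(s, y) for s, y in zip(forecasts, observations)]
    return sum(scores) / len(scores)


def check_crps(score, benchmark, rel_tol=0.1):
    """Warn and return False if score is more than rel_tol worse than benchmark."""
    if score > benchmark * (1.0 + rel_tol):
        warnings.warn(
            f"CRPS {score:.3f} exceeds benchmark {benchmark:.3f} "
            f"by more than {rel_tol:.0%}"
        )
        return False
    return True


# Two time points with three posterior samples each (made-up numbers).
forecasts = [[9.0, 10.0, 11.0], [19.0, 20.0, 21.0]]
observations = [10.0, 22.0]
score = mean_crps(forecasts, observations)
check_crps(score, benchmark=score)  # equal to the benchmark, so no warning
```

Lower CRPS is better, so the check only fires when the score gets worse; a large improvement passes silently and may also deserve a look.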

seabbs commented 1 year ago

The new touchstone setup could also be helpful here (its primary use case is testing runtimes), but it is not quite working at the moment.

sbfnk commented 1 year ago

> Runtime constraints and stochastic variation are both things that need to be considered when testing the complete model.

If setting a seed we shouldn't get stochastic variation, right?

> One option would be to compute the CRPS in the synthetic validation and throw a warning if it changes relative to some benchmark. This would be better than what we currently have but is still not ideal.

I agree, that is a good idea.

seabbs commented 1 year ago

> If setting a seed we shouldn't get stochastic variation, right?

I've struggled in the past to make Stan deterministic, but there is also a question of meaningful stochastic variation (i.e. when we make algorithms less stable but faster on average).
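One way to accommodate this, sketched in Python under loudly stated assumptions, is to compare a summary across several seeds against an expected value with a tolerance instead of requiring exact equality. `toy_estimate` is a hypothetical stand-in for one model fit, and the threshold of four standard errors is an arbitrary choice for the example.

```python
import random
import statistics


def toy_estimate(seed, n_draws=2000):
    """Hypothetical stand-in for one model fit: a noisy estimate of a true value of 1.5."""
    rng = random.Random(seed)
    draws = [rng.gauss(1.5, 0.5) for _ in range(n_draws)]
    return statistics.fmean(draws)


def within_tolerance(estimates, expected, n_se=4.0):
    """True if the mean of the estimates lies within n_se standard errors of expected."""
    mean = statistics.fmean(estimates)
    se = statistics.stdev(estimates) / len(estimates) ** 0.5
    return abs(mean - expected) <= n_se * se


# Run the stand-in model over several seeds and compare on average.
estimates = [toy_estimate(seed) for seed in range(20)]
print(within_tolerance(estimates, expected=1.5))
```

This trades runtime for robustness: more seeds give a tighter check but a slower test, which ties back to the runtime constraint mentioned earlier.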