Closed sbfnk closed 3 months ago
if we call it overprediction_sample here then maybe it should also be called overprediction_quantile for the WIS?
Wouldn't the better solution within the existing package framework be to define them as S3 methods for the different forecast types?
Wouldn't the better solution within the existing package framework be to define them as S3 methods for the different forecast types?
the problem is that this function is called on a vector/matrix so that's not directly amenable to S3.
Are the CRPS overprediction/underprediction values equivalent to those of the WIS? I.e. if I took a sample-based forecast and converted it into a sample-based forecast, should I expect the overprediction/underprediction/dispersion values to roughly match? (I was wondering whether it would make sense to call the output column something like underprediction_wis/underprediction_crps (or maybe wis_underprediction...)).
My suggest would be:
underprediction_quantile
etc.As for your question, yes, they should be the same subject to the information reduction from converting to quantiles, and this seems largely the case:
library("scoringutils")
#> Note: scoringutils is currently undergoing major development changes (with an update planned for the first quarter of 2024). We would very much appreciate your opinions and feedback on what should be included in this major update: https://github.com/epiforecasts/scoringutils/discussions/333
library("tidyr")
library("dplyr")
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library("ggplot2")
nreplicates <- 30
nsamples <- 1000
observed <- rnorm(nreplicates, mean = seq_len(nreplicates))
predicted <- replicate(
nsamples, rnorm(n = nreplicates, mean = seq_len(nreplicates))
)
crps <- crps_sample(
observed = observed,
predicted = predicted
)
dcrps <- dispersion_sample(observed, predicted)
ocrps <- overprediction_sample(observed, predicted)
ucrps <- underprediction_sample(observed, predicted)
levels <- seq(0.01, 0.99, by = 0.01)
quantiles <- t(apply(predicted, 1, quantile, probs = levels))
dwis <- dispersion(observed, quantiles, levels)
owis <- overprediction(observed, quantiles, levels)
uwis <- underprediction(observed, quantiles, levels)
df <- bind_rows(
data.frame(
metric = "CRPS",
dispersion = dcrps,
overprediction = ocrps,
underprediction = ucrps
),
data.frame(
metric = "WIS",
dispersion = dwis,
overprediction = owis,
underprediction = uwis
)
) |>
group_by(metric) |>
mutate(replicate = seq_len(n())) |>
ungroup() |>
pivot_longer(c(-metric, -replicate)) |>
pivot_wider(names_from = metric)
ggplot(df, aes(x = CRPS, y = WIS, colour = name)) +
geom_point() +
scale_colour_brewer(palette = "Set1") +
theme_minimal() +
geom_abline(slope = 1, intercept = 0, linetype = "dashed")
Created on 2024-07-16 with reprex v2.1.0
My suggest would be:
- rename functions to underprediction_quantile etc.
- keep columns as they are
Sounds good to me! Also I think it would be nice to mention in the docs that the components correspond to the WIS components.
Let me know if you're fine with making these changes - alternatively I'm also happy to take over and do the leg work.
Over to you!
the problem is that this function is called on a vector/matrix so that's not directly amenable to S3.
I don't understand this point? the input arg of an s3 method (i.e x etc) can change type to be whatever you want. If it couldn't how would summary
etc work
I'd much prefer this to be done via s3 vs adding custom functionality like this but as there are other lots of specific sample methods I guess that can be a battle for another day.
I think we would ideally modularise the current implementation as as it stands its quite compute wasteful and duplicative
https://github.com/epiforecasts/scoringutils/pull/854#issuecomment-2230209168
Seems like a very good unit/integration test
the problem is that this function is called on a vector/matrix so that's not directly amenable to S3.
I don't understand this point? the input arg of an s3 method (i.e x etc) can change type to be whatever you want. If it couldn't how would
summary
etc work
But this is a standalone scoring function that can in principle be called on a vector/matrix (and within score()
, we do just pass a vector/matrix to it). That vector/matrix doesn't know anything about the class system we implemented for data.frames.
We could think about changing that, but this would be quite a big design change for the entire package.
We could think about changing that, but this would be quite a big design change for the entire package.
Got you. Is there any issue to address? I think rather than a design decision I would call it technical debt!
Got you. Is there any issue to address? I think rather than a design decision I would call it technical debt!
Well I think to a large extent the fact that you can call the scoringutils functions in other packages and that you can use functions from other packages is a feature, not a bug!
Well I think to a large extent the fact that you can call the scoringutils functions in other packages and that you can use functions from other packages is a feature, not a bug!
technical debt isn't a bug. My point is more that I don't think a active decision as to the right approach has been taken here it is more that this is just where we are right now.
@sbfnk I made the suggested changes in #869.
Agree with Seb and would suggest addressing the code duplication in a separate issue (https://github.com/epiforecasts/scoringutils/issues/870). Other than that, any objections to this getting merged?
wohoo! Thanks a lot @sbfnk, and Johannes! This is a really cool PR.
I appreciate outvoted here and its only minor but I really don't think we should be merging duplicated code etc with features at this stage of scoringutils lifecycle. In many ways, some of these practices are what led to us needing such a large scale refactor (though that has ended up having lots of benefits).
Appreciate the point. My thinking at the moment is that it would be good to finally get this out on CRAN to avoid the co-existence of an outdated version and an obscure dev version, so I'd tend to prioritise public-facing code changes over internal refactoring for now and then circle back to another refactor/clean-up.
But then again you're right and according to this logic we maybe should not have introduced this feature at all.
Description
This PR closes #853.
Created on 2024-07-11 with reprex v2.1.0
Checklist
lintr::lint_package()
to check for style issues introduced by my changes.