Closed damonbayer closed 2 months ago
Thank you for opening this issue, @damonbayer! This is really helpful.
I see several potential ways of addressing this:
1) converting the `data.frame` into a `data.table` in `summarise_scores()`
2) strictly enforcing a `scores` object as an input.
Some context:
The `scores` object is essentially just a `data.table` with an additional `metrics` attribute (which is just a vector of column names).
The reason we need this `metrics` attribute is that (as opposed to the current CRAN version) users can now freely choose the names given to the metrics. `summarise_scores()` therefore can't rely on a set of known metric names to distinguish between a metric and a column that defines the forecast unit. The `metrics` attribute solves that in a somewhat clunky way, unfortunately (the only alternative I can imagine is prefixing every column name with something like `metric_`).
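To make the role of the attribute concrete, here is a minimal base R sketch. It does not use scoringutils itself: the data frame, its values, and the column names `my_wis` and `my_bias` are made up for illustration, and the plain `data.frame` only stands in for a real `scores` object.

```r
# Toy stand-in for a scores object (made-up values and column names).
scores <- data.frame(
  model = c("A", "A", "B", "B"),
  target_type = c("Cases", "Deaths", "Cases", "Deaths"),
  my_wis = c(1.5, 0.5, 2.0, 1.0),
  my_bias = c(0.1, -0.1, 0.3, 0.2)
)

# The attribute is just a character vector naming the metric columns.
attr(scores, "metrics") <- c("my_wis", "my_bias")

# With freely chosen metric names, the attribute is the only thing that
# separates metric columns from columns that define the forecast unit.
metric_cols <- attr(scores, "metrics")
unit_cols <- setdiff(names(scores), metric_cols)

# Average every metric within each model.
summary_by_model <- aggregate(
  scores[metric_cols],
  by = scores["model"],
  FUN = mean
)
print(unit_cols)
print(summary_by_model)
```

Without the attribute, nothing in the column names alone would tell a summarising function that `my_wis` is a metric and `target_type` is not.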
In #805, I opted for option 1) for user convenience. The updated function converts a `data.frame` into a `data.table` if it isn't also a `data.table`, and checks for the `metrics` attribute.
Implicitly, this ensures that `summarise_scores()` is working on the equivalent of a `scores` object. The benefit is that something like the following still works (since the `metrics` attribute apparently is preserved):
library(scoringutils)
library(magrittr)  # provides the %>% pipe used below
ex_scores <- example_quantile %>% as_forecast() %>% score()
ex_scores_df <- as.data.frame(ex_scores)
summarize_scores(ex_scores_df, by = c("model", "target_type"))
The above would be more annoying if we went with option 2) and strictly enforced a `scores` object as input.
I imagine your use cases (writing to a file and summarising afterwards) won't work so easily. Compare, e.g.
ex_scores_df2 <- as.data.frame(as.matrix(ex_scores))
summarize_scores(ex_scores_df2, by = c("model", "target_type"))
which doesn't work because the `metrics` attribute is missing.
Currently, the only option, unfortunately, is to set the attribute again manually via `attr(scores, "metrics") <- ...`.
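For the file round trip specifically, a minimal base R sketch of the problem and the manual workaround might look like this. It uses a toy `data.frame` with a made-up `wis` column rather than real scoringutils output, and assumes the metric names are known to whoever reads the file back in.

```r
# Toy stand-in for a scores object (made-up values; not scoringutils output).
scores <- data.frame(
  model = c("A", "A", "B", "B"),
  target_type = c("Cases", "Deaths", "Cases", "Deaths"),
  wis = c(1.5, 0.5, 2.0, 1.0)
)
attr(scores, "metrics") <- "wis"

# Round trip through a CSV file: only the columns are written, so the
# custom attribute does not survive.
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)
scores_read <- read.csv(path)

attribute_lost <- is.null(attr(scores_read, "metrics"))
print(attribute_lost)

# Manual workaround: set the attribute again after reading.
attr(scores_read, "metrics") <- "wis"
print(attr(scores_read, "metrics"))

unlink(path)
```

The same reasoning applies to any tabular text format, since the attribute lives on the R object and not in the columns.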
I'm very curious to hear your thoughts on this and whether you think there would be a way to improve user experience.
Thanks for working on this!
I'm not able to check since I'm away from a computer. What happens when the `metrics` attribute is missing and you try to use `summarize_scores`? It would be helpful if it errored, mentioned that the attribute was missing, and told the user how to add the attribute.
Yes, if the attribute is missing you'll get an error pointing you to `get_metrics()`. I updated the docs there to make it clearer what's happening and what the user is expected to do. Let me know in case that doesn't make sense to you.
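As an illustration of the kind of check discussed here, the following is a rough sketch only. `require_metrics_attr()` is a hypothetical helper written for this example, not a scoringutils function, and the wording of the actual package error may differ.

```r
# Hypothetical helper (not part of scoringutils): fail early with an
# actionable message when the `metrics` attribute is missing.
require_metrics_attr <- function(scores) {
  metrics <- attr(scores, "metrics")
  if (is.null(metrics)) {
    stop(
      "Input has no `metrics` attribute. ",
      "Set it with `attr(scores, \"metrics\") <- <metric column names>`.",
      call. = FALSE
    )
  }
  # Also catch attribute entries that do not match any column.
  missing_cols <- setdiff(metrics, names(scores))
  if (length(missing_cols) > 0) {
    stop(
      "Metrics not found as columns: ", toString(missing_cols),
      call. = FALSE
    )
  }
  invisible(metrics)
}

# A plain data.frame without the attribute triggers the error.
plain_df <- data.frame(model = "A", wis = 1.5)
msg <- tryCatch(require_metrics_attr(plain_df), error = conditionMessage)
print(msg)

# After setting the attribute, the check passes and returns the metric names.
attr(plain_df, "metrics") <- "wis"
checked <- require_metrics_attr(plain_df)
```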
It is documented that `summarize_scores` expects an object of class `scores`. If one instead supplies a `data.frame` or `tibble`, `summarize_scores` attempts to run and produces an unhelpful error. Converting a `data.frame` or `tibble` into a `data.table` does not produce an error, even though we have not supplied a `scores` object.

This issue came up for me when writing scores to a tabular text file and reading them back in as tibbles to be summarized later. I was using the CRAN version of `scoringutils`, but have written the issue up for the development version.

Try with a `data.frame`:

Convert the `data.frame` to a `data.table`:

Try with a `tibble`:

Convert the `tibble` to a `data.table`:

Created on 2024-05-03 with reprex v2.1.0