epiforecasts / scoringutils

Utilities for Scoring and Assessing Predictions
https://epiforecasts.io/scoringutils/
Other
48 stars 20 forks source link

Issue #744 - Fix weird .internal.selfref error by making a copy in `check_number_per_forecast()` #745

Closed nikosbosse closed 5 months ago

nikosbosse commented 5 months ago

Description

This PR closes #744.

From #744:

I found a very weird error when analysing some data:

Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved. ” 

This has something to do with a mix of dplyr and data.table and occurred when I called mutate() on a forecast object. I was analysing some private data and therefore don't have a reprex (I failed to reproduce the error given the package example data). But the error seems to be due to check_number_per_forecast() (illustrating again what I hero I am when it comes to naming functions...). The function makes changes to the input without making a copy first.

This PR fixes the error by adding an ensure_data.table() statement to check_number_per_forecast() in which the input gets copied once. That causes some overhead, but it fixes the error 🤷 .

I briefly thought about adding an input check for the forecast_unit argument. But then I remembered that I am a good boi and don't do unrelated changes and reverted it. Maybe worth having a general discussion about whether or not we want input checks for internal functions? On the one hand they don't seem strictly necessary as we can control their use, but then again it doesn't cause much overhead.

Checklist

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.36%. Comparing base (488fa25) to head (f30b961).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #745 +/- ## ======================================= Coverage 95.35% 95.36% ======================================= Files 21 21 Lines 1616 1617 +1 ======================================= + Hits 1541 1542 +1 Misses 75 75 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

nikosbosse commented 5 months ago

Probably the overhead is mostly code duplication (i.e. a function with checks calling another function with checks)