Issue #744 - Fix weird .internal.selfref error by making a copy in `check_number_per_forecast()`

nikosbosse commented 5 months ago

Description

This PR closes #744.

From #744:

I found a very weird error when analysing some data:

Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved. ”

This has something to do with a mix of dplyr and data.table and occurred when I called mutate() on a forecast object. I was analysing some private data and therefore don't have a reprex (I failed to reproduce the error given the package example data). But the error seems to be due to check_number_per_forecast() (illustrating again what I hero I am when it comes to naming functions...). The function makes changes to the input without making a copy first.

This PR fixes the error by adding an ensure_data.table() statement to check_number_per_forecast() in which the input gets copied once. That causes some overhead, but it fixes the error 🤷 .

I briefly thought about adding an input check for the forecast_unit argument. But then I remembered that I am a good boi and don't do unrelated changes and reverted it. Maybe worth having a general discussion about whether or not we want input checks for internal functions? On the one hand they don't seem strictly necessary as we can control their use, but then again it doesn't cause much overhead.

Checklist

[x] My PR is based on a package issue and I have explicitly linked it.
[x] I have included the target issue or issues in the PR title as follows: issue-number: PR title
[x] I have tested my changes locally.
[ ] I have added or updated unit tests where necessary.
[ ] I have updated the documentation if required.
[x] I have built the package locally and run rebuilt docs using roxygen2.
[x] My code follows the established coding standards and I have run lintr::lint_package() to check for style issues introduced by my changes.
[ ] I have added a news item linked to this PR.
[ ] I have reviewed CI checks for this PR and addressed them as far as I am able.

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.36%. Comparing base (488fa25) to head (f30b961).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #745 +/- ## ======================================= Coverage 95.35% 95.36% ======================================= Files 21 21 Lines 1616 1617 +1 ======================================= + Hits 1541 1542 +1 Misses 75 75 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

nikosbosse commented 5 months ago

Probably the overhead is mostly code duplication (i.e. a function with checks calling another function with checks)

epiforecasts / scoringutils