Open seabbs opened 1 year ago
@seabbs could use a second set of eyes on https://github.com/epinowcast/coerceDT/blob/main/R/coerceDT.R - thoughts on the interface offered?
Not quite done w/ implement + test for it, but am thinking:
@seabbs I think coerceDT + tests is basically done, moving on to checkDT.
@seabbs this is now approaching 0.1.0 release readiness.
How do we want to check that it works how we want it to work? Propose an epinowcast & scoringutils branch => introduce this as a dependency => modify those library codebases (likely also this library code to do more of what we want for them) => iterate until we're happy with what this does for those libraries.
Related: there's definitely some documentation issues to deal with - I think that's best dealt with by another set of eyes, at least to identify the WTF bits. Useful candidates for that?
Tagging @nikosbosse - I understand you've got an internal helper function like this in scoringutils. If it potentially makes sense to substitute this in, please advise (with the helper function name(s)) and I'll have a look.
The version in scoringutils
is merely this:
scoringutils:::ensure_data.table
function(data) {
if (is.data.table(data)) {
data <- copy(data)
} else {
data <- as.data.table(data)
}
return(data)
}
Maybe there is room for more sophistication :)
Regarding the version you linked to: It seems quite complicated to me and the function does a lot of different things (selecting / dropping columns, reading in data, conversion to data.table, selecting whether or not that happens by reference...). For me as a newbie user I think it would be helpful to separate these things a bit more.
Also since we're currently reworking scoringutils
in terms of S3 methods I now see them everywhere... But having coerceDT be a generic seems like it could be a good idea.
The version in
scoringutils
is merely this:scoringutils:::ensure_data.table function(data) { if (is.data.table(data)) { data <- copy(data) } else { data <- as.data.table(data) } return(data) }
Maybe there is room for more sophistication :)
Indeed! In terms of just replacing that function, that would literally just be: ensure_data.table(data)
=> coerceDT(data)
(though now data could be a file path or any other string fread
just handles).
If you have places internally in scoringutils where you don't need to copy, those would become coerceDT(data, copy = FALSE)
.
I did fiddle a bit with S3 methods - found it easier to just handle dispatch based on type in the internal function logic.
@nikosbosse had a quick peek at scoringutils - for example check_forecasts
might become something like
check_forecasts <- function(data) {
# create lists to store results ----------------------------------------------
out <- list()
warnings <- list()
errors <- list()
messages <- list()
data <- makeDT(data, require = c("true_value", "prediction"), forbid = available_metrics())
# ... other more complex checks, starting from the bit looking at models
The helper function, ensure_data.table
linked above is currently on the develop
branch - the current version does the checks for required columns independently of that, but in principle it could also be a single function (there is currently an internal function check_columns_present()
)
@seabbs per discussion re nomenclature - how about asDT
?
We have had a fair few months with no progress on this (for understandable reasons). Shall we setup a meeting with those interested in contributing and hash out some issues that we can PR against to at least get an MVP version that can replace the functionality in
epinowcast
.