alan-turing-institute / eider

eider: an R package for processing health records declaratively
https://alan-turing-institute.github.io/eider/

Warning if features / responses use data after / before (resp.) cutoff date #46

Open yongrenjie opened 8 months ago

yongrenjie commented 8 months ago

It would be useful to be able to pass a cutoff date to `transform()` and have the library warn the user if any features (i.e. the training X) use data from after the cutoff date, or if any responses (i.e. the training Y) use data from before it. Either case indicates data leakage, so flagging it early would help prevent leaky models.
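A minimal sketch of the proposed check follows. It is written in Python purely for illustration: eider itself is an R package, and `check_cutoff` is a hypothetical name, not part of its API. It assumes the relevant dates have already been extracted from the data, and it treats a date exactly equal to the cutoff as allowed for both features and responses, which is a choice the issue does not settle.

```python
import warnings
from datetime import date


def check_cutoff(feature_dates, response_dates, cutoff):
    """Warn if any feature date is after the cutoff, or any response date is
    before it. Returns the offending (role, date) pairs.

    Hypothetical helper; assumes a date equal to the cutoff is allowed
    for both roles.
    """
    # Features (training X) must not look past the cutoff.
    leaks = [("feature", d) for d in feature_dates if d > cutoff]
    # Responses (training Y) must not come from before the cutoff.
    leaks += [("response", d) for d in response_dates if d < cutoff]
    for role, d in leaks:
        side = "after" if role == "feature" else "before"
        warnings.warn(
            f"{role} uses data from {d.isoformat()}, which is {side} "
            f"the cutoff {cutoff.isoformat()}"
        )
    return leaks
```

For example, with a cutoff of 2021-01-01, a feature dated 2021-06-01 is reported (and a warning is emitted), while a feature dated 2020-01-01 and a response dated 2021-03-01 both pass.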

This is not technically difficult to implement, but it is slightly awkward: before we can perform any checks, we need to know which column holds the dates. That column name therefore has to be specified somewhere as well, and it may well differ between the input tables being used.
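To make the column-name problem concrete, here is a hypothetical Python sketch in which each input table is a list of row dictionaries and a separate mapping, `DATE_COLUMNS`, records which column holds the dates in each table. The table and column names (`prescriptions`, `paid_date`, `admissions`, `admission_date`) and the `leaked_rows` function are invented for illustration; eider's actual configuration format may look quite different.

```python
import warnings
from datetime import date

# Hypothetical per-table configuration: table name -> name of its date column.
DATE_COLUMNS = {
    "prescriptions": "paid_date",
    "admissions": "admission_date",
}


def leaked_rows(tables, date_columns, role, cutoff):
    """Return (table, row) pairs whose date is on the wrong side of the cutoff.

    tables: dict mapping table name -> list of row dicts
    role: "feature" (must not be after cutoff) or "response" (must not be before)
    """
    if role not in ("feature", "response"):
        raise ValueError(f"unknown role: {role!r}")
    leaks = []
    for name, rows in tables.items():
        if name not in date_columns:
            # Without a known date column the table cannot be checked at all.
            warnings.warn(f"no date column configured for table {name!r}; skipping")
            continue
        col = date_columns[name]
        for row in rows:
            d = row[col]
            if (role == "feature" and d > cutoff) or (role == "response" and d < cutoff):
                leaks.append((name, row))
    return leaks
```

With a single `prescriptions` row dated 2022-01-01 and a cutoff of 2021-01-01, calling `leaked_rows(tables, DATE_COLUMNS, "feature", date(2021, 1, 1))` returns that row, and the caller can then decide how to warn. Tables without a configured date column are skipped with a warning rather than silently passed, since an unchecked table is exactly where leakage could go unnoticed.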