It would be nice to pass a cutoff date to transform() and have the library warn the user if any features (i.e. training X) use data from after the cutoff date, or if any responses (i.e. training Y) use data from before the cutoff date. This would help prevent data leakage.
This is not technically difficult to implement, but it is a little annoying because we need to know the name of the column that holds the dates before we can perform any checks. That column name has to be specified somewhere too, and it may well differ between the input tables being used.
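To make the idea concrete, here is a rough sketch of what such a check could look like. It is not tied to the library's actual API: the function name `check_cutoff`, the `date_columns` mapping (table name to date column name), the `role` argument, and the list-of-dicts table layout are all hypothetical, chosen only to show how a per-table date column could be supplied. It also assumes that rows dated exactly on the cutoff are acceptable on both sides.

```python
import warnings
from datetime import date


def check_cutoff(tables, cutoff, date_columns, role):
    """Warn about rows on the wrong side of the cutoff; return offending counts.

    tables:       dict of table name -> list of row dicts (hypothetical layout)
    date_columns: dict of table name -> name of that table's date column
    role:         "features" (rows must not be dated after the cutoff) or
                  "responses" (rows must not be dated before the cutoff)
    """
    if role not in ("features", "responses"):
        raise ValueError(f"unknown role: {role!r}")

    offending = {}
    for name, rows in tables.items():
        column = date_columns[name]  # KeyError if no date column was given
        if role == "features":
            bad = sum(1 for row in rows if row[column] > cutoff)
            side = "after"
        else:
            bad = sum(1 for row in rows if row[column] < cutoff)
            side = "before"
        if bad:
            offending[name] = bad
            warnings.warn(
                f"{role} table {name!r}: {bad} row(s) dated {side} cutoff {cutoff}",
                UserWarning,
                stacklevel=2,
            )
    return offending


# Example: two feature tables whose date columns have different names.
cutoff = date(2020, 1, 1)
feature_tables = {
    "orders": [{"order_date": date(2019, 12, 30)}, {"order_date": date(2020, 1, 5)}],
    "visits": [{"visited_on": date(2019, 11, 1)}],
}
date_columns = {"orders": "order_date", "visits": "visited_on"}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    feature_report = check_cutoff(feature_tables, cutoff, date_columns, "features")
```

Passing the column names as a mapping keyed by table name is one way to handle tables that label their date column differently. A single default name with per-table overrides would work too.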