Closed gengyabc closed 4 years ago
There is no cross validation after that line; a few lines later the entire data set is used for fitting. The Test and Score widget cannot do anything after that line. Splitting occurs earlier, and preprocessing is applied only to the training data: https://github.com/biolab/orange3/blob/master/Orange/evaluation/testing.py#L432.
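To illustrate the order of operations (split first, then fit the imputer on the training part only), here is a minimal sketch in plain Python. It is not Orange's implementation: `fit_mean_imputer`, `apply_imputer`, and `k_fold_indices` are hypothetical helpers, and the toy column is made up for the example.

```python
from statistics import mean


def fit_mean_imputer(train_column):
    """Learn the fill value from the given values only (None marks missing)."""
    observed = [v for v in train_column if v is not None]
    return mean(observed)


def apply_imputer(column, fill_value):
    """Replace missing entries with a previously learned fill value."""
    return [fill_value if v is None else v for v in column]


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for contiguous folds, no shuffling."""
    fold_size, remainder = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, test_idx
        start = stop


# Hypothetical toy column with two missing values.
column = [1.0, None, 3.0, 100.0, None, 5.0]

# Leaky variant: the fill value is computed from all rows,
# including rows that later end up in a test fold.
leaky_fill = fit_mean_imputer(column)

# Leak-free variant: split first, then learn the fill value
# from the training rows of each fold and reuse it on the test rows.
fold_fills = []
for train_idx, test_idx in k_fold_indices(len(column), k=3):
    train = [column[i] for i in train_idx]
    test = [column[i] for i in test_idx]
    fill = fit_mean_imputer(train)
    train_filled = apply_imputer(train, fill)
    test_filled = apply_imputer(test, fill)
    fold_fills.append(fill)
```

The fill value differs from fold to fold because each one is learned only from that fold's training rows, which is the behaviour described above for preprocessing inside cross validation.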
Of course you have to construct a correct schema:
If you do the following, it's wrong, but the problem is not in the line you mention.
Learners auto-impute missing values. I think this may cause data leakage from the training to the testing dataset if we do not split the data beforehand but instead use cross validation or random sampling in the "Test and Score" widget.
https://github.com/biolab/orange3/blob/b3c5fdf3615173ac81146ab632f55ee9cc1726a7/Orange/base.py#L113
The missing values are imputed before the train/test split, which is done in the "Test and Score" widget.