byu-dml / metalearn

BYU's python library of usable tools for metalearning
MIT License

Landmarkers, cross validation, and seeds #124

Closed: bjschoenfeld closed this issue 6 years ago

bjschoenfeld commented 6 years ago

All the landmarking metafeatures use cross validation, which requires a seed to generate random folds. Currently, each landmarker reuses its own seed for the cross validation. This is a problem for two reasons:

  1. The seed for cross validation should be different from the seed for the landmarker classifier. We do not know what kinds of biases might be created by sharing a seed.
  2. All of the landmarkers should share the same cross validation splits, i.e. use the same seed for cross validation. This would make the landmarkers more comparable within a single dataset.
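To make both problems concrete, here is a minimal sketch of the current coupling. It is not metalearn's code: `kfold_indices`, `current_landmarker`, and the landmarker names are hypothetical, and the standard-library `random` module stands in for whatever cross-validation splitter and classifier the library actually uses. One seed feeds both the classifier's randomness and the folds (problem 1), and because each landmarker has a different seed, each one is scored on different splits (problem 2).

```python
import random

def kfold_indices(n_rows, n_folds, seed):
    """Shuffle row indices with `seed`, then deal them round-robin into folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]

def current_landmarker(seed, n_rows=20, n_folds=5):
    """Hypothetical landmarker mirroring the issue: a single seed drives
    both the classifier's randomness and the cross-validation folds."""
    classifier_rng = random.Random(seed)          # stand-in for the classifier's random state
    folds = kfold_indices(n_rows, n_folds, seed)  # the same seed is reused for the folds
    return folds, classifier_rng

# Each landmarker (hypothetical names) brings its own seed.
landmarker_seeds = {"naive_bayes": 1, "one_nn": 2}
folds_by_landmarker = {
    name: current_landmarker(seed)[0] for name, seed in landmarker_seeds.items()
}
```

Comparing `folds_by_landmarker["naive_bayes"]` with `folds_by_landmarker["one_nn"]` shows two different partitions of the same 20 rows, so the two landmarker scores are not measured on the same splits.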

A solution would be to create a dedicated seed just for cross validation. Any metafeature (e.g. a landmarker) that uses cross validation would then receive the same cross validation seed and thus the same folds. Furthermore, any metafeature that requires both a seed and cross validation would now receive two seeds: the seed for the metafeature itself and the seed for cross validation.
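The proposed two-seed design can be sketched as follows. This is an illustration under assumptions, not the fix that was committed: `run_landmarker`, `CV_SEED`, and the landmarker names are hypothetical, and `random.Random` again stands in for the real fold splitter and the classifier's random state. Folds come only from the shared cross-validation seed, while each landmarker's own seed drives only its classifier.

```python
import random

def kfold_indices(n_rows, n_folds, cv_seed):
    """Shuffle row indices with the shared CV seed, then deal them into folds."""
    indices = list(range(n_rows))
    random.Random(cv_seed).shuffle(indices)
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]

def run_landmarker(n_rows, metafeature_seed, cv_seed, n_folds=5):
    """Hypothetical landmarker that receives two seeds: one for the
    cross-validation folds and one for the classifier itself."""
    folds = kfold_indices(n_rows, n_folds, cv_seed)
    classifier_rng = random.Random(metafeature_seed)  # stand-in for the classifier's random state
    # A real landmarker would fit and score a classifier on each fold here;
    # this sketch returns the folds and one draw from the classifier RNG.
    return folds, classifier_rng.random()

CV_SEED = 42  # one seed shared by every metafeature that cross-validates
landmarker_seeds = {"naive_bayes": 1, "one_nn": 2, "decision_stump": 3}
results = {
    name: run_landmarker(20, seed, CV_SEED)
    for name, seed in landmarker_seeds.items()
}
```

Every entry in `results` now carries identical folds, while the classifier draws still differ per landmarker. Keeping `CV_SEED` separate from the per-landmarker seeds addresses both points: the folds no longer depend on any classifier's seed, and all landmarkers are scored on the same splits.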

bjschoenfeld commented 6 years ago

Handled by commit a08c6b85fd1c64f076a3526ae1b900f23fcfc907