JuliaStats / MLBase.jl

A set of functions to support the development of machine learning algorithms
MIT License
185 stars 63 forks source link

Fix Kfold iterator #5

Closed lendle closed 10 years ago

lendle commented 10 years ago

The Kfold iterator was returning the validation set indexes instead of the training set indexes, which was inconsistent with the other CrossValGenerator.

Commit 60de1f1 fixed the cross_validate function for the broken Kfold iterator but broke it for the other CrossValGenerators.

This pull request fixes the Kfold iterator and reverts 60de1f1.

BigCrunsh commented 10 years ago

Isn't that fixed by https://github.com/JuliaStats/MLBase.jl/pull/4?

lendle commented 10 years ago

According to the readme, the CrossValGenerators should return "indices of samples selected for training", which is not what Kfold was doing. (Unless you wanted to train on 1/k of the observations and test on 1-1/k of the observations, but that's not what people usually mean by K-fold CV in my experience.)

4 made the cross_validation function use the test on the indexes returned by gen and train the others, but that would not be correct for the other CrossValGenerators. For example for LOOCV, that would train on one obs at a time and test on the remaining n-1.

Disclaimer: I've stared at this enough that there's a good chance I've confused myself and this was correct to begin with. If so, sorry!

BigCrunsh commented 10 years ago

Right. It works both -- I don't care. However, it should be consistent and thus the README has to be corrected anyways, because the example below that description suggests the opposite :smile:

lendle commented 10 years ago

Oops forgot about the README. Is it more clear now?

lindahua commented 10 years ago

Thanks!