EducationalTestingService / skll

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
http://skll.readthedocs.org

Inconsistent use of ids in FeatureSet.split_by_ids #727

Closed tamarl08 closed 1 year ago

tamarl08 commented 1 year ago

The method split_by_ids receives two sets of ints, ids_for_split1 and ids_for_split2. According to the docstring, these are the actual ids (assuming the ids are integers, which is not necessarily the case).

However, the method uses ids_for_split1 as indices into the id list in one case (L519-521) and as the actual ids in the other (L523). ids_for_split2 is used only as indices (L527-529).

https://github.com/EducationalTestingService/skll/blob/1e5d4723f1873a2fa50400845b342ad0d9c6d937/skll/data/featureset.py#L479
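
To make the difference concrete, here is a minimal sketch in plain Python. It is not the SKLL code: the id values and the two helper functions (select_as_ids, select_as_indices) are hypothetical and only illustrate how the same input set yields different examples depending on whether it is read as ids or as positions in the id list.

```python
# Hypothetical example ids; these are NOT taken from SKLL.
ids = [10, 20, 30, 40]

# The caller passes a set of ints, as in the split_by_ids signature.
ids_for_split1 = {1, 2}


def select_as_ids(all_ids, wanted):
    """Keep the examples whose actual id is in `wanted`."""
    return [id_ for id_ in all_ids if id_ in wanted]


def select_as_indices(all_ids, wanted):
    """Treat `wanted` as positions in `all_ids` and keep those examples."""
    return [all_ids[pos] for pos in sorted(wanted) if 0 <= pos < len(all_ids)]


as_ids = select_as_ids(ids, ids_for_split1)
as_indices = select_as_indices(ids, ids_for_split1)

# The two interpretations disagree for the same input.
print(as_ids)      # no example has id 1 or 2
print(as_indices)  # the examples at positions 1 and 2

# If the ids happen to be 0..n-1 in order, both interpretations agree,
# which is one way a mix-up like this could go unnoticed.
sequential_ids = [0, 1, 2, 3]
same = (select_as_ids(sequential_ids, ids_for_split1)
        == select_as_indices(sequential_ids, ids_for_split1))
print(same)
```

Under these assumptions, mixing the two readings inside one method means the two resulting splits could be built from different examples than the caller intended.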