add disjoint-set splitting to train_test_validate

PNNL-CompBio / coderdata

Automation scripts and benchmark dataset package for cancer drug prediction deep learning models.

add disjoint-set splitting to train_test_validate #212

Closed ymahlich closed 1 month ago

ymahlich commented 2 months ago

In its current state, train_test_validate does not implement disjoint-set splitting.

Multi-label splitting is a non-trivial problem. The most straightforward solution is most likely an iterative one, potentially something akin to skmultilearn's iterative stratification.
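To make the difficulty concrete, here is a minimal sketch of a naive (non-iterative) baseline. It assumes "disjoint-set" means that no sample and no drug may appear in more than one split, and that the data can be reduced to (sample, drug) pairs. The function name `disjoint_split`, its parameters, and the pair representation are all hypothetical and not part of coderdata. It is not iterative stratification; it only shows why a smarter approach would be needed.

```python
import random


def disjoint_split(pairs, fractions=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical helper: split (sample, drug) pairs so that no sample
    and no drug is shared between splits. Pairs whose sample and drug
    land in different splits are dropped."""
    rng = random.Random(seed)

    def assign(ids):
        # Shuffle the unique ids and cut them into consecutive chunks
        # sized by the requested fractions.
        ids = sorted(ids)
        rng.shuffle(ids)
        assignment = {}
        start = 0
        cumulative = 0.0
        for split_index, fraction in enumerate(fractions):
            cumulative += fraction
            is_last = split_index == len(fractions) - 1
            end = len(ids) if is_last else round(cumulative * len(ids))
            for identifier in ids[start:end]:
                assignment[identifier] = split_index
            start = end
        return assignment

    sample_split = assign({sample for sample, _ in pairs})
    drug_split = assign({drug for _, drug in pairs})

    splits = [[] for _ in fractions]
    dropped = []
    for sample, drug in pairs:
        if sample_split[sample] == drug_split[drug]:
            # Both labels belong to the same split: keep the pair there.
            splits[sample_split[sample]].append((sample, drug))
        else:
            # Keeping this pair would leak a sample or drug across splits.
            dropped.append((sample, drug))
    return splits, dropped


# Toy data: every sample measured against every drug.
pairs = [(f"sample_{i}", f"drug_{j}") for i in range(20) for j in range(10)]
(train, test, validate), dropped = disjoint_split(pairs)
```

The obvious drawback is that every pair whose sample and drug fall into different splits is discarded, and the resulting split sizes no longer match the requested fractions. Reducing that loss while keeping the splits balanced is the part that likely calls for an iterative method.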

sgosline commented 1 month ago

I think we can agree to skip disjoint-set splitting at this point and put it off to a later version, once someone requests it.