datamol-io / splito

Machine Learning dataset splitting for life sciences.
https://splito-docs.datamol.io/
Apache License 2.0

SIMPD new implementation #2

Open hadim opened 1 year ago

hadim commented 1 year ago

See https://github.com/valence-labs/polaris/pull/20#issuecomment-1646952840 for context


In short, the idea is to develop a new approach that optimizes a dataset split according to multiple objectives and constraints. An example of such a genetic algorithm (GA) approach was proposed at https://chemrxiv.org/engage/chemrxiv/article-details/6406049e6642bf8c8f10e189 and is already implemented at partitio.simpd.
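
To make the idea concrete, here is a minimal sketch of a GA searching over train/test masks. It is not the SIMPD implementation and not an existing splito API: the function names (`split_cost`, `optimize_split`), the two objectives (test-set fraction and the gap in active fraction between test and train), and all parameter values are hypothetical and chosen only for illustration. It also collapses the objectives into a single weighted sum instead of running a true multi-objective (Pareto) search.

```python
import random


def split_cost(mask, labels, target_test_frac=0.2, target_active_gap=0.0):
    """Sum of two illustrative objectives for a candidate split (lower is better).

    mask[i] is True when sample i goes to the test set.
    """
    n_test = sum(mask)
    n_train = len(mask) - n_test
    if n_test == 0 or n_train == 0:
        return float("inf")  # constraint: both sides must be non-empty
    test_frac = n_test / len(mask)
    active_test = sum(l for l, m in zip(labels, mask) if m) / n_test
    active_train = sum(l for l, m in zip(labels, mask) if not m) / n_train
    gap = active_test - active_train
    # Objective 1: hit the requested test fraction.
    # Objective 2: hit the requested difference in active fraction.
    return abs(test_frac - target_test_frac) + abs(gap - target_active_gap)


def optimize_split(labels, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Evolve boolean train/test masks with a very small elitist GA."""
    rng = random.Random(seed)
    n = len(labels)
    population = [[rng.random() < 0.2 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with offspring.
        population.sort(key=lambda m: split_cost(m, labels))
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: move a sample to the other side of the split.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=lambda m: split_cost(m, labels))


# Toy data: every third sample is "active".
labels = [i % 3 == 0 for i in range(60)]
best_mask = optimize_split(labels)
```

A real implementation would presumably replace the weighted sum with a proper multi-objective optimizer and use chemistry-aware objectives (for example, similarity between train and test molecules), which this sketch leaves out.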