datamol-io / splito

Machine Learning dataset splitting for life sciences.
https://splito-docs.datamol.io/
Apache License 2.0

Make a general function for all splitting strategies. #8

Closed. DomInvivo closed this issue 6 months ago.

DomInvivo commented 9 months ago

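Similar to scikit-learn's sklearn.model_selection.train_test_split function, implement a single entry point, named general_split or train_test_split, that can perform any kind of splitting. The splitting strategy would be selected by an argument, for example: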

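from typing import List, Literal, Tuple, Union

import datamol

def general_split(
    mols: List[Union[datamol.Mol, str]],  # molecules, as Mol objects or SMILES strings
    test_size: Union[float, int],  # fraction (float) or absolute count (int)
    splitting_method: Literal["random", "scaffold", "kmeans"],
    random_state: int = 42,
    n_jobs: int = 0,
    *args,
    **kwargs,
) -> Tuple[List[int], List[int]]:
    # Placeholder body: dispatch to the splitter named by `splitting_method`.
    print("Do some magic")
    train_idx, test_idx = [], []  # to be filled in by the chosen splitter
    return train_idx, test_idx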
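One way the "magic" could be organised is a small registry that maps each method name to a splitter function. The sketch below is only an illustration of that dispatch idea and is not splito's API: the names `SPLITTERS`, `random_split`, `group_split`, and the `groups` argument are hypothetical, and it works on sample indices with the standard library alone instead of on molecules. Scaffold and k-means splits can be viewed as group-based splits in which the group labels are scaffolds or cluster assignments computed beforehand, so a single `"group"` strategy stands in for both here.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple, Union

Split = Tuple[List[int], List[int]]


def _n_test(n_samples: int, test_size: Union[float, int]) -> int:
    """Interpret test_size as a fraction (float) or an absolute count (int)."""
    if isinstance(test_size, float):
        return max(1, round(n_samples * test_size))
    return int(test_size)


def random_split(n_samples: int, test_size: Union[float, int],
                 random_state: int = 42,
                 groups: Optional[Sequence[Hashable]] = None) -> Split:
    """Shuffle the indices and cut off the first chunk as the test set."""
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    k = _n_test(n_samples, test_size)
    return sorted(indices[k:]), sorted(indices[:k])


def group_split(n_samples: int, test_size: Union[float, int],
                random_state: int = 42,
                groups: Optional[Sequence[Hashable]] = None) -> Split:
    """Keep every member of a group (e.g. one scaffold) on the same side."""
    if groups is None:
        raise ValueError("group_split needs one group label per sample")
    by_group: Dict[Hashable, List[int]] = {}
    for idx, label in enumerate(groups):
        by_group.setdefault(label, []).append(idx)
    labels = sorted(by_group)  # assumes labels are mutually comparable
    random.Random(random_state).shuffle(labels)
    target = _n_test(n_samples, test_size)
    train: List[int] = []
    test: List[int] = []
    for label in labels:
        # Whole groups go to the test set until it reaches the target size,
        # so the test set may end up slightly larger than requested.
        (test if len(test) < target else train).extend(by_group[label])
    return sorted(train), sorted(test)


# Hypothetical registry: new strategies are added by registering a function.
SPLITTERS: Dict[str, Callable[..., Split]] = {
    "random": random_split,
    "group": group_split,
}


def general_split(n_samples: int, test_size: Union[float, int],
                  splitting_method: str, random_state: int = 42,
                  groups: Optional[Sequence[Hashable]] = None) -> Split:
    """Dispatch to the splitter registered under `splitting_method`."""
    try:
        splitter = SPLITTERS[splitting_method]
    except KeyError:
        raise ValueError(f"Unknown splitting_method: {splitting_method!r}") from None
    return splitter(n_samples, test_size, random_state, groups)


# Usage: ten samples, two per (made-up) scaffold label.
scaffolds = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]
train_idx, test_idx = general_split(10, 0.2, "group", groups=scaffolds)
```

With a registry like this, the public signature stays fixed while strategies are added or swapped, which is the main appeal of a single entry point.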

cwognum commented 6 months ago

@DomInvivo Sorry for taking a while to get to this. I opened PR #12, which should address it. Could you let me know what you think of the new API? Does it meet your expectations?