apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Are there any plans to add a cross validation module? #117

Status: Open. Kagandi opened this issue 6 years ago.

Kagandi commented 6 years ago

In the previous version of turicreate (graphlab-create-2.1) there was a cross validation module that included cross_validation and KFold. I wasn't able to find them anywhere in the current documentation or code. It would be great to have cross validation and KFold as part of Turi Create.

srikris commented 6 years ago

We don't have it now. This is a great feature request!

Kagandi commented 6 years ago

For one of my own projects I have implemented cross-validation and k-fold splitting that work with turicreate.

igiloh commented 6 years ago

@Kagandi, thank you, we will definitely have a look. Feel free to submit this as a pull request.

hrit-ikkumar commented 4 years ago

Will you add cross_validation.KFold to turicreate or not? @igiloh @Kagandi @znation @srikris @hoytak @afranklin

znation commented 4 years ago

Not sure why this was closed. It's a reasonable feature request. Reopening.

TobyRoseman commented 4 years ago

We still don't have cross validation support. However, we did just add a shuffle method to SFrame. That should make it simpler to do cross validation yourself.

To do k-fold cross validation, call shuffle on your SFrame, then divide it into k equal segments. Use each segment once as the test set, with the remaining k-1 segments as the training set.
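Exactly equal segments are only possible when k divides the number of rows. One common way to handle the remainder (the approach scikit-learn's KFold takes) is to give the first n % k folds one extra row, so fold sizes differ by at most one. The minimal sketch below shows that arithmetic on row indices rather than an SFrame, so it runs without Turi Create. `fold_bounds` and `kfold_indices` are hypothetical helper names, and `random.Random(seed).shuffle` stands in for `SFrame.shuffle`.

```python
import random


def fold_bounds(n, k):
    """Return (start, end) index pairs for k contiguous folds over n rows.

    Hypothetical helper: the first n % k folds get one extra row, so
    fold sizes differ by at most one.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    base, extra = divmod(n, k)
    bounds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def kfold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) lists after shuffling 0..n-1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)  # stand-in for SFrame.shuffle()
    for start, end in fold_bounds(n, k):
        # Fold [start, end) is the test set; the rest is the train set
        yield order[:start] + order[end:], order[start:end]


# 11 rows, 5 folds: sizes differ by at most one
print([end - start for start, end in fold_bounds(11, 5)])  # [3, 2, 2, 2, 2]
```

The same (start, end) pairs could replace the fixed fold_size when slicing an SFrame; this is a design alternative, not what Turi Create itself provides.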

TobyRoseman commented 4 years ago

Here is a function I wrote to do cross validation:

def get_cross_validation_generator(sf, k):
    '''
    Parameters
    ----------
    sf : SFrame
        The SFrame on which to do cross validation

    k : int
        The number of folds

    Returns
    -------
    out : generator
        The generator yields a tuple with two members. The first
        member of the tuple is the train set SFrame. The second member
        is the test set.
    '''
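    # Randomly reorder the rows so every fold is a random sample
    sf = sf.shuffle()
    # Size of each of the first k-1 folds; the last fold takes the remainder
    fold_size = len(sf) // k

    for i in range(k-1):
        test_set_start = i * fold_size
        test_set_end = (i+1) * fold_size

        # Fold i is the test set; the rows before and after it form the train set
        cur_test = sf[test_set_start:test_set_end]
        cur_train = sf[:test_set_start].append(sf[test_set_end:])

        yield cur_train, cur_test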

    # Add any left over portion to the final test set
    final_divide = (k-1) * fold_size
    yield sf[:final_divide], sf[final_divide:]

Here is an example of using it:

# Test get_cross_validation_generator
import turicreate as tc
sf = tc.SFrame({'a': range(11)})
for train, test in get_cross_validation_generator(sf, 5):
    print(train)
    print(test)
    print("\n\n")
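In practice each fold feeds one train/evaluate round, and the k scores are averaged. Below is a minimal sketch of that loop, assuming a hypothetical `train_and_score(train, test)` callable that you supply (with Turi Create it might create a model on `train` and return one metric from the model's `evaluate` output on `test`). To stay runnable without Turi Create, it uses `list_cv_generator`, a hypothetical plain-list stand-in that mirrors the slicing in get_cross_validation_generator, with leftover rows going to the last test fold.

```python
import random
import statistics


def list_cv_generator(rows, k, seed=0):
    """Plain-list stand-in for get_cross_validation_generator (same slicing)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # stands in for sf.shuffle()
    fold_size = len(rows) // k
    for i in range(k - 1):
        start, end = i * fold_size, (i + 1) * fold_size
        yield rows[:start] + rows[end:], rows[start:end]
    # Leftover rows go to the final test set, as in the SFrame version
    final_divide = (k - 1) * fold_size
    yield rows[:final_divide], rows[final_divide:]


def cross_val_mean(folds, train_and_score):
    """Average per-fold scores; train_and_score is a hypothetical callable."""
    scores = [train_and_score(train, test) for train, test in folds]
    return statistics.mean(scores), scores


def mean_baseline_mae(train, test):
    """Toy 'model': predict the training mean, score by mean absolute error."""
    mu = statistics.mean(train)
    return statistics.mean(abs(x - mu) for x in test)


mean_mae, per_fold = cross_val_mean(list_cv_generator(range(11), 5), mean_baseline_mae)
print(len(per_fold))  # 5
```

With an SFrame, the same loop would take get_cross_validation_generator(sf, 5) as its folds argument. For 11 rows and k=5 the train/test sizes are 9/2 for the first four folds and 8/3 for the last, matching what the Turi Create example above prints.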