What's the best way to do KFold manually?

adriangb / scikeras

Scikit-Learn API wrapper for Keras.

https://www.adriangb.com/scikeras/

MIT License

What's the best way to do KFold manually? #283

Closed asafarevich closed 1 year ago

asafarevich commented 2 years ago

I'm trying k-fold cross-validation, but after several fold iterations I get a TensorFlow out-of-memory error on the GPU. Below is rough code for what I'm running. Is there any way to un-initialize a model? (I don't think del is clearing it.)

    model = KerasClassifier(get_net(len(label_map)), epochs=10,
                            batch_size=256,
                            callbacks=tf.keras.callbacks.EarlyStopping(
                                monitor='loss',
                                verbose=1,
                                mode='auto',
                                restore_best_weights=True,
                            ))
    num_crossval_folds = 3
    kf = KFold(n_splits=num_crossval_folds)
    pred_probs = np.zeros([len(X), len(label_map)])
    for train_index, test_index in kf.split(train_indices):
        kfold_model: Union[KerasClassifier, object] = clone(model)
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        kfold_model.fit(X_train, y_train)
        pred_probs[test_index] = kfold_model.predict_proba(X_test)
        pred_probs[test_indices] += kfold_model.predict_proba(X[test_indices])
        del kfold_model
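
For reference, here is a minimal sketch of a manual out-of-fold prediction loop of the kind the snippet above is aiming for. It is an illustration under assumptions, not the poster's actual script: the helper name `out_of_fold_probs` is hypothetical, scikit-learn's `LogisticRegression` stands in for the Keras classifier so the example runs without a GPU, the folds are split over `X` itself, and the extra held-out `test_indices` accumulation from the original is left out.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def out_of_fold_probs(estimator, X, y, n_classes, n_splits=3):
    """Return class probabilities where each row is predicted by a model
    that never saw that row during training (hypothetical helper)."""
    pred_probs = np.zeros((len(X), n_classes))
    kf = KFold(n_splits=n_splits)
    # Split over X so the fold indices address the same rows we train on.
    for train_index, test_index in kf.split(X):
        fold_model = clone(estimator)  # fresh, unfitted copy for every fold
        fold_model.fit(X[train_index], y[train_index])
        pred_probs[test_index] = fold_model.predict_proba(X[test_index])
    return pred_probs


# Synthetic data (assumption): 60 rows, 4 features, 3 evenly interleaved classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.arange(60) % 3
probs = out_of_fold_probs(LogisticRegression(max_iter=200), X, y, n_classes=3)
```

Because every row falls in exactly one test fold, `probs` ends up fully populated after the loop.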

adriangb commented 2 years ago

I'm not sure what you mean by un-initialize the model. Do you mean free the memory it was using? Could you help me by posting a minimal reproducible example, ideally one that crashes on Google Colab?

asafarevich commented 2 years ago

> Do you mean free the memory it was using?

Yes, GPU memory to be specific.

> Could you help me by posting a minimal reproducible example, ideally one that crashes on Google Colab?

Not sure if I'd be able to... it's because my machine has a GPU with 6 GB of RAM. I'm able to load a ResNet model with a batch size of 256.

adriangb commented 2 years ago

We might need to tweak the batch size or dataset size to reproduce. I think this will be necessary because we have different hardware and I don't see any obvious user error that would cause this.

In the meantime, a couple of ideas to try to narrow down the problem:

- Generate the splits without training the model (or train it on only the first row of each split).
- Make a trivial model (just an input and output).
- Try passing in a KerasClassifier(model=<func>) instead of KerasClassifier(model=<instance>).

asafarevich commented 2 years ago

I've run a couple more experiments with it. This issue seems to occur in cross_val_predict as well. Specifically, when I try 3 folds, the second fold goes to the CPU instead of the GPU (most likely because the first model is not cleaned up).

asafarevich commented 2 years ago

Also, I'm already doing the third thing: I'm passing a function. Here is the function in question:

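import tensorflow as tf
from tensorflow.keras.applications import ResNet50

def get_net(n_labels):
    # ResNet50 backbone without its classification head, global-average pooled
    resnet_model = ResNet50(include_top=False, pooling='avg', input_shape=[64, 64, 3])
    # New softmax head with one unit per label
    outputs = tf.keras.layers.Dense(n_labels, activation='softmax')(resnet_model.output)
    model = tf.keras.Model(resnet_model.inputs, outputs)
    model.compile(loss=tf.keras.losses.sparse_categorical_crossentropy,
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model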

adriangb commented 2 years ago

In your first post you have KerasClassifier(get_net(len(label_map)), ...) so what you are passing in is the value returned from get_net() which is a tensorflow.keras.Model instance, not a function.
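
The distinction can be shown without TensorFlow at all. The sketch below uses a hypothetical stand-in class, `FakeModel`, in place of a compiled Keras model; only the name `get_net` comes from the thread. Calling `get_net(...)` builds a model immediately, whereas a callable defers construction until something (such as the wrapper, once per fit) invokes it.

```python
from functools import partial


class FakeModel:
    """Stand-in for a compiled tf.keras.Model (hypothetical, illustration only)."""
    instances = 0  # counts how many models have been built so far

    def __init__(self, n_labels):
        FakeModel.instances += 1
        self.n_labels = n_labels


def get_net(n_labels):
    # In the thread this builds a ResNet50-based network; here it is a stub.
    return FakeModel(n_labels)


built_now = get_net(3)             # an instance: the model already exists
build_later = partial(get_net, 3)  # a callable: nothing is built until it is called
```

With scikeras itself, the callable form would look roughly like `KerasClassifier(model=get_net, model__n_labels=len(label_map), ...)`, assuming the `model__` parameter routing described in the scikeras documentation.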

asafarevich commented 2 years ago

Ah, got it, let me try that. Thank you, I'll let you know if that works.

asafarevich commented 1 year ago

I tried it and was still having issues. I eventually just reduced the input size. It's a duct-tape solution. Unfortunately, I don't have time to explore further.

The bigger issue, it seems, is that garbage collection of a model doesn't always release the GPU resources, because I'd get the GPU memory shortage error in the cross-validation script as well, after the 2nd or 3rd fold. It was inconsistent.
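
One commonly suggested mitigation, which was not tested in this thread, is to clear Keras' global state and force a garbage-collection pass after every fold. The sketch below assumes that approach; the helper names `release_keras_state` and `run_folds` are hypothetical. It is best-effort only: TensorFlow's allocator may keep GPU memory reserved for the lifetime of the process, so this may not resolve the error described above.

```python
import gc


def release_keras_state():
    """Best-effort cleanup between folds; returns True if TensorFlow was found."""
    try:
        import tensorflow as tf  # imported lazily so the sketch also loads without TF
    except ImportError:
        gc.collect()
        return False
    tf.keras.backend.clear_session()  # reset the global state Keras keeps between models
    gc.collect()                      # collect Python objects that are no longer referenced
    return True


def run_folds(splits, fit_and_score):
    """Call fit_and_score(train_idx, test_idx) per fold, cleaning up after each."""
    scores = []
    for train_idx, test_idx in splits:
        # The model should be created inside fit_and_score so that no
        # reference to it survives once the call returns.
        scores.append(fit_and_score(train_idx, test_idx))
        release_keras_state()
    return scores
```

Keeping the model local to `fit_and_score` matters here: an outer variable that still points at the fitted estimator would keep it alive regardless of the cleanup call.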