Closed asafarevich closed 1 year ago
I'm not sure what you mean by un-initialize the model. Do you mean free the memory it was using? Could you help me by posting a minimal reproducible example, ideally one that crashes on Google Colab?
Do you mean free the memory it was using?
Yes, GPU memory to be specific.
Could you help me by posting a minimal reproducible example, ideally one that crashes on Google Colab?
Not sure if I'd be able to. It's because my machine has a GPU with 6 GB of memory. I'm able to load a ResNet model with a batch size of 256.
We might need to tweak the batch size or dataset size to reproduce. I think this will be necessary because we have different hardware and I don't see any obvious user error that would cause things.
In the meantime, a couple of ideas to try to narrow down the problem:
Pass KerasClassifier(model=<func>) instead of KerasClassifier(model=<instance>)
I've run a couple more experiments with it. This issue seems to occur in cross_val_predict as well. Specifically, I try 3 folds, and the second fold goes to the CPU instead of the GPU (most likely because the first model is not cleaned up).
Also, I'm already doing the third thing: I'm passing a function. Here is the function in question:
import tensorflow as tf
from tensorflow.keras.applications import ResNet50

def get_net(n_labels):
    # ResNet50 backbone without its classification head, average-pooled to a vector
    resnet_model = ResNet50(include_top=False, pooling='avg', input_shape=[64, 64, 3])
    outputs = tf.keras.layers.Dense(n_labels, 'softmax')(resnet_model.output)
    model = tf.keras.Model(resnet_model.inputs, outputs)
    model.compile(loss=tf.keras.losses.sparse_categorical_crossentropy,
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model
In your first post you have KerasClassifier(get_net(len(label_map)), ...), so what you are passing in is the value returned from get_net(), which is a tensorflow.keras.Model instance, not a function.
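To illustrate the distinction, here is a minimal sketch that uses a toy stand-in rather than a real Keras model. The names get_toy_net and build_log are hypothetical and not part of the original code. Calling the function builds the model immediately, whereas passing the bare function name defers the build to whoever receives it.

```python
build_log = []  # records every time the toy "model" is built


def get_toy_net(n_labels):
    """Hypothetical stand-in for get_net; returns a dict instead of a Keras model."""
    build_log.append(n_labels)
    return {"n_labels": n_labels}


# Calling the function: the model is built right here, and what gets
# passed along is the finished object.
instance_arg = get_toy_net(10)
builds_after_call = len(build_log)

# Passing the function itself: nothing new is built until the receiver calls it.
function_arg = get_toy_net
builds_after_reference = len(build_log)

print(builds_after_call, builds_after_reference)        # 1 1
print(callable(instance_arg), callable(function_arg))   # False True
```

With SciKeras, the function form would presumably look like KerasClassifier(model=get_net, model__n_labels=len(label_map)), where the model__ prefix routes the argument to the model-building function. Treat that exact spelling as an assumption and check it against the SciKeras documentation.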
Ah, got it, let me try that. Thank you, I'll let you know if that works.
I tried it and was still having issues. I eventually just reduced the input size. It's a duct-tape solution. Unfortunately, I don't have time to explore further.
The bigger issue, it seems, is that garbage collection of a model doesn't always release the GPU resources, because I'd get the GPU memory shortage error in the cross-validation script as well, after the 2nd or 3rd fold. It was inconsistent.
I'm trying k-fold, but I get a TF out-of-memory error on the GPU after running for several k-fold iterations... rough code that I'm running. Is there any way I can un-initialize a model? (I think del is not clearing it.)
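One commonly suggested approach is to build the model inside the per-fold function, so that no reference outlives the fold, and then call tf.keras.backend.clear_session() followed by gc.collect(). The sketch below is an outline under those assumptions, not a confirmed fix for this issue. The names run_folds, train_one_fold and release_keras_state are hypothetical, and TensorFlow is imported lazily so the loop itself can be tried without a GPU. TensorFlow typically keeps GPU memory reserved for the life of the process, so this may make memory reusable by the next fold without lowering what nvidia-smi reports.

```python
import gc


def release_keras_state():
    """Best-effort cleanup between folds (assumes TensorFlow is installed)."""
    import tensorflow as tf  # lazy import: only needed when cleanup actually runs

    tf.keras.backend.clear_session()  # drop Keras' global graph/state
    gc.collect()                      # collect any now-unreferenced Python objects


def run_folds(folds, train_one_fold, cleanup=release_keras_state):
    """Train one fold at a time and run cleanup after each, even on failure.

    train_one_fold should build its model locally and return only plain
    Python values, so the model becomes unreachable once it returns.
    """
    scores = []
    for fold in folds:
        try:
            scores.append(train_one_fold(fold))
        finally:
            cleanup()
    return scores


# Dry run with a dummy fold function and a dummy cleanup hook (no TensorFlow needed).
cleanup_calls = []
scores = run_folds(range(3), lambda fold: fold * 2,
                   cleanup=lambda: cleanup_calls.append(1))
print(scores)              # [0, 2, 4]
print(len(cleanup_calls))  # 3
```

In real use, train_one_fold would call get_net(...), fit on that fold's training split, evaluate on the held-out split, and return the score. Omitting the cleanup argument then falls back to release_keras_state.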