amaiya / ktrain

ktrain is a Python library that makes deep learning and AI more accessible and easier to apply
Apache License 2.0

Memory leak when using predict_filename in lambda function #520

Closed nburner96 closed 8 months ago

nburner96 commented 8 months ago

I have trained a ResNet50 model and am attempting to make predictions on images whose file paths are derived from a DataFrame. After each prediction is made, the predicted value for each image is appended to a new column. I am doing this with pandas' apply function and am running into a large memory leak during the process. Testing with various validation set sizes, I have found that roughly 1 GB of memory is consumed for every 1,000 images predicted. As a result, my machine runs out of memory when validating image sets containing ~50,000 files.

Below is the relevant code:

import pandas as pd
import ktrain
from ktrain import vision as vis

# img_dir, dim, model_name, weights_path, batch_size, train, val_short,
# val, out, and runs are defined earlier in the script

# predict based on ID
def predict_diff(row):
    image_id = row['id']
    fname = f'{img_dir}/{image_id}.tif'
    pred = round(predictor.predict_filename(fname, return_proba=False, verbose=0)[0])
    return pred

# Create training and validation image generators
(train_img, val_img, preproc) = vis.images_from_df(train_df=train, image_column='id', label_columns='DIFF',
                                                   directory=img_dir, val_directory=img_dir,
                                                   suffix='.tif', val_df=val_short, is_regression=True,
                                                   target_size=dim, color_mode='rgb')

# Create Model
model = vis.image_regression_model(name=model_name, train_data=train_img, val_data=val_img,
                                  freeze_layers=None, metrics=['mse'])

# Loading weights from checkpoint
model.load_weights(weights_path)

learner = ktrain.get_learner(model=model, train_data=train_img, val_data=val_img,
                             workers=2, use_multiprocessing=True, batch_size=batch_size)

# Create Predictor instance
predictor = ktrain.get_predictor(model, preproc)

# load the DataFrame containing image file names
results = pd.read_csv(out)

# mask selecting the rows that belong to the validation set
mask = results.id.isin(val.id)

# Apply predict_diff (defined above) to each masked row to get a prediction per file name
results.loc[mask, f'Run_{runs}'] = results[mask].apply(lambda row: predict_diff(row), axis=1)

# save csv with results
results.to_csv(out, index=False)

Do you have an idea as to what might be causing the memory leak?

Thanks

amaiya commented 8 months ago

Are you using Microsoft Windows?

nburner96 commented 8 months ago

No, this is on Linux. During training the epoch times have been very consistent. Ideally I would like to export the validation results after the final epoch, but I have not found a way to do that in the documentation.

amaiya commented 8 months ago

Hello: I'm not observing the memory leak that you're describing. If there is a leak, it may be related to a dependency, in which case you might try upgrading all of your dependencies.
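If you want to confirm where the growth is coming from first, here is a minimal sketch using Python's built-in tracemalloc module (the fnames list and the 1,000-file sample are placeholders for your own prediction loop); note that tracemalloc only sees Python-level allocations, so memory held by native TensorFlow code will not show up:

import tracemalloc

tracemalloc.start()
snap1 = tracemalloc.take_snapshot()

# run a representative slice of the prediction loop (placeholder sample)
for fname in fnames[:1000]:
    predictor.predict_filename(fname, return_proba=False)

snap2 = tracemalloc.take_snapshot()

# show the source lines that allocated the most memory between snapshots
for stat in snap2.compare_to(snap1, 'lineno')[:10]:
    print(stat)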

That said, predict_filename is a convenience method for easily making a prediction on a single file and is not recommended when making predictions on a large number of images. You should do the predictions in batches using one of the other predictor methods (e.g., predict_folder, predict_generator, or predict). You can control the batch size with the predictor.batch_size attribute.
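For instance, here is a minimal batching sketch that bypasses predict_filename by calling the underlying Keras model directly. It assumes your preprocessor applies the standard ResNet50 preprocess_input (verify this against your preproc before relying on it) and reuses img_dir, dim, mask, results, and runs from your script:

import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.resnet50 import preprocess_input

def predict_in_batches(fnames, batch_size=64):
    preds = []
    for start in range(0, len(fnames), batch_size):
        batch_files = fnames[start:start + batch_size]
        # load and resize each image the same way the training generator did
        imgs = [img_to_array(load_img(f, target_size=dim)) for f in batch_files]
        # assumption: preproc applies ResNet50-style preprocessing
        batch = preprocess_input(np.stack(imgs))
        preds.extend(predictor.model.predict(batch).ravel())
    return preds

fnames = [f'{img_dir}/{i}.tif' for i in results.loc[mask, 'id']]
results.loc[mask, f'Run_{runs}'] = predict_in_batches(fnames)

Alternatively, staging the validation images in their own directory and calling predictor.predict_folder on it (with a larger predictor.batch_size) accomplishes the same thing without leaving the ktrain API.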

However, if all you need is to save the validation scores for your image regression problem, you can always just do this:

learner.validate()
# [('mae', 5.971834)]
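
If you want those scores written to disk, validate() returns the list of (metric, value) tuples shown above, so something like this works (the output file name is just an example):

import pandas as pd

metrics = learner.validate()  # e.g., [('mae', 5.971834)]
pd.DataFrame(metrics, columns=['metric', 'value']).to_csv('val_metrics.csv', index=False)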

I'll close this issue for now, but feel free to respond to this thread if you have further issues or questions.