BloodAxe / pytorch-toolbelt

PyTorch extensions for fast R&D prototyping and Kaggle farming
MIT License
1.52k stars 122 forks

Getting out of memory by using inference on huge images #41

Closed Diyago closed 4 years ago

Diyago commented 4 years ago

I have tried pretty small slices but still get a CUDA out-of-memory error on the line pred_batch = best_model(tiles_batch)[:, 0:1, :,:]. As far as I can see, it proceeded for a few steps and then failed. I have a GPU with 8 GB; the model is a U-Net with heavy encoders. Image shape: (6300, 6304, 3)

import numpy as np
import torch
import cv2
from torch.utils.data import DataLoader  # used in the inference loop below
from tqdm import tqdm_notebook
from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy

image = img_to_predict

# Cut large image into overlapping tiles
tiler = ImageSlicer(image.shape, tile_size=(64, 64), tile_step=(64, 64), weight='pyramid')

# HWC -> CHW. Optionally, do normalization here
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate a CUDA buffer for holding entire mask
merger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)

# Run predictions for tiles and accumulate them
for tiles_batch, coords_batch in tqdm_notebook(DataLoader(list(zip(tiles, tiler.crops)), batch_size=1, pin_memory=True)):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = best_model(tiles_batch)[:, 0:1, :,:] # taking only first channel

    merger.integrate_batch(pred_batch, coords_batch)

# Normalize accumulated mask and convert back to numpy
merged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)
merged_mask = tiler.crop_to_orignal_size(merged_mask)
BloodAxe commented 4 years ago

The code looks legitimate to me, and I don't see any issues with it straight away. But there are a few things you can check: 1) Is the model in eval mode, and is the code executing inside a torch.no_grad() scope? 2) I see you are running it inside a notebook. Maybe some dangling references to CUDA tensors are left somewhere, preventing unused CUDA memory from being freed? 3) What is the actual GPU memory utilization before running inference?
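
Check 1 can be illustrated with a minimal sketch. The tiny convolutional network below is a hypothetical stand-in for the real model, and the tensor shape is arbitrary; neither comes from the original post. The point it shows is that model.eval() only switches layers such as dropout and batch norm to inference behaviour, while autograd keeps recording a graph for every forward pass unless the call is wrapped in torch.no_grad(). If those outputs are then accumulated tile after tile, the graphs probably stay alive and memory keeps growing.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real segmentation model (assumption).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),
)
model.eval()  # changes layer behaviour only; autograd stays enabled

tile = torch.rand(1, 3, 64, 64)  # one fake RGB tile in CHW layout

# Forward pass with autograd enabled: the output carries a graph.
out_tracked = model(tile)

# Forward pass inside no_grad: no graph is recorded.
with torch.no_grad():
    out_untracked = model(tile)

print("eval only :", out_tracked.requires_grad, out_tracked.grad_fn is not None)
print("no_grad   :", out_untracked.requires_grad, out_untracked.grad_fn is not None)

# Check 3: inspect how much memory PyTorch currently holds on the GPU.
if torch.cuda.is_available():
    allocated_mib = torch.cuda.memory_allocated() / 2**20
    print(f"CUDA memory allocated by tensors: {allocated_mib:.1f} MiB")
```

Calling torch.cuda.memory_allocated() before and after the inference loop is one way to see whether memory is already occupied by leftover tensors from earlier notebook cells.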

Diyago commented 4 years ago

You were right, no_grad did the trick! I had only this:

best_model = torch.load('best_model_{}.pth'.format(ENCODER))
best_model.eval()

Added this - worked like a charm, thank you:

...
with torch.no_grad():
    tiler = ImageSlicer(image.shape, tile_size=(512, 512), tile_step=(256, 256), weight='pyramid')
...
Diyago commented 4 years ago

In addition, in the initial code the model output needed to be multiplied by 255; otherwise I got an all-zero mask:

for tiles_batch, coords_batch in tqdm_notebook(DataLoader(list(zip(tiles, tiler.crops)), batch_size=1, pin_memory=True)):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = 255*best_model(tiles_batch)[:, 0:1, :,:]
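
The all-zero mask is most likely a consequence of the final .astype(np.uint8) cast in the original code. The NumPy-only sketch below assumes the model emits probabilities in the range [0, 1] (for example after a sigmoid); that is an assumption, since the thread does not state the output range. Casting floats to uint8 truncates toward zero, so every value below 1.0 becomes 0 unless it is scaled first.

```python
import numpy as np

# Assumed model output: probabilities in [0, 1] (not stated in the thread).
probs = np.array([0.0, 0.2, 0.5, 0.99, 1.0], dtype=np.float32)

# Direct cast truncates toward zero: everything below 1.0 collapses to 0.
mask_unscaled = probs.astype(np.uint8)

# Scaling to the 0..255 range before the cast keeps the information.
mask_scaled = (255 * probs).astype(np.uint8)

print(mask_unscaled)  # [0 0 0 0 1]
print(mask_scaled)    # [  0  51 127 252 255]
```

An alternative with the same effect would be to scale the merged mask once, just before the cast, instead of scaling every batch inside the loop.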