Closed: Diyago closed this issue 4 years ago
The code looks pretty much legit from my point of view; I don't see any issues with it straight away. But there are a few things you can check: 1) Is the model in eval mode, and is the code executing inside a torch.no_grad() scope? 2) I see you are running it inside a notebook. Maybe some dangling references to CUDA tensors are left somewhere, preventing unused CUDA memory from being freed? 3) What is the actual GPU memory utilization before doing the inference? (A minimal sketch of this checklist follows below.)
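For reference, a minimal sketch of that checklist. The tiny Conv2d and the tensor shapes are placeholders, not the actual model from this issue:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the real segmentation network
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).cuda()
model.eval()  # (1) eval mode: fixes dropout / batchnorm behavior

# (3) check how much GPU memory is already taken before inference
print(f"allocated before: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")

batch = torch.rand(1, 3, 512, 512, device="cuda")
with torch.no_grad():  # (1) no autograd graph, so activations are freed right away
    pred = model(batch)

# (2) in a notebook, drop lingering references to CUDA tensors,
# otherwise the allocator cannot reuse that memory
del pred, batch
torch.cuda.empty_cache()
print(f"allocated after: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
```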
Thu, 27 Feb 2020 at 8:25 AM, Инсаф Ашрапов notifications@github.com:
I have tried pretty small slices but get CUDA out of memory on `---> 23 pred_batch = best_model(tiles_batch)[:, 0:1, :, :]`. As I can see, it got through a few steps but then failed:
```python
import numpy as np
import torch
import cv2
from tqdm import tqdm_notebook
from torch.utils.data import DataLoader
from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy

image = img_to_predict

# Cut large image into overlapping tiles
tiler = ImageSlicer(image.shape, tile_size=(64, 64), tile_step=(64, 64), weight='pyramid')

# HWC -> CHW. Optionally, do normalization here
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate a CUDA buffer for holding the entire mask
merger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)

# Run predictions for tiles and accumulate them
for tiles_batch, coords_batch in tqdm_notebook(DataLoader(list(zip(tiles, tiler.crops)), batch_size=1, pin_memory=True)):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = best_model(tiles_batch)[:, 0:1, :, :]  # taking only the first channel
    merger.integrate_batch(pred_batch, coords_batch)

# Normalize accumulated mask and convert back to numpy
merged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)
merged_mask = tiler.crop_to_orignal_size(merged_mask)
```
You were right, no_grad did the trick! I had only this:
```python
best_model = torch.load('best_model_{}.pth'.format(ENCODER))
best_model.eval()
```
Added this - worked like a charm, thank you:
```python
...
with torch.no_grad():
    tiler = ImageSlicer(image.shape, tile_size=(512, 512), tile_step=(256, 256), weight='pyramid')
    ...
```
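For what it's worth, an equivalent way to get the same effect is to decorate the inference step so that no call builds an autograd graph. This is just a sketch; `predict_tile` is a hypothetical helper, not part of pytorch-toolbelt:

```python
import torch

@torch.no_grad()  # every call runs without building the autograd graph
def predict_tile(model, tiles_batch):
    return model(tiles_batch.float().cuda())[:, 0:1, :, :]
```

Either way, the underlying issue is the same: without no_grad, autograd keeps the intermediate activations of every tile alive for a backward pass that never comes, which is what exhausts GPU memory.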
In addition, in the initial code the model output needed to be multiplied by 255; otherwise I got an all-zero mask:
```python
for tiles_batch, coords_batch in tqdm_notebook(DataLoader(list(zip(tiles, tiler.crops)), batch_size=1, pin_memory=True)):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = 255 * best_model(tiles_batch)[:, 0:1, :, :]
```
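A quick illustration of why the scaling matters, assuming the model's final activation is a sigmoid so outputs lie in [0, 1]: casting floats below 1.0 straight to uint8 truncates them all to zero.

```python
import numpy as np

probs = np.array([0.02, 0.70, 0.99], dtype=np.float32)  # model output in [0, 1]
print(probs.astype(np.uint8))           # [0 0 0] -- every probability truncates to zero
print((255 * probs).astype(np.uint8))   # [  5 178 252] -- a usable 8-bit mask
```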
I have tried pretty small slices but get CUDA out of memory on

```
---> 23 pred_batch = best_model(tiles_batch)[:, 0:1, :, :]
```

As I can see, it got through a few steps but then failed. I have a GPU with 8 GB; the model is a UNet, but with a heavy encoder. Image shape is (6300, 6304, 3).