BloodAxe / pytorch-toolbelt

PyTorch extensions for fast R&D prototyping and Kaggle farming

integrate_batch throws error: RuntimeError: The size of tensor a (6) must match the size of tensor b (928) ... #68

Closed · jokober closed this issue 2 years ago

jokober commented 2 years ago

Hi, I'm trying to use your tiling tools with my yolov5 model, but at the following line I get the following error:

https://github.com/BloodAxe/pytorch-toolbelt/blob/cab4fc4e209d9c9e5db18cf1e01bb979c65cf08b/pytorch_toolbelt/inference/tiles.py#L341

RuntimeError: The size of tensor a (6) must match the size of tensor b (928) at non-singleton dimension 2

The debugger shows a tile tensor size of (52983, 6) and a weight tensor size of (1, 928, 928). What could be the reason for the difference in tensor sizes?

Some more info: the model input size is 928x928, the image size is 3840x2160, and I am loading the model using DetectMultiBackend from yolov5.

BloodAxe commented 2 years ago

Hi! Can you please attach the code snippet that I can use to reproduce the issue?


jokober commented 2 years ago

Sure, it's pretty much the code from your README:

weights = "./yolov5/runs/train/exp9/weights/best.pt"
device = 1
img_path = "./yolov5_playground/images/10054_721234.png"

image = cv2.imread(img_path)
device = select_device(device)
model = DetectMultiBackend(weights, device=device, dnn=True)

# Cut large image into overlapping tiles
tiler = ImageSlicer(image.shape, tile_size=(928, 928), tile_step=(626,728))

# HCW -> CHW. Optionally, do normalization here
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate a CUDA buffer for holding entire mask
merger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)

# Run predictions for tiles and accumulate them
for tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=15, pin_memory=True):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = model(tiles_batch)

    merger.integrate_batch(pred_batch, coords_batch)

# Normalize accumulated mask and convert back to numpy
merged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)
merged_mask = tiler.crop_to_orignal_size(merged_mask)
BloodAxe commented 2 years ago

I'm not quite familiar with the YOLO architecture, but I feel this would not work out of the box. The example from the README assumes the model returns a tensor of shape [B, Co, H, W] for an input tensor of shape [B, Ci, H, W]. In other words, it expects the model to return a same-sized segmentation map (for example). When the returned feature map is smaller than the original image, you want to instantiate CudaTileMerger with a shape scaled down according to the output stride of your model.
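To make that concrete, here is a minimal sketch of instantiating the merger at feature-map resolution; the stride of 8 and the 256-channel width are placeholder assumptions for illustration, not yolov5 specifics:

stride, feat_channels = 8, 256  # assumptions; check your model's actual geometry

# Merge in feature-map space: both the accumulator shape and the blending
# weight shrink by the model's output stride.
feat_merger = CudaTileMerger(
    [d // stride for d in tiler.target_shape],
    feat_channels,
    tiler.weight[::stride, ::stride],  # downsampled blending weight
)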

So what you can do (see the sketch below):

1) Find the place in the model where you still have the tensor [B, C, H', W'], before it is reshaped. This output should be accumulated into the tile merger.
2) Run all tiles and generate the final output feature map.
3) Feed this feature map to the remaining decoder layers / NMS / whatever comes next, to get predictions for the entire image.
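A sketch of those three steps, assuming the model can be split into a hypothetical backbone (everything before the reshape) and head (the remaining decoder layers); neither name exists in yolov5 as written:

for tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=15, pin_memory=True):
    # 1) Stop at the [B, C, H', W'] feature map, before any reshape
    feats = backbone(tiles_batch.float().cuda())
    # 2) Accumulate features; crop coordinates shrink with the stride
    feat_merger.integrate_batch(feats, coords_batch // stride)

# 3) Run the remaining layers / NMS once, on the merged full-image features
full_features = feat_merger.merge().unsqueeze(0)  # [1, C, H/stride, W/stride]
predictions = head(full_features)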

I believe this is the only correct way to run detection on arbitrarily large images. One can run detection for each patch independently, but then one faces another problem: merging detections around patch edges.
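For comparison, the per-patch route usually ends with a global NMS over offset boxes. A minimal sketch of that merging step, assuming a hypothetical detections_per_tile list of float tensors with [x1, y1, x2, y2, score] rows in tile-local coordinates:

import torch
from torchvision.ops import nms

all_boxes, all_scores = [], []
for (x, y, tile_w, tile_h), det in zip(tiler.crops, detections_per_tile):
    boxes = det[:, :4].clone()
    boxes[:, [0, 2]] += x  # shift tile-local x1/x2 into full-image coordinates
    boxes[:, [1, 3]] += y  # shift tile-local y1/y2
    all_boxes.append(boxes)
    all_scores.append(det[:, 4])

# Suppress duplicate detections produced in the overlap regions
keep = nms(torch.cat(all_boxes), torch.cat(all_scores), iou_threshold=0.5)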

jokober commented 2 years ago

Thanks for your helpful explanation! It totally makes sense. I was looking into the yolov5 implementation to get the feature map, but since I had already done the tiling and dataset preparation using DarkHelp and DarkMark (which support tiling), I just switched to the yolov4 darknet implementation. However, I will very likely try your suggestions for a segmentation task soon.