BGU-CS-VIL / WTConv

Wavelet Convolutions for Large Receptive Fields. ECCV 2024.
MIT License

RuntimeError: The size of tensor a (21) must match the size of tensor b (20) at non-singleton dimension 3 #32

Open Anthony0106 opened 1 week ago

Anthony0106 commented 1 week ago

Dear professor, I'm sorry to bother you, but I have a problem when using your WTConv2d to replace some basic Conv blocks in YOLOv7. When I replace the particular layers with WTConv2d, I can train on the COCO dataset normally, but when I run test.py to evaluate the training results, the program raises an error: the size of tensor a (21) must match the size of tensor b (20) at non-singleton dimension 3. Looking forward to your reply. (Screenshots of my network structure and the error message were attached here.)

Anthony0106 commented 1 week ago

I found that the test images must have the same size as the training size, e.g. 640*640. I think it is because a tensor is used in the forward function as a Python bool, so when tracing the model it cannot be changed from the trained parameters, right? And here's another question: can WTConv adapt to different input sizes? How can I achieve that? Looking forward to your reply!!
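For now, my workaround is to pad or resize the test images up to a size compatible with the training setup. A minimal sketch of the size computation (the function name and the `stride * 2**levels` factor are my own guesses, not from this repo):

```python
import math

def pad_to_multiple(h, w, levels, stride=32):
    """Smallest (H, W) >= (h, w) divisible by stride * 2**levels.

    stride=32 matches YOLO's largest downsampling factor; the extra
    2**levels factor accounts for the wavelet decompositions inside
    WTConv2d. Both choices are assumptions, not the repo's own code.
    """
    m = stride * (1 << levels)
    return (math.ceil(h / m) * m, math.ceil(w / m) * m)

# A 613x640 test image padded for a 1-level WTConv model:
print(pad_to_multiple(613, 640, levels=1))  # (640, 640)
```

Letterboxing the image to this target size before inference should keep every intermediate feature map evenly divisible.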

shahaffind commented 1 week ago

Hi @Anthony0106, Using different image sizes is possible. I have done so before and had no issues.

The only way I can think of image size being an issue is if you use too many wavelet levels for the input resolution. For example, with an input tensor of size [1, 32, 8, 8], the spatial resolution is 8x8. With 2 wavelet levels, the highest-level convolution operates on a 2x2 spatial resolution. With 3 levels it operates on a 1x1 area (which is meaningless for a spatial convolution), and with 4 levels problems can occur.
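To make this concrete, here is a small sketch of how the spatial resolution shrinks per level (assuming each wavelet level halves both spatial dimensions, zero-padding odd ones first; the helper name is mine, not from the repo):

```python
def spatial_size_per_level(h, w, levels):
    """Track the spatial resolution seen by each wavelet level.

    Each decomposition halves both spatial dimensions; odd sizes
    are padded by 1 to become even before halving.
    """
    sizes = []
    for _ in range(levels):
        h += h % 2  # pad odd dims to even before halving
        w += w % 2
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

# An 8x8 input: 2 levels is fine, but the 3rd level is already 1x1.
print(spatial_size_per_level(8, 8, 3))  # [(4, 4), (2, 2), (1, 1)]
```

So as a rule of thumb, keep `2**levels` well below the smallest spatial dimension any WTConv2d layer will see.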

Other than that, I need more information to help solve the issue.

Anthony0106 commented 1 week ago

```yaml
# parameters
nc: 8  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:

# yolov7-tiny backbone
backbone:
  # [from, number, module, args]  c2, k=1, s=1, p=None, g=1, act=True
  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2

   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4

   [-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, WTConv2d, [32, 3, 1]],
   [-1, 1, WTConv2d, [32, 3, 1]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 7

   [-1, 1, MP, []],  # 8-P3/8  80*80
   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, WTConv2d, [64, 3, 1]],
   [-1, 1, WTConv2d, [64, 3, 1]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 14  80*80*128

   [-1, 1, MP, []],  # 15-P4/16  40*40
   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, WTConv2d, [128, 3, 1]],
   [-1, 1, WTConv2d, [128, 3, 1]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 21

   [-1, 1, MP, []],  # 22-P5/32  20*20
   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, WTConv2d, [256, 3, 1]],
   [-1, 1, WTConv2d, [256, 3, 1]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 28
  ]

# yolov7-tiny head
head:
  [[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 20*20*256
   [-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, SP, [5]],
   [-2, 1, SP, [9]],
   [-3, 1, SP, [13]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -7], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 37

   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [21, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # route backbone P4  40*40*256 (changed)
   [[-1, -2], 1, Concat, [1]],

   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 47

   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [14, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # route backbone P3  80*80*128 (changed)
   [[-1, -2], 1, Concat, [1]],

   [-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 57

   [-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, 47], 1, Concat, [1]],

   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 65

   [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, 37], 1, Concat, [1]],  # 37 changed to:

   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [[-1, -2, -3, -4], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 73

   [57, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [65, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
   [73, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]],

   [[74, 75, 76], 1, IDetect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
```

Hi professor, this is my YAML config for the yolov7-tiny model. I just replaced the CBS blocks with WTConv2d where the input and output channels are supposed to be the same, and I set the parameters the same as the original yolov7-tiny, with kernel size 3 and stride 1. I did nothing with the wt_levels parameter, so it is supposed to be 1. The model trains smoothly (though with no better performance than the original yolov7-tiny), but it fails to run test.py with the error in this issue's title. (Screenshots of WTConv added to the model and of the error messages from test.py were attached here.) I also notice a warning in the definition of forward() and wonder whether it impacts the unpredictable result.

Thanks for your comment and looking forward to your reply!

shahaffind commented 1 week ago

Unfortunately, I am not familiar enough with the YOLO codebase to give precise answers. However, since training works well, the warning you showed might be the key to the problem. Did you try converting the boolean tensor to float? Does the tensor have to be boolean?

Anthony0106 commented 1 week ago

@shahaffind Thanks for your answer. I tried converting the tensor, but it doesn't work. I will try other methods in the future and let you know if one works.

shahaffind commented 6 days ago

From the error, it seems some operations don't work well when converting to (or from) boolean. Another possibility is that the error comes from the padding: when either spatial dimension is odd (and therefore cannot be divided by 2), we add zero padding of size 1 to that dimension. This padding operation might not accept a boolean tensor.
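The padding step I mean looks roughly like this (a sketch, not the exact WTConv code; the boolean-to-float cast is a possible workaround I'm suggesting, not something the repo does):

```python
import torch
import torch.nn.functional as F

def pad_odd_dims(x):
    """Zero-pad H/W by 1 when odd so they are divisible by 2.

    A sketch of the padding described above. Casting boolean
    inputs to float first is a workaround suggestion, since some
    ops are picky about non-float dtypes.
    """
    if x.dtype == torch.bool:
        x = x.float()
    pad_w = x.shape[-1] % 2
    pad_h = x.shape[-2] % 2
    if pad_w or pad_h:
        # F.pad takes (left, right, top, bottom) for the last two dims
        x = F.pad(x, (0, pad_w, 0, pad_h))
    return x

x = torch.randn(1, 32, 7, 9)
print(pad_odd_dims(x).shape)  # torch.Size([1, 32, 8, 10])
```

If the tensor tripping the error is boolean, checking its dtype right before this padding point should confirm which of the two explanations applies.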