aau-cns / yolov4

Wrapper for Scaled-YOLOv4
GNU General Public License v3.0

Yolo error with image size 1280x720 #2

Closed lishiyu0088 closed 1 year ago

lishiyu0088 commented 1 year ago

Dear authors, I tried my own dataset, which has a different image resolution. When I load the pretrained models with an image resolution of 1280x720, I get this error:

RuntimeError: Sizes of tensors must match except in dimension 2. Got 46 and 45 (The offending index is 0)

My network is as follows:

0/162 Sequential - [16, 32, 720, 1280]
1/162 Sequential - [16, 64, 360, 640]
2/162 Sequential - [16, 64, 360, 640]
3/162 FeatureConcat - [16, 64, 360, 640] >> layer 2 [16, 64, 360, 640] + layer -2 [16, 64, 360, 640]
4/162 Sequential - [16, 64, 360, 640]
5/162 Sequential - [16, 32, 360, 640]
6/162 Sequential - [16, 64, 360, 640]
7/162 WeightedFeatureFusion - [16, 64, 360, 640] >> layer 6 [16, 64, 360, 640] + layer -3 [16, 64, 360, 640]
8/162 Sequential - [16, 64, 360, 640]
9/162 FeatureConcat - [16, 128, 360, 640] >> layer 8 [16, 64, 360, 640] + layer -1 [16, 64, 360, 640] + layer -7 [16, 64, 360, 640]
10/162 Sequential - [16, 64, 360, 640]
11/162 Sequential - [16, 128, 180, 320]
12/162 Sequential - [16, 64, 180, 320]
13/162 FeatureConcat - [16, 128, 180, 320] >> layer 12 [16, 64, 180, 320] + layer -2 [16, 128, 180, 320]
14/162 Sequential - [16, 64, 180, 320]
15/162 Sequential - [16, 64, 180, 320]
16/162 Sequential - [16, 64, 180, 320]
17/162 WeightedFeatureFusion - [16, 64, 180, 320] >> layer 16 [16, 64, 180, 320] + layer -3 [16, 64, 180, 320]
18/162 Sequential - [16, 64, 180, 320]
19/162 Sequential - [16, 64, 180, 320]
20/162 WeightedFeatureFusion - [16, 64, 180, 320] >> layer 19 [16, 64, 180, 320] + layer -3 [16, 64, 180, 320]
21/162 Sequential - [16, 64, 180, 320]
22/162 FeatureConcat - [16, 128, 180, 320] >> layer 21 [16, 64, 180, 320] + layer -1 [16, 64, 180, 320] + layer -10 [16, 64, 180, 320]
23/162 Sequential - [16, 128, 180, 320]
24/162 Sequential - [16, 256, 90, 160]
25/162 Sequential - [16, 128, 90, 160]
26/162 FeatureConcat - [16, 256, 90, 160] >> layer 25 [16, 128, 90, 160] + layer -2 [16, 256, 90, 160]
27/162 Sequential - [16, 128, 90, 160]
28/162 Sequential - [16, 128, 90, 160]
29/162 Sequential - [16, 128, 90, 160]
30/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 29 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
31/162 Sequential - [16, 128, 90, 160]
32/162 Sequential - [16, 128, 90, 160]
33/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 32 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
34/162 Sequential - [16, 128, 90, 160]
35/162 Sequential - [16, 128, 90, 160]
36/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 35 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
37/162 Sequential - [16, 128, 90, 160]
38/162 Sequential - [16, 128, 90, 160]
39/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 38 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
40/162 Sequential - [16, 128, 90, 160]
41/162 Sequential - [16, 128, 90, 160]
42/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 41 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
43/162 Sequential - [16, 128, 90, 160]
44/162 Sequential - [16, 128, 90, 160]
45/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 44 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
46/162 Sequential - [16, 128, 90, 160]
47/162 Sequential - [16, 128, 90, 160]
48/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 47 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
49/162 Sequential - [16, 128, 90, 160]
50/162 Sequential - [16, 128, 90, 160]
51/162 WeightedFeatureFusion - [16, 128, 90, 160] >> layer 50 [16, 128, 90, 160] + layer -3 [16, 128, 90, 160]
52/162 Sequential - [16, 128, 90, 160]
53/162 FeatureConcat - [16, 256, 90, 160] >> layer 52 [16, 128, 90, 160] + layer -1 [16, 128, 90, 160] + layer -28 [16, 128, 90, 160]
54/162 Sequential - [16, 256, 90, 160]
55/162 Sequential - [16, 512, 45, 80]
56/162 Sequential - [16, 256, 45, 80]
57/162 FeatureConcat - [16, 512, 45, 80] >> layer 56 [16, 256, 45, 80] + layer -2 [16, 512, 45, 80]
58/162 Sequential - [16, 256, 45, 80]
59/162 Sequential - [16, 256, 45, 80]
60/162 Sequential - [16, 256, 45, 80]
61/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 60 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
62/162 Sequential - [16, 256, 45, 80]
63/162 Sequential - [16, 256, 45, 80]
64/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 63 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
65/162 Sequential - [16, 256, 45, 80]
66/162 Sequential - [16, 256, 45, 80]
67/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 66 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
68/162 Sequential - [16, 256, 45, 80]
69/162 Sequential - [16, 256, 45, 80]
70/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 69 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
71/162 Sequential - [16, 256, 45, 80]
72/162 Sequential - [16, 256, 45, 80]
73/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 72 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
74/162 Sequential - [16, 256, 45, 80]
75/162 Sequential - [16, 256, 45, 80]
76/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 75 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
77/162 Sequential - [16, 256, 45, 80]
78/162 Sequential - [16, 256, 45, 80]
79/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 78 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
80/162 Sequential - [16, 256, 45, 80]
81/162 Sequential - [16, 256, 45, 80]
82/162 WeightedFeatureFusion - [16, 256, 45, 80] >> layer 81 [16, 256, 45, 80] + layer -3 [16, 256, 45, 80]
83/162 Sequential - [16, 256, 45, 80]
84/162 FeatureConcat - [16, 512, 45, 80] >> layer 83 [16, 256, 45, 80] + layer -1 [16, 256, 45, 80] + layer -28 [16, 256, 45, 80]
85/162 Sequential - [16, 512, 45, 80]
86/162 Sequential - [16, 1024, 23, 40]
87/162 Sequential - [16, 512, 23, 40]
88/162 FeatureConcat - [16, 1024, 23, 40] >> layer 87 [16, 512, 23, 40] + layer -2 [16, 1024, 23, 40]
89/162 Sequential - [16, 512, 23, 40]
90/162 Sequential - [16, 512, 23, 40]
91/162 Sequential - [16, 512, 23, 40]
92/162 WeightedFeatureFusion - [16, 512, 23, 40] >> layer 91 [16, 512, 23, 40] + layer -3 [16, 512, 23, 40]
93/162 Sequential - [16, 512, 23, 40]
94/162 Sequential - [16, 512, 23, 40]
95/162 WeightedFeatureFusion - [16, 512, 23, 40] >> layer 94 [16, 512, 23, 40] + layer -3 [16, 512, 23, 40]
96/162 Sequential - [16, 512, 23, 40]
97/162 Sequential - [16, 512, 23, 40]
98/162 WeightedFeatureFusion - [16, 512, 23, 40] >> layer 97 [16, 512, 23, 40] + layer -3 [16, 512, 23, 40]
99/162 Sequential - [16, 512, 23, 40]
100/162 Sequential - [16, 512, 23, 40]
101/162 WeightedFeatureFusion - [16, 512, 23, 40] >> layer 100 [16, 512, 23, 40] + layer -3 [16, 512, 23, 40]
102/162 Sequential - [16, 512, 23, 40]
103/162 FeatureConcat - [16, 1024, 23, 40] >> layer 102 [16, 512, 23, 40] + layer -1 [16, 512, 23, 40] + layer -16 [16, 512, 23, 40]
104/162 Sequential - [16, 1024, 23, 40]
105/162 Sequential - [16, 512, 23, 40]
106/162 Sequential - [16, 1024, 23, 40]
107/162 Sequential - [16, 512, 23, 40]
108/162 MaxPool2d - [16, 512, 23, 40]
109/162 FeatureConcat - [16, 512, 23, 40] >> layer 108 [16, 512, 23, 40] + layer -2 [16, 512, 23, 40]
110/162 MaxPool2d - [16, 512, 23, 40]
111/162 FeatureConcat - [16, 512, 23, 40] >> layer 110 [16, 512, 23, 40] + layer -4 [16, 512, 23, 40]
112/162 MaxPool2d - [16, 512, 23, 40]
113/162 FeatureConcat - [16, 2048, 23, 40] >> layer 112 [16, 512, 23, 40] + layer -1 [16, 512, 23, 40] + layer -3 [16, 512, 23, 40] + layer -5 [16, 512, 23, 40] + layer -6 [16, 512, 23, 40]
114/162 Sequential - [16, 512, 23, 40]
115/162 Sequential - [16, 1024, 23, 40]
116/162 Sequential - [16, 512, 23, 40]
117/162 Sequential - [16, 256, 23, 40]
118/162 Upsample - [16, 256, 46, 80]
119/162 FeatureConcat - [16, 512, 45, 80] >> layer 118 [16, 256, 46, 80] + layer 85 [16, 512, 45, 80]
120/162 Sequential - [16, 256, 45, 80]

In the last FeatureConcat (layer 119), the feature-map height of 45 from layer 85 does not match the height of 46 from the Upsample in layer 118. Is there an option in cns_yolo.py to select the image size?

Great thanks in advance!

tgjantos commented 1 year ago

Dear @lishiyu0088,

Scaled-YOLOv4 needs image dimensions that are divisible by 32. The mismatch you see comes from 720 not being divisible by 32 (720 / 32 = 22.5): the 45-row feature map is downsampled to 23 rows, and upsampling that by 2 gives 46 rows, which no longer matches the 45-row map it is concatenated with. If you check test.py or train.py, you will find that they assert that the image size is correct (divisible by 32) and otherwise resize the image to make it fit. Therefore, to use YOLO with your custom image size, you have to ensure that the images are resized correctly in the code, e.g. in the dataloader.
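To make the arithmetic concrete, here is a minimal sketch in plain Python that reproduces the heights from the layer dump above and computes the next size that is divisible by 32. It is not code from this repository: the function names are hypothetical, and the stride-2 convolution parameters (kernel 3, padding 1) are an assumption chosen because they match the 45 -> 23 step in the dump.

```python
import math


def downsample(size, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution along one axis.

    kernel/stride/padding are assumed values, not read from the model config.
    """
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(size, levels=5):
    """Sizes along one axis after each of `levels` stride-2 downsamplings."""
    sizes = [size]
    for _ in range(levels):
        sizes.append(downsample(sizes[-1]))
    return sizes


def round_up_to_multiple(size, divisor=32):
    """Smallest multiple of `divisor` that is >= size."""
    return math.ceil(size / divisor) * divisor


# Height 720: the deepest map has 23 rows, so a 2x upsample yields 46 rows,
# while the skip connection it is concatenated with has 45 rows.
heights = trace_sizes(720)
print(heights, "upsampled:", heights[-1] * 2, "skip:", heights[-2])

# Rounding the height up to a multiple of 32 makes the two branches agree.
fixed_height = round_up_to_multiple(720)
fixed = trace_sizes(fixed_height)
print(fixed_height, fixed, "upsampled:", fixed[-1] * 2, "skip:", fixed[-2])
```

Under these assumptions, a 1280x720 input would need to be resized or padded to 1280x736 (for example, 8 rows of padding at the top and bottom) before it reaches the network. The width of 1280 is already a multiple of 32 and needs no change.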

I hope this answers your questions!

Best, Thomas