bingykang / Fewshot_Detection

Few-shot Object Detection via Feature Reweighting
https://arxiv.org/abs/1812.01866

Size mismatch, unable to train base model #21

Closed quanvuong closed 4 years ago

quanvuong commented 4 years ago

Hi, sorry to bother you! I ran into the following error when trying to train the base model. I am using PyTorch 0.3.1 and Python 2.7. I have attached the full stdout log and the modified code that prints out the sizes.

    (featurereweight) quan@Bayes:~/few_shot/Fewshot_Detection$ python train_meta.py cfg/metayolo.data cfg/darknet_dynamic.cfg cfg/reweighting_net.cfg darknet19_448.conv.23
    /home/quan/few_shot/Fewshot_Detection/data/coco.names
    ('save_interval', 10)
    ['bird', 'bus', 'cow', 'motorbike', 'sofa']
    ('base_ids', [0, 1, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19])
    logging to backup/metayolo_novel0_neg1
    ('class_scale', 1)
    /home/quan/few_shot/Fewshot_Detection/cfg.py:455: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
      conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
    layer     filters    size              input                output
        0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32
        1 max          2 x 2 / 2   416 x 416 x  32   ->   208 x 208 x  32
        2 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64
        3 max          2 x 2 / 2   208 x 208 x  64   ->   104 x 104 x  64
        4 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
        5 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64
        6 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
        7 max          2 x 2 / 2   104 x 104 x 128   ->    52 x  52 x 128
        8 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
        9 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
       10 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
       11 max          2 x 2 / 2    52 x  52 x 256   ->    26 x  26 x 256
       12 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
       13 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
       14 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
       15 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
       16 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
       17 max          2 x 2 / 2    26 x  26 x 512   ->    13 x  13 x 512
       18 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
       19 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
       20 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
       21 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
       22 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
       23 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
       24 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
       25 route  16
       26 conv     64  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x  64
       27 reorg              / 2    26 x  26 x  64   ->    13 x  13 x 256
       28 route  27 24
       29 conv   1024  3 x 3 / 1    13 x  13 x1280   ->    13 x  13 x1024
       30 dconv  1024  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x1024
       31 conv     30  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x  30
       32 detection

    layer     filters    size              input                output
        0 conv     32  3 x 3 / 1   416 x 416 x   4   ->   416 x 416 x  32
        1 max          2 x 2 / 2   416 x 416 x  32   ->   208 x 208 x  32
        2 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64
        3 max          2 x 2 / 2   208 x 208 x  64   ->   104 x 104 x  64
        4 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
        5 max          2 x 2 / 2   104 x 104 x 128   ->    52 x  52 x 128
        6 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
        7 max          2 x 2 / 2    52 x  52 x 256   ->    26 x  26 x 256
        8 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
        9 max          2 x 2 / 2    26 x  26 x 512   ->    13 x  13 x 512
       10 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
       11 max          2 x 2 / 2    13 x  13 x1024   ->     6 x   6 x1024
       12 conv   1024  3 x 3 / 1     6 x   6 x1024   ->     6 x   6 x1024
       13 glomax            6 x 6 / 1     6 x   6 x1024   ->     1 x   1 x1024
    1 14554 80200 64 10
    ===> Number of samples (before filtring): 4952
    ===> Number of samples (after filtring): 4952
    ('num classes: ', 15)
    factor: 3.0
    ===> Number of samples (before filtring): 14554
    ===> Number of samples (after filtring): 14554
    ('num classes: ', 15)
    2019-12-02 20:30:00 epoch 0/353, processed 0 samples, lr 0.000033
    ('nA', 5)
    ('nC', 1)
    ('nH', 13)
    ('nW', 13)
    ('bs', 64)
    ('cs', 15)
    ('output.shape', (1280L, 30L, 13L, 13L))
    ('cls.shape', (1280L, 5L, 6L, 13L, 13L))
    ('cls.shape', (1280L, 5L, 13L, 13L))
    Traceback (most recent call last):
      File "train_meta.py", line 325, in <module>
        train(epoch)
      File "train_meta.py", line 221, in train
        loss = region_loss(output, target)
      File "/home/quan/miniconda3/envs/featurereweight/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/quan/few_shot/Fewshot_Detection/region_loss.py", line 277, in forward
        cls = cls.view(bs, cs, nA*nC*nH*nW).transpose(1, 2).contiguous().view(bs*nA*nC*nH*nW, cs)
    RuntimeError: invalid argument 2: size '[64 x 15 x 845]' is invalid for input with 1081600 elements at /opt/conda/conda-bld/pytorch_1518238581238/work/torch/lib/TH/THStorage.c:41

--- Please find below the sequence of print statements I added:

    print('nA', nA)
    print('nC', nC)
    print('nH', nH)
    print('nW', nW)
    print('bs', bs)
    print('cs', cs)

    print('output.shape', output.shape)
    cls = output.view(output.size(0), nA, (5 + nC), nH, nW)

    print('cls.shape', cls.shape)
    cls = cls.index_select(2, Variable(torch.linspace(5, 5 + nC - 1, nC).long().cuda())).squeeze()

    print('cls.shape', cls.shape)
    cls = cls.view(bs, cs, nA * nC * nH * nW).transpose(1, 2).contiguous().view(bs * nA * nC * nH * nW, cs)

    print('cls.shape', cls.shape)
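
For reference, here is a minimal standalone sketch (not part of the repo) that reproduces the arithmetic behind the RuntimeError: the final `view` can only succeed when the tensor's element count equals bs * cs * nA * nC * nH * nW, but the leading dimension printed above is 1280 rather than bs * cs = 960, so the counts do not match.

    import torch

    # Values taken from the debug prints above.
    nA, nC, nH, nW = 5, 1, 13, 13
    bs, cs = 64, 15

    cls = torch.zeros(1280, 5, 13, 13)       # cls.shape as printed in the log

    needed = bs * cs * nA * nC * nH * nW     # 64 * 15 * 845 = 811200
    print(needed, cls.numel())               # 811200 vs 1081600

    try:
        cls.view(bs, cs, nA * nC * nH * nW)  # same reshape as region_loss.py line 277
    except RuntimeError as err:
        print(err)                           # size-mismatch error, as in the traceback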
christegho commented 4 years ago

In train_meta.py, add the following:

@@ -107,6 +107,7 @@ test_loader = torch.utils.data.DataLoader(
     dataset.listDataset(testlist, shape=(init_width, init_height),
                    shuffle=False,
                    transform=transforms.Compose([
+                       transforms.Resize(416),
                        transforms.ToTensor(),
                    ]), train=False),
     batch_size=batch_size, shuffle=False, **kwargs)
@@ -174,6 +175,7 @@ def train(epoch):
         dataset.listDataset(trainlist, shape=(init_width, init_height),
                        shuffle=False,
                        transform=transforms.Compose([
+                           transforms.Resize(416),
                            transforms.ToTensor(),
                        ]),
                        train=True,
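
One detail worth checking (this is torchvision's documented behavior, not something from the patch): `transforms.Resize(416)` with an integer argument scales the shorter image side to 416 and keeps the aspect ratio, whereas `Resize((416, 416))` forces an exact square output. A small sketch:

    from PIL import Image
    from torchvision import transforms

    img = Image.new('RGB', (500, 375))       # dummy non-square image (w=500, h=375)

    short_side = transforms.Resize(416)      # int: shorter side -> 416, aspect ratio kept
    exact = transforms.Resize((416, 416))    # tuple: output is exactly 416 x 416

    print(short_side(img).size)              # still non-square (aspect ratio preserved)
    print(exact(img).size)                   # (416, 416)

Depending on what listDataset expects downstream, the tuple form may be the safer choice.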
quanvuong commented 4 years ago

Thanks for the hint!

Does this affect performance significantly? Were you able to reproduce the results in the paper after resizing the input image? Also, why resize?

christegho commented 4 years ago

I am not sure why some images cause this error while others do not, but resizing all images makes sure the input has the dimensions the network expects, which are defined in cfg/darknet_dynamic.cfg as 416 for both height and width.
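
As a quick sanity check, the expected input size can be read straight from the cfg, whose [net] section stores it as plain key=value lines. A hypothetical helper (not part of the repo):

    # Hypothetical helper: read width/height from a darknet-style cfg file.
    def net_input_size(cfg_path):
        width = height = None
        with open(cfg_path) as f:
            for line in f:
                line = line.strip()
                if line.startswith('width='):
                    width = int(line.split('=')[1])
                elif line.startswith('height='):
                    height = int(line.split('=')[1])
        return width, height

    print(net_input_size('cfg/darknet_dynamic.cfg'))   # expected: (416, 416)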

quanvuong commented 4 years ago

I see. Thank you!

Leaving this post open because the following questions have not been answered:

  1. Why do we need resizing?
  2. Were the results in the paper obtained with resizing to (416, 416)?
christegho commented 4 years ago

Yes, the paper uses images resized to 416x416; see the implementation details on page 10.

The inputs to a network all need to have a fixed size; you cannot have images of different sizes going in. I'm not sure how the authors managed without adding the resizing transform, though.
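
A minimal sketch of that point (illustrative only, not the repo's actual data pipeline): the default DataLoader collation stacks per-sample tensors, which only works when every image tensor has identical dimensions.

    import torch

    a = torch.zeros(3, 416, 416)             # image tensor already at 416x416
    b = torch.zeros(3, 375, 500)             # image tensor with a different size

    try:
        torch.stack([a, b])                  # what the default collate_fn does
    except RuntimeError as err:
        print('cannot batch mixed sizes:', err)

    # Once every image is resized to 416x416 (e.g. via transforms.Resize), stacking
    # works and the network always receives a fixed-size batch.
    print(torch.stack([a, torch.zeros(3, 416, 416)]).shape)   # torch.Size([2, 3, 416, 416])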

quanvuong commented 4 years ago

Thanks!