train error like this:

dmuqlzhang commented 4 years ago

I get a training error like this. I trained on Python 2.7. Can you help me fix it?

jychoi118 commented 4 years ago

You're trying to reshape "cls" to 64x15x845, but your "cls" has 1081600 = 64x20x845 elements. Check your "num_anchors" and "num_classes", which should be 5 and 1 respectively. Or check the number of filters in the last conv layer, which should be 30.
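
The arithmetic behind this check can be sketched in plain Python. This is only an illustration, not code from the repository: `expected_filters` and `can_view` are hypothetical helper names, and the formula `num_anchors * (5 + num_classes)` is the usual YOLOv2 convention, assumed here because it reproduces the value 30 quoted above.

```python
from math import prod

def expected_filters(num_anchors, num_classes):
    # Assumed YOLOv2-style head: 4 box coordinates + 1 objectness
    # score + num_classes class scores for every anchor.
    return num_anchors * (5 + num_classes)

def can_view(numel, target_shape):
    # A view/reshape is only valid when the element counts match exactly.
    return numel == prod(target_shape)

numel = 1081600  # element count reported in the error
print(expected_filters(5, 1))          # filters in the last conv layer
print(can_view(numel, (64, 15, 845)))  # target with 15 base classes
print(can_view(numel, (64, 20, 845)))  # target with all 20 classes
```

If the second check fails while the third passes, the tensor was built for 20 classes rather than 15, which is the mismatch described above.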

dmuqlzhang commented 4 years ago

Thanks for your reply. I checked the config you mentioned, like this: However, the config already matches your advice, and the same error still happens:

jychoi118 commented 4 years ago

I'm not sure, but maybe your "meta loader" is using the full 20 classes instead of the 15 base classes. The MetaDataset class prints out the number of classes like below. If it prints 20 instead of 15, then maybe you're missing voc_novels.txt in the data folder.

dmuqlzhang commented 4 years ago

Thanks for your quick reply! I checked as you advised, like this:

Should I delete the other 15 base classes and keep only the 5 novel ones in voc_novels.txt?

dmuqlzhang commented 4 years ago

Can I check your network structure?

1081600 = 1280 * 5 * 13 * 13. What does the 1280 mean?

jychoi118 commented 4 years ago

Hmm... strange... :( The given dataloader automatically selects the base classes according to voc_novel.txt, so you don't need to erase anything. The output of layer 29 is the concatenation of layers 24 and 27: 1280 = 1024 + 256. My backbone network architecture is the same as yours.

dmuqlzhang commented 4 years ago

I printed the shape of cls before the reshape: Also, should I delete the 5 novel classes in voc_traindict_full.txt?

jychoi118 commented 4 years ago

I get 960x5x13x13 for the cls shape before cls.view, which has the same number of elements as 64x15x845. And you don't need to delete the novel classes in voc_traindict_full.txt.

In the dynamic convolution (implemented in dynamic_conv.py) before the last layer, the output is reweighted by a vector for each base class. That's why the batch size of the dynamic_conv output is 15 times the original batch size of 64 (960 = 15x64). Maybe something is wrong in the dynamic conv...?
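
The shape bookkeeping described here can be checked with a few lines of Python. This is a sketch under stated assumptions rather than the code in dynamic_conv.py: `reweighted_batch` and `cls_view_shape` are hypothetical names, and the 13x13 grid with 5 anchors is taken from the shapes quoted in this thread.

```python
from math import prod

def reweighted_batch(batch_size, num_classes):
    # One copy of each image's features per class reweighting vector,
    # so the effective batch grows by a factor of num_classes.
    return batch_size * num_classes

def cls_view_shape(batch_size, num_classes, num_anchors, grid):
    # Assumed target of cls.view: (batch, classes, anchors * grid * grid).
    return (batch_size, num_classes, num_anchors * grid * grid)

before = (reweighted_batch(64, 15), 5, 13, 13)
after = cls_view_shape(64, 15, 5, 13)
print(before, after, prod(before) == prod(after))

# With 20 classes the leading dimension becomes 1280 instead of 960.
print(reweighted_batch(64, 20))
```

Under these assumptions, a leading dimension of 1280 instead of 960 means 20 reweighting vectors were applied per image, not 15.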

dmuqlzhang commented 4 years ago

Thanks! I found the cause: it came from DataParallel. I had chosen GPU ids beyond the number of GPUs I have. I've fixed it, and now it runs.

eghouti commented 4 years ago

Hello, I have the same issue. Can you please explain what you did? Thank you.

Miracle-hpf commented 4 years ago

@dmuqlzhang Hello, I ran into the same problem as you. May I ask how you solved it? Could you give me your contact information? Sorry to bother you!

dmuqlzhang commented 4 years ago

qq 1379721621

ZhangXG001 commented 3 years ago

@eghouti @liuzhuang13 @jychoi118 The original metatune.data has gpus = 1,2,3,4. You should change it to match your device (mine is gpus = 0,1,2,3), so please set the correct GPU ids. It might help you.
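
A quick way to catch this before training is to validate the ids against the number of visible devices. The sketch below is only an illustration: `parse_gpus` and `invalid_gpu_ids` are hypothetical helpers, not part of the repository, and the device count is passed in by hand (in a PyTorch script it could come from `torch.cuda.device_count()`).

```python
def parse_gpus(line):
    # Turn a config line such as "gpus = 1,2,3,4" into [1, 2, 3, 4].
    _, _, value = line.partition("=")
    return [int(tok) for tok in value.split(",") if tok.strip()]

def invalid_gpu_ids(gpu_ids, device_count):
    # Valid ids are zero-based: 0 .. device_count - 1.
    return [g for g in gpu_ids if not 0 <= g < device_count]

device_count = 4  # assumed machine with four GPUs
original = parse_gpus("gpus = 1,2,3,4")
fixed = parse_gpus("gpus = 0,1,2,3")
print(invalid_gpu_ids(original, device_count))  # ids that do not exist
print(invalid_gpu_ids(fixed, device_count))     # empty when all ids are valid
```

On a four-GPU machine the original setting refers to a device with id 4, which does not exist because ids start at 0.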

bingykang / Fewshot_Detection

train error like this: #28