apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.77k stars 6.79k forks

Fine-tuning the mxnet ssd model gives a from.shape() mismatch error #6474

Closed liumusicforever closed 6 years ago

liumusicforever commented 7 years ago

Hi,

  1. I created my own .rec file from a one-class dataset (people), modeled on the train.rec created by prepare_dataset.py.
  2. I used this .rec file to fine-tune the pre-trained model (vgg16_ssd_300_voc0712_trainval) with the following command: python train.py --gpus 0,1,2 --batch-size 100 --train-path ~/path/to/my/own/train.rec --val-path ~/path/to/my/own/val.rec --num-example 10000 --end-epoch 1000 --prefix=model/ssd --batch-size 32 --class-names people --num-class 1 --finetune 1

3. Why do I get the following error? mxnet.base.MXNetError: [11:58:35] src/ndarray/ndarray.cc:299: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (126,) to.shape=(12,)

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f440199efbc]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayEPS0_i+0x105) [0x7f4402474eb5]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/libmxnet.so(+0x115ac54) [0x7f44024d2c54]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.10.0-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x2cd) [0x7f44023533fd]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f440dfc6adc]
[bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f440dfc640c]
[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48e) [0x7f440e1dd5fe]
[bt] (7) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x15f9e) [0x7f440e1def9e]
[bt] (8) python(PyEval_EvalFrameEx+0x98d) [0x5244dd]
[bt] (9) python(PyEval_EvalCodeEx+0x2b1) [0x555551]

4. Regarding "from.shape = (126,) to.shape=(12,)" above: I guess this means the old model has 20 classes plus one background, (20+1)*6=126, while my own data has only one class plus one background, (1+1)*6=12. But I already passed "--num-class 1" and "--finetune 1", so why does this error still appear?

5. Please help me fix it. Thanks!
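
The arithmetic guessed in point 4 can be written out as a small sketch. It assumes, as the comment does, that the mismatched parameter has one entry per class (plus background) for each of 6 anchor boxes per location; the function name and the default of 6 are illustrative and are not taken from the SSD example code.

```python
def cls_pred_channels(num_classes, anchors_per_location=6):
    """Size of a class-prediction parameter: (classes + background) * anchors.

    anchors_per_location=6 is an assumption taken from the guess in this thread.
    """
    return (num_classes + 1) * anchors_per_location


pretrained = cls_pred_channels(20)  # pre-trained VOC model: 20 classes + background
finetuned = cls_pred_channels(1)    # one-class dataset: 1 class + background
print(pretrained, finetuned)
```

Under that assumption the two values match the shapes in the error message: 126 for the pre-trained parameters and 12 for the newly built network.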

adrianloy commented 7 years ago

Were you able to solve this? I get the same error every time I try to use the demo with a model that I trained myself.

hungpt297 commented 7 years ago

@liumusicforever @adrianloy Were you able to solve this? I get the same error. I saw that Release-v0.2-beta provided 2 models:

Please help me. Many thanks.

adrianloy commented 7 years ago

Yes, I could. Check that CLASSES is set correctly both at training time and in demo.py. If you train with a single class, you need to set CLASS = [classname,]; otherwise len(CLASS) returns a wrong value.
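
One way a single-class setting can give a wrong length in Python is sketched below. This is an assumption about the likely cause, not something confirmed in the thread: a one-element tuple written without the trailing comma is just a string, so len() counts characters instead of classes. The variable names are illustrative.

```python
# Parentheses without a trailing comma do not make a tuple: this is a plain string.
classes_as_string = ('people')
# A trailing comma makes a one-element tuple; a list works as well.
classes_as_tuple = ('people',)
classes_as_list = ['people']

print(len(classes_as_string))  # number of characters in the string
print(len(classes_as_tuple))   # number of classes
print(len(classes_as_list))    # number of classes
```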

lijuan123 commented 7 years ago

@adrianloy Hi, I have set num_class and class_names in train.py, but I still get the error. Are there other places where I need to change the setting? Thank you very much!

adrianloy commented 7 years ago

When I worked with this project, I also had to adjust it in demo.py and when preparing the dataset. I do not know whether that is still the case, though: they changed some things in the last month and I am no longer up to date.

liumusicforever commented 7 years ago

I solved it by making sure that num_class is the same in the symbol built from symbol.py (loaded with importlib) and in the model params file (loaded with load_checkpoint).
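
A minimal sketch of that kind of check, written in plain Python so it runs without MXNet: it compares the parameter shapes the new symbol expects with the shapes stored in the checkpoint and reports every name where they differ. The helper and the layer names are hypothetical and chosen only for illustration.

```python
def find_shape_mismatches(expected_shapes, loaded_shapes):
    """Return {name: (loaded_shape, expected_shape)} for parameters whose shapes differ.

    Parameters missing from the checkpoint are skipped, since those would be
    freshly initialized rather than copied.
    """
    mismatches = {}
    for name, expected in expected_shapes.items():
        loaded = loaded_shapes.get(name)
        if loaded is not None and tuple(loaded) != tuple(expected):
            mismatches[name] = (tuple(loaded), tuple(expected))
    return mismatches


# Hypothetical parameter names; the shapes mirror the error in this issue.
expected = {"conv1_weight": (64, 3, 3, 3), "cls_pred_bias": (12,)}
loaded = {"conv1_weight": (64, 3, 3, 3), "cls_pred_bias": (126,)}

for name, (got, want) in find_shape_mismatches(expected, loaded).items():
    print(name, "checkpoint:", got, "symbol:", want)
```

With MXNet, the loaded shapes could presumably be read from the arg_params dictionary returned by mx.model.load_checkpoint, and the expected shapes from pairing the symbol's list_arguments() with the first result of infer_shape(); any parameter reported here would need to be re-initialized or excluded when fine-tuning.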

lijuan123 commented 7 years ago

@liumusicforever Sorry, can you tell me more about it? I don't quite understand what you mean. The symbol I use is resnet50, and I load the pretrained model from epoch 0. Thank you again.

liumusicforever commented 7 years ago

Is the number of classes in your pretrained model the same as the number of classes in the symbol?

liumusicforever commented 7 years ago

Sorry, I made a mistake. Above, I meant the number of classes, not the shape of the data.

lijuan123 commented 7 years ago

@liumusicforever thank you!

szha commented 6 years ago

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

lanking520 commented 6 years ago

Hi @liumusicforever, are you still facing this issue?

liumusicforever commented 6 years ago

@lanking520 I solved it by checking the dimensions of the pretrained model, the symbol, and the data.

lanking520 commented 6 years ago

Thank you @liumusicforever. Is it safe to close this issue?

liumusicforever commented 6 years ago

Yes!