albanie / mcnSSD

A matconvnet implementation of the Single Shot Detector
MIT License

Issue with vgg_vd_16_reduced for mcn ssd pascal_demo_train #19

Open kankanar opened 6 years ago

kankanar commented 6 years ago

Hi, I am trying to run ssd_pascal_train.m. MatConvNet is compiled as beta version 25 with GPU support under Visual Studio 2015. The demo code with the pretrained network runs fine, but when ssd_pascal_train.m loads vgg_vd_16_reduced.mat, the model seems to be incompatible somehow. Can you please suggest a solution? I am pasting the error I am getting:

Warning: The model appears to be simplenn model. Using fromSimpleNN instead.

In dagnn.DagNN.loadobj (line 19)
In ssd_zoo (line 29)
In ssd_init (line 27)
In ssd_train (line 19)
In ssd_pascal_train (line 217)

Error using Layer.fromDagNN (line 65)
Input must be a DagNN or SimpleNN.

Error in Net (line 93) objects = Layer.fromDagNN(objects{:}) ;

Error in ssd_init (line 177) net = Net(all_losses) ;

Error in ssd_train (line 19) net = opts.modelOpts.net_init(opts) ;

Error in ssd_pascal_train (line 217) ssd_train(expDir, opts) ;
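
(For context, the failure chain in the trace corresponds roughly to the following load path; the variable names are illustrative, not the actual code in ssd_zoo/ssd_init.)

```matlab
% ssd_zoo: vgg_vd_16_reduced.mat is stored in simplenn format, so
% dagnn.DagNN.loadobj falls back to fromSimpleNN - hence the warning above.
dag = dagnn.DagNN.loadobj(load('vgg_vd_16_reduced.mat')) ;

% ssd_init then builds the SSD loss layers on top of the imported network
% and wraps them in an autonn Net (line 177: net = Net(all_losses)).
% Net() converts any non-Layer input via Layer.fromDagNN, and that
% conversion is where "Input must be a DagNN or SimpleNN" is raised.
```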

zacr0 commented 6 years ago

Exact same problem here. Any solutions?

albanie commented 6 years ago

Apologies for the silence - unfortunately dealing with upcoming deadlines atm, but will be able to take a look in a few days (feel free to buzz the issue again then).

zacr0 commented 6 years ago

Ok @albanie, thank you very much for your time!

I've been trying to debug the problem, and I think I've found a possible source of it. In ssd_init.m, lines 162 to 164, we have the following code:

multiloss = add_loss(opts, gtBoxes, gtLabels, ...
                     fusedPriors, fusedConfs, fusedLocs) ;
all_losses = {multiloss} ;

The loss is wrapped in a cell here, which later causes the following check at Net.m, line 92, to misfire:

if isscalar(objects) && ~isa(objects{1}, 'Layer')

This check expects a Layer, but because all_losses is nested one cell level deeper than expected, isa(objects{1}, 'Layer') returns 0, so the branch is taken and the program assumes the input is a DagNN, which leads to the fatal error above. If we replace all_losses = {multiloss}; with all_losses = multiloss; (no cell), the error no longer happens. Nevertheless, I can't confirm this definitely fixes things, since I haven't managed to start training yet.
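
Put concretely, the workaround amounts to this one change in ssd_init.m (a sketch based on the lines quoted above; the rest of the file is unchanged):

```matlab
% ssd_init.m, around lines 162-164: build the combined SSD loss.
multiloss = add_loss(opts, gtBoxes, gtLabels, ...
                     fusedPriors, fusedConfs, fusedLocs) ;

% Original: wrapping the loss in a cell makes the check in Net.m line 92,
%   isscalar(objects) && ~isa(objects{1}, 'Layer')
% fail to recognise it as a Layer, so Net() tries Layer.fromDagNN on it.
% all_losses = {multiloss} ;

% Workaround: pass the Layer directly.
all_losses = multiloss ;

% ssd_init.m, line 177 (unchanged):
% net = Net(all_losses) ;
```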

Come on, we got this! :)

UPDATE: after making the change described above and this one, I've made some progress.

kankanar commented 6 years ago

Yes, thank you zacr0 for finding the bug - it solves the problem. After that, we also need to change line 136 of cnn_train_autonn to

opts.extractStatsFn = @(stats, net, batchSize) fn(stats, net, sel, batchSize) ;

or alternatively change the function definition itself. Training started after that. Once again, thank you albanie for developing the MATLAB version of the SSD detector, and zacr0 for debugging it.
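
For reference, here is that replacement line again with a comment on why it seems to be needed; the reasoning is an interpretation, since the original line 136 of cnn_train_autonn is not quoted in this thread:

```matlab
% cnn_train_autonn.m, line 136: the stats-extraction callback is apparently
% invoked with a batchSize argument, so the anonymous wrapper must accept
% it and forward it (together with the captured 'sel') to fn.
opts.extractStatsFn = @(stats, net, batchSize) fn(stats, net, sel, batchSize) ;
```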

zacr0 commented 6 years ago

Thanks for the finding @kankanar! I was getting this warning, followed by an error:

Warning: The most recent version of vl_nnloss normalizes the loss by the batch size. The current version does not. A workaround is being used, but consider updating MatConvNet.
  In cnn_train_autonn (line 32)
  In ssd_train (line 20)
  In ssd_pascal_train (line 213)

which seems related to what you said. I'll try the change you have mentioned as soon as possible, thanks again!

UPDATE 1: ok, that warning is caused by my using matconvnet-beta25, whose vl_nnloss version is outdated and does not support that option yet.
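
To make the wording of the warning concrete, the difference is only in how the summed per-sample losses are scaled (a minimal illustration, not the actual vl_nnloss code):

```matlab
% Illustration only: a hypothetical vector of per-image losses for one batch.
perSampleLoss = [0.9 1.2 0.7 1.1] ;
batchSize = numel(perSampleLoss) ;

lossOld = sum(perSampleLoss) ;              % older vl_nnloss: plain sum
lossNew = sum(perSampleLoss) / batchSize ;  % newer vl_nnloss: normalized by batch size
```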

UPDATE 2: yes, the change pointed out by @kankanar works and the script starts training. Thanks again.

albanie commented 6 years ago

I've pushed a fix - let me know if there are still issues.

zacr0 commented 6 years ago

It seems to work fine now, amazing job! I ran the training script for a few iterations and got NaN as a result, but I've noticed there are earlier issues pointing out that the default learning rate is too high, so that is not related to the current issue.
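
If anyone else hits the NaN, the usual remedy in those earlier issues is to lower the learning rate in the training options. A minimal sketch, assuming the standard MatConvNet-style opts.train.learningRate schedule (the exact field name and values used by ssd_pascal_train.m may differ):

```matlab
% Hypothetical values - reduce the learning rate roughly 10x and use a
% short two-step schedule; tune for your own setup.
opts.train.learningRate = [1e-4 * ones(1, 40), 1e-5 * ones(1, 20)] ;
opts.train.numEpochs    = numel(opts.train.learningRate) ;
```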

Thank you very much!