Runtime error - Githubissues

farshidfarhat commented 7 years ago

Could you please let me know the issue with my demo?

error.txt

...
I1016 22:46:16.365223 24943 net.cpp:816] Ignoring source layer loss_loc
I1016 22:46:16.374922 24943 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
F1016 22:46:17.488354 24943 syncedmem.cpp:136] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***

eldar commented 7 years ago

Hi, can you try changing this line https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47 to caffe.set_mode_cpu(); ? I always use a GPU, and it never occurred to me that people might not have GPUs with large enough memory, sorry!

eldar commented 7 years ago

It's actually very difficult to tell from this log what the error is. I've never seen anything like it. How exactly did you build Caffe? "After applying the solution from issue 1799" - what was this fix?

farshidfarhat commented 7 years ago

Here, https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/models/finetune_flickr_style/solver.prototxt#L17, I uncommented the last line to solve the "Cannot use GPU in CPU-only Caffe" issue.

Actually, I installed Caffe locally (without sudo/root access) on a Red Hat-based cluster. I changed Makefile.config as follows, based on my system configuration:

CXXFLAGS += -std=c++11
CPU_ONLY := 1
BLAS := mkl

I also commented out the following part myself, https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/src/caffe/layers/softmax_loss_vec_layer.cpp#L236-L251, similar to softmax_loss_layer.cpp.

I couldn't run "make solver-callback" from your instructions, as there was no "solver-callback:" target in the Makefile!

Also I made your change "caffe.set_mode_cpu();" in https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47

eldar commented 7 years ago

"make solver-callback" has to be executed in the directory of the solver, not in the Caffe directory.

Can you run the CNN-only demo as described here: https://github.com/eldar/deepcut-cnn/#installation-instructions adding the use_cpu flag like so:

python ./pose_demo.py image.png --out_name=prediction

This will confirm that you have the CNN running, at the very least.

farshidfarhat commented 7 years ago

After debugging, I could run "python ./pose_demo.py image.png --out_name=prediction". But "make solver-callback" gives the following log:

[ 50%] Building CXX object CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o
cc1plus: error: unrecognized command line option "-std=c++11"
make[3]: *** [CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o] Error 1
make[2]: *** [CMakeFiles/solver-callback.dir/all] Error 2
make[1]: *** [CMakeFiles/solver-callback.dir/rule] Error 2
make: *** [solver-callback] Error 2

farshidfarhat commented 7 years ago

I used this command to solve the above error:

cmake . -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=c++ -DGUROBI_ROOT_DIR=/usr/global/gurobi/gurobi651/linux64 -DGUROBI_VERSION=65

GCC and Gurobi have to be compatible in this case. I finally got it to build on my system.
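The 'unrecognized command line option "-std=c++11"' error usually points to a compiler older than GCC 4.7, the first release that accepts that spelling of the flag (4.3 to 4.6 take -std=c++0x instead). Below is a minimal Python sketch for checking which compiler is being picked up before running cmake. The helper names are hypothetical and not part of deepcut, and the default "c++" simply mirrors the -DCMAKE_CXX_COMPILER=c++ value used above.

```python
import re
import subprocess

# Assumption: GCC accepts -std=c++11 from release 4.7 onward.
MIN_GCC_FOR_CXX11 = (4, 7)


def parse_gcc_version(text):
    """Extract (major, minor) from `-dumpversion` style output."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    return int(match.group(1)), int(match.group(2) or 0)


def supports_std_cxx11(version):
    """True if a GCC with this (major, minor) version accepts -std=c++11."""
    return version >= MIN_GCC_FOR_CXX11


def compiler_version(compiler="c++"):
    """Ask the compiler found on PATH for its version (hypothetical helper)."""
    output = subprocess.check_output([compiler, "-dumpversion"])
    return parse_gcc_version(output.decode())
```

Calling supports_std_cxx11(compiler_version("c++")) on the build machine tells you whether the flag in the CMake configuration can work with that compiler, or whether a newer GCC has to be selected via -DCMAKE_CXX_COMPILER.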

make.err.txt

farshidfarhat commented 7 years ago

Segmentation fault after running the demo:

...
I1020 11:20:43.944026 15336 net.cpp:228] conv1 does not need backward computation.
I1020 11:20:43.944032 15336 net.cpp:270] This network produces output loc_pred
I1020 11:20:43.944036 15336 net.cpp:270] This network produces output next_pred
I1020 11:20:43.944042 15336 net.cpp:270] This network produces output prob
I1020 11:20:43.944288 15336 net.cpp:283] Network initialization done.
I1020 11:20:44.850095 15336 net.cpp:816] Ignoring source layer data
I1020 11:20:44.850126 15336 net.cpp:816] Ignoring source layer label_data_1_split
I1020 11:20:44.902542 15336 net.cpp:816] Ignoring source layer res4b4_up_pose
I1020 11:20:44.902570 15336 net.cpp:816] Ignoring source layer crop_res4b4
I1020 11:20:44.902576 15336 net.cpp:816] Ignoring source layer loss_part_res4b4
I1020 11:20:44.902582 15336 net.cpp:816] Ignoring source layer res4b12_up_pose
I1020 11:20:44.902587 15336 net.cpp:816] Ignoring source layer crop_res4b12
I1020 11:20:44.902593 15336 net.cpp:816] Ignoring source layer loss_part_res4b12
I1020 11:20:44.902909 15336 net.cpp:816] Ignoring source layer loss_part_res5c
I1020 11:20:44.903682 15336 net.cpp:816] Ignoring source layer loss_loc
I1020 11:20:44.912511 15336 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
/usr/global/matlab/R2015a/bin/matlab: line 1: 15216 Segmentation fault pbs_taskset matlab-bin $@

eldar commented 7 years ago

Hey, I can't see from the log what exactly the problem is, but it could be that you didn't set the Gurobi license file appropriately. This is where the location is set in the code: https://github.com/eldar/deepcut/blob/master/lib/pose/exp_params.m#L18. You can modify it. You can obtain the academic license for free from the Gurobi website.

P.S. In the next couple of days we will update the repository with a completely new solver that runs fast and also doesn't require any license.

farshidfarhat commented 7 years ago

Hi Eldar,

Thanks for your reply. Actually, I followed all the instructions you posted in README.md, including the Gurobi license. I don't know whether the MATLAB version matters. But there is an error when I run ./start_matlab.sh:

                                                           < M A T L A B (R) >
                                                 Copyright 1984-2015 The MathWorks, Inc.
                                                 R2015a (8.5.0.197613) 64-bit (glnxa64)
                                                            February 12, 2015

To get started, type one of these: helpwin, helpdesk, or demo. For product information, visit www.mathworks.com.

Pose startup done

Academic License

Error using dbstop
Not enough input arguments.

eldar commented 7 years ago

Can you modify the start_matlab.sh script, or just start MATLAB with this command instead?

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 matlab

farshidfarhat commented 7 years ago

Yes. I ran "dbstop if error" later inside Matlab, and the error is as follows:

...
I1021 11:12:10.756536 2446 net.cpp:270] This network produces output next_pred
I1021 11:12:10.756551 2446 net.cpp:270] This network produces output prob
I1021 11:12:10.757047 2446 net.cpp:283] Network initialization done.
Unexpected Standard exception from MEX file.
What() is:basic_string::append
..

Error in caffe.Net/copy_from (line 123)
caffe_('net_copy_from', self.hNet_self, weights_file);

Error in caffe.get_net (line 34)
net.copy_from(weights_file);

Error in caffe.Net (line 31)
self = caffe.get_net(varargin{:});

Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');

Error in demo_multiperson (line 9)
cnn_cache_features( experiment_index, 'test', image_index, 1);

123 caffe_('net_copy_from', self.hNet_self, weights_file);

eldar commented 7 years ago

Can you stop the debugger on this line:

Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');

and check whether net_def_file points to an existing model definition file (somewhere in /models) and net_bin_file points to the correct Caffe binary weights file (something.caffe)?

farshidfarhat commented 7 years ago

It seems fine! Could it be related to copying a huge model file?

...

Cleared 0 solvers and 0 stand-alone nets
52 net = caffe.Net(net_def_file, net_bin_file, 'test');

K>> net_def_file
net_def_file = /gpfs/work/f/fuf111/deepcut/models/ResNet-101-FCN_out_14_sigmoid_locreg_allpairs_test.prototxt

K>> net_bin_file
net_bin_file = /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
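Printing the two variables only shows that the strings look right; it does not show that the files exist, are readable, and are non-empty. Here is a small Python sketch of that check, run outside MATLAB. The function name check_model_files is hypothetical and not part of deepcut or Caffe.

```python
import os


def check_model_files(net_def_file, net_bin_file):
    """Return a list of problems; an empty list means both files look usable."""
    problems = []
    candidates = (("net_def_file", net_def_file), ("net_bin_file", net_bin_file))
    for label, path in candidates:
        if not os.path.isfile(path):
            problems.append("%s: not a file: %s" % (label, path))
        elif os.path.getsize(path) == 0:
            problems.append("%s: file is empty: %s" % (label, path))
        elif not os.access(path, os.R_OK):
            problems.append("%s: not readable: %s" % (label, path))
    return problems
```

Passing the two paths printed by the debugger should return an empty list. This only rules out missing or empty files; a truncated or corrupted weights file would still pass.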

eldar commented 7 years ago

Sorry, it's quite difficult to say what's wrong without a proper error log. The model definitely fits on a 12 GB GPU. Maybe the file was corrupted during download? Here's the hash for mine:

deepercut-models$ md5sum ResNet-101-mpii-multiperson.caffemodel
a1aa7fb45c4f1a0e90087d6ddac24cf1  ResNet-101-mpii-multiperson.caffemodel
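On a machine without md5sum, the same comparison can be sketched in Python with the standard hashlib module. The expected value is the hash quoted above; the function names are hypothetical, and the file is read in chunks so the large weights file is never loaded into memory at once.

```python
import hashlib

# MD5 quoted in this thread for ResNet-101-mpii-multiperson.caffemodel.
EXPECTED_MD5 = "a1aa7fb45c4f1a0e90087d6ddac24cf1"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

If matches_expected returns False for the downloaded .caffemodel, downloading it again is the obvious next step.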

eldar / deepcut

Runtime error #5