Error with Ex. 5 Image Captioning

thommiano commented 7 years ago

@anguyen8 I'm trying to run the fifth example, but I encounter the following error:

root@8c4e9b11f13b:~/ppgn# ./5_caption_conditional_sampling.sh a_church_steeple_that_has_a_clock_on_it
libdc1394 error: Failed to initialize libdc1394
-------------
 sentence: a_church_steeple_that_has_a_clock_on_it
 n_iters: 200
 reset_every: 0
 save_every: 0
 threshold: 0.0
 epsilon1: 0.001
 epsilon2: 1.0
 epsilon3: 1e-17
 start learning rate: 1.0
 end learning rate: 1e-10
 seed: 0
 opt_layer: fc6
 act_layer: fc8
 init_file: None
-------------
 output dir: output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it
 net weights: nets/lrcn/lrcn_caffenet_iter_110000.caffemodel
 net definition: nets/caffenet/caffenet.prototxt
 captioner definition: nets/lrcn/lrcn_word_to_preds.deploy.prototxt
-------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0123 16:11:34.156642  3730 embed_layer.cu:61] Check failed: !propagate_down[0] Can't backpropagate to EmbedLayer input.
*** Check failure stack trace: ***
./5_caption_conditional_sampling.sh: line 52:  3730 Aborted                 (core dumped) python ./sampling_caption.py --act_layer ${act_layer} --opt_layer ${opt_layer} --sentence ${sentence} --xy ${xy} --n_iters ${n_iters} --save_every ${save_every} --reset_every ${reset_every} --lr ${lr} --lr_end ${lr_end} --seed ${seed} --output_dir ${output_dir} --init_file ${init_file} --epsilon1 ${epsilon1} --epsilon2 ${epsilon2} --epsilon3 ${epsilon3} --threshold ${threshold} --net_weights ${net_weights} --net_definition ${net_definition} --captioner_definition ${captioner_definition}
libdc1394 error: Failed to initialize libdc1394

Which is followed by:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0123 16:11:38.973918  3734 embed_layer.cu:61] Check failed: !propagate_down[0] Can't backpropagate to EmbedLayer input.
*** Check failure stack trace: ***
./5_caption_conditional_sampling.sh: line 52:  3734 Aborted                 (core dumped) python ./sampling_caption.py --act_layer ${act_layer} --opt_layer ${opt_layer} --sentence ${sentence} --xy ${xy} --n_iters ${n_iters} --save_every ${save_every} --reset_every ${reset_every} --lr ${lr} --lr_end ${lr_end} --seed ${seed} --output_dir ${output_dir} --init_file ${init_file} --epsilon1 ${epsilon1} --epsilon2 ${epsilon2} --epsilon3 ${epsilon3} --threshold ${threshold} --net_weights ${net_weights} --net_definition ${net_definition} --captioner_definition ${captioner_definition}
libdc1394 error: Failed to initialize libdc1394

And then:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0123 16:11:43.888489  3738 embed_layer.cu:61] Check failed: !propagate_down[0] Can't backpropagate to EmbedLayer input.
*** Check failure stack trace: ***
./5_caption_conditional_sampling.sh: line 52:  3738 Aborted                 (core dumped) python ./sampling_caption.py --act_layer ${act_layer} --opt_layer ${opt_layer} --sentence ${sentence} --xy ${xy} --n_iters ${n_iters} --save_every ${save_every} --reset_every ${reset_every} --lr ${lr} --lr_end ${lr_end} --seed ${seed} --output_dir ${output_dir} --init_file ${init_file} --epsilon1 ${epsilon1} --epsilon2 ${epsilon2} --epsilon3 ${epsilon3} --threshold ${threshold} --net_weights ${net_weights} --net_definition ${net_definition} --captioner_definition ${captioner_definition}
montage.im6: unable to open image `output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it/fc8_*.jpg': No such file or directory @ error/blob.c/OpenBlob/2641.
montage.im6: missing an image filename `output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it/a_church_steeple_that_has_a_clock_on_it.jpg' @ error/montage.c/MontageImageCommand/1790.
convert.im6: unable to open image `output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it/a_church_steeple_that_has_a_clock_on_it.jpg': No such file or directory @ error/blob.c/OpenBlob/2641.
convert.im6: no images defined `output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it/a_church_steeple_that_has_a_clock_on_it.jpg' @ error/convert.c/ConvertImageCommand/3044.
convert.im6: unable to open image `output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it/a_church_steeple_that_has_a_clock_on_it.jpg': No such file or directory @ error/blob.c/OpenBlob/2641.
convert.im6: no images defined `output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it/a_church_steeple_that_has_a_clock_on_it.jpg' @ error/convert.c/ConvertImageCommand/3044.
/root/ppgn/output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it/a_church_steeple_that_has_a_clock_on_it.jpg

I cloned caffe_lrcn from the source you linked and updated the settings.py file accordingly: caffe_root = "~/caffe_lrcn/python". My directory tree is structured as follows:

drwxr-xr-x 20 root root 4096 Jan 23 16:05 caffe
drwxr-xr-x 13 root root 4096 Jan 23 15:33 caffe_lrcn
drwxr-xr-x 13 root root 4096 Jan 23 16:17 ppgn

I don't have any problems running the other four examples. Also, I'm getting the Failed to initialize libdc1394 error because I'm running this in a Docker container, and others have reported a similar problem, though it usually doesn't appear to cause any problems.

Any thoughts? Thanks!

anguyen8 commented 7 years ago

@thommiano : Can you try expanding ~/caffe_lrcn/python into the actual path (/home/.../caffe_lrcn/python)? And have you tried using this caffe_lrcn and running example 1?
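
Python does not expand `~` on its own when a path is placed on `sys.path`, so a tilde path is treated as a literal directory name. The following is a minimal sketch of the expansion suggested above. It assumes that settings.py exposes `caffe_root` as a plain string and that the sampling script puts it on `sys.path`; the helper name `resolve_caffe_root` is hypothetical and not part of ppgn.

```python
import os
import sys


def resolve_caffe_root(path):
    """Expand '~' and return an absolute path (hypothetical helper)."""
    return os.path.abspath(os.path.expanduser(path))


# Value taken from the thread; adjust it to your own checkout.
caffe_root = resolve_caffe_root("~/caffe_lrcn/python")

if not os.path.isdir(caffe_root):
    print("warning: %s does not exist" % caffe_root)

# Put the resolved directory first so it takes precedence over other installs.
sys.path.insert(0, caffe_root)
```

The directory check only warns; it does not confirm that the Caffe build inside that directory is the one that ends up being imported.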

thommiano commented 7 years ago

@anguyen8 I tried using both ~/caffe_lrcn/python and the actual path /root/caffe_lrcn/python (and I did the same for my other caffe directory as well). In all cases, I could get examples 1 through 4 to work, but I couldn't get example 5 to work.

anguyen8 commented 7 years ago

@thommiano : Thanks for letting me know! How about this version of caffe_lrcn? http://www.cs.uwyo.edu/~anguyen8/share/caffe_lrcn_joel.tar.gz

I had someone try and report that this works.

thommiano commented 7 years ago

@anguyen8 The one you linked to works with examples 1 through 4, but not 5. This makes me think that there could be an issue with my version of CUDA or CuDNN, or that there could be a conflict with some other file path in another file. This other issue appears to be closely related.

anguyen8 commented 7 years ago

@thommiano : yes, could be one of those reasons. I have never tried CUDA 8 with Caffe and I don't seem to be able to reproduce your problem. My setup is cuda 7.5/cudnn 4.0 here.

Perhaps trying in CPU mode?

thommiano commented 7 years ago

CPU mode on caffe and caffe_lrcn works for example 1 through 4, but not 5.

I believe I found the source of the error. The caffe cmake file has set(Caffe_known_gpu_archs "20 21(20) 30 35 50"), which does not include architectures 60 and 61, and on a GTX1070 with CUDA 8 I'd need 61.
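
As a rough illustration of the check described above, the sketch below parses the value of `Caffe_known_gpu_archs` quoted in this comment and tests whether a given architecture number is listed. The function name `parse_known_archs` is hypothetical, the value 61 is the one stated in the thread for the GTX 1070, and the extended list is only an assumed example of what an updated CMake setting might look like. Nothing here queries the GPU or modifies the build.

```python
def parse_known_archs(value):
    """Return the arch numbers listed in a Caffe_known_gpu_archs string.

    Entries such as "21(20)" are reduced to the number before the parenthesis.
    """
    return [int(token.split("(")[0]) for token in value.split()]


# String quoted from the caffe cmake file in this thread.
known = parse_known_archs("20 21(20) 30 35 50")

# Architecture the thread says is needed for a GTX 1070 with CUDA 8.
needed = 61
print(needed in known)  # False

# Assumed example of an extended list that adds architectures 60 and 61.
extended = parse_known_archs("20 21(20) 30 35 50 60 61")
print(needed in extended)  # True
```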

I tried rebuilding the caffe_lrcn with the two new architectures to make it available for CUDA 8, but ran into the following error:

brain@machine ~/p/d/b/c/build> cmake ..
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
  Imported targets not available for Boost version
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
  cmake/Dependencies.cmake:5 (find_package)
  CMakeLists.txt:28 (include)

CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
  Imported targets not available for Boost version
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
  cmake/Dependencies.cmake:5 (find_package)
  CMakeLists.txt:28 (include)

-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
CMake Error at /usr/share/cmake-3.5/Modules/FindBoost.cmake:1677 (message):
  Unable to find the requested Boost libraries.

  Unable to find the Boost header files.  Please set BOOST_ROOT to the root
  directory containing Boost or BOOST_INCLUDEDIR to the directory containing
  Boost's headers.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:5 (find_package)
  CMakeLists.txt:28 (include)

-- Could NOT find GFlags (missing:  GFLAGS_INCLUDE_DIR GFLAGS_LIBRARY) 
-- Could NOT find Glog (missing:  GLOG_INCLUDE_DIR GLOG_LIBRARY) 
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find Protobuf (missing: PROTOBUF_LIBRARY PROTOBUF_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.5/Modules/FindProtobuf.cmake:308 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  cmake/ProtoBuf.cmake:4 (find_package)
  cmake/Dependencies.cmake:24 (include)
  CMakeLists.txt:28 (include)

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Boost_INCLUDE_DIR (ADVANCED)
   used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
   used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
   used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
   used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
   used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8

-- Configuring incomplete, errors occurred!
See also "/home/brain/projects/docker_projects/builds/caffe_lrcn_c8/build/CMakeFiles/CMakeOutput.log".
See also "/home/brain/projects/docker_projects/builds/caffe_lrcn_c8/build/CMakeFiles/CMakeError.log".

In the meantime, I've tried rebuilding an image with CUDA 7.0 and CUDA 7.5, but I'm getting incompatibility issues, i.e., error == cudaSuccess (10 vs. 0) and error == cudaSuccess (8 vs. 0) after I've changed the settings.py file to caffe.set_device(0) to account for the first error. I think this error comes from an incompatibility between Caffe+CUDA <8 and my display driver 367.57.

I think this is still a Caffe issue, though, because I can use this display driver to run CUDA 7.0 with Torch.

anguyen8 commented 7 years ago

@thommiano Btw, you could comment out this line entirely: caffe.set_device(0) (not needed in this case)

thommiano commented 7 years ago

@anguyen8 Ah, ok! I was wondering about that . . . thanks for letting me know.

thommiano commented 7 years ago

@anguyen8 I forked your version of caffe_lrcn, updated the cmake file to include support for CUDA 8, and successfully (I think?) rebuilt caffe_lrcn: https://github.com/thommiano/caffe_lrcn

I just noticed that despite what I set the settings.py caffe file path to (even if it's not a real file path), examples 1-4 still run... This makes me think that somehow it's failing to see the actual caffe_lrcn directory that I'm pointing it to. Any thoughts? Do I need to update my environment if the path in settings.py has changed?

anguyen8 commented 7 years ago

@thommiano : oh, ok good finding. It seems you've been accidentally using a different Caffe. Inside the python code, can you try printing out the path with: import sys; print(sys.path)

and see if there is any path to some Caffe version there? You should remove it to not conflict with this Caffe we're using.
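
A minimal sketch of that check, assuming it is run with the same interpreter and environment as the sampling scripts. The helper name `caffe_like_entries` is hypothetical. Entries can come from the `PYTHONPATH` environment variable as well as from settings.py, so both are worth inspecting.

```python
import sys


def caffe_like_entries(paths):
    """Return the entries whose text mentions 'caffe' (case-insensitive)."""
    return [p for p in paths if "caffe" in str(p).lower()]


# List every search-path entry that looks like a Caffe install.
for entry in caffe_like_entries(sys.path):
    print(entry)

# If Caffe imports at all, its __file__ shows which build was picked up.
try:
    import caffe
    print(caffe.__file__)
except ImportError:
    print("caffe is not importable from this interpreter")
```

If the printed `caffe.__file__` does not sit under the caffe_lrcn directory, another Caffe is shadowing it.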

agermanidis commented 7 years ago

I've been facing a similar issue. I compiled caffe_lrcn and pointed to it in settings.py, and confirmed that I'm using it by printing sys.path, yet I can't get example 5 to work.

root@b9eb2299514a:~/ppgn# ./5_caption_conditional_sampling.sh a_church_steeple_that_has_a_clock_on_it
['/root/caffe_lrcn/python', '/root/ppgn', '/root/caffe_lrcn/python', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']
-------------
 sentence: a_church_steeple_that_has_a_clock_on_it
 n_iters: 200
 reset_every: 0
 save_every: 0
 threshold: 0.0
 epsilon1: 0.001
 epsilon2: 1.0
 epsilon3: 1e-17
 start learning rate: 1.0
 end learning rate: 1e-10
 seed: 0
 opt_layer: fc6
 act_layer: fc8
 init_file: None
-------------
 output dir: output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it
 net weights: nets/lrcn/lrcn_caffenet_iter_110000.caffemodel
 net definition: nets/caffenet/caffenet.prototxt
 captioner definition: nets/lrcn/lrcn_word_to_preds.deploy.prototxt
-------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0125 21:19:50.882262    59 embed_layer.cpp:91] Check failed: !propagate_down[0] Can't backpropagate to EmbedLayer input.

thommiano commented 7 years ago

@agermanidis What version of CUDA and nvidia display driver are you using?

agermanidis commented 7 years ago

@thommiano I'm using CPU mode above. I also repeated the process on an EC2 GPU instance with CUDA 7.5, ran into the same issue.

thommiano commented 7 years ago

@agermanidis When you ran with CUDA 7.5 on the EC2 instance, did the other examples work? You said you compiled caffe_lrcn. Did you manually compile it (i.e., using make or cmake), or did you download it from the download.sh file provided in the ppgn repo?

agermanidis commented 7 years ago

I manually compiled with make. I was under the impression that the download.sh scripts are for downloading the pretrained models, not caffe.

thommiano commented 7 years ago

@agermanidis Oops, yep you're right. I should have asked: did you download from the provided caffe_lrcn source? Did you compile from this source or from another source?

agermanidis commented 7 years ago

@thommiano Compiled from @anguyen8's repo

anguyen8 commented 7 years ago

@agermanidis : just double-checking, does this repo work for you? (it's the exact copy on my local) http://www.cs.uwyo.edu/~anguyen8/share/caffe_lrcn_joel.tar.gz

agermanidis commented 7 years ago

@anguyen8: Still facing the same issue with your copy... I ended up making the script work by deleting the line with the check in embed_layer.cpp and recompiling.

kenkroft commented 7 years ago

OK, something crazy just happened to me. I was having all the issues you describe and struggling to find an answer. I tried everything I found online and everything suggested here, and nothing worked. As a last hope, I downloaded the anh repo in order to build it again WHILE building caffe again with standard settings for GPU... and now it works, even though I had recompiled with exactly the same settings that were giving me issues before. So no big help here; maybe just download the repo, place it in the same folder as the other repo, and cross your fingers. Sorry for the unscientific look of this answer, but it just happened to me, so I'm sharing.

anguyen8 commented 7 years ago

@kenkroft : Thanks for letting me know! At least a few people got it working (while some others didn't). I guess it has something to do with the compatibility of that old Caffe and different platforms.

jannesss commented 7 years ago

@agermanidis Could you provide me with some more info? I found the embed_layer.cpp file but don't know what to do with it. And once I've changed it, do I just need to build caffe again with the make command?

SelinaChe commented 7 years ago

@jannesss delete the line CHECK(!propagate_down[0]) << "Can't backpropagate to EmbedLayer input." in embed_layer.cpp. It worked for me.
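
The workaround above can be sketched as a small text transformation that comments the check out instead of deleting it, so the change is easy to revert. This is only an illustration: the function name `disable_embed_check` and the sample source are hypothetical, and it assumes the whole statement sits on one line, as quoted in the thread. The logs earlier in the thread show the same check failing in both embed_layer.cu (GPU) and embed_layer.cpp (CPU), so both files may need the change, and Caffe presumably has to be rebuilt afterwards. Note that this disables a safety check rather than explaining why it fires.

```python
def disable_embed_check(source):
    """Comment out lines containing the EmbedLayer backprop CHECK.

    Lines that are already commented out are left untouched.
    """
    out = []
    for line in source.splitlines(True):  # True keeps the line endings
        stripped = line.lstrip()
        if "CHECK(!propagate_down[0])" in line and not stripped.startswith("//"):
            indent = line[: len(line) - len(stripped)]
            out.append(indent + "// " + stripped)
        else:
            out.append(line)
    return "".join(out)


# Made-up snippet for demonstration; this is not the real embed_layer.cpp.
sample = (
    "void Backward() {\n"
    '  CHECK(!propagate_down[0]) << "Can\'t backpropagate to EmbedLayer input.";\n'
    "  DoWork();\n"
    "}\n"
)
print(disable_embed_check(sample))
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back after keeping a backup copy.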

Evolving-AI-Lab / ppgn

Error with Ex. 5 Image Captioning #6