Open thommiano opened 7 years ago
@thommiano : Can you try expanding ~/caffe_lrcn/python into the actual path (/home/.../caffe_lrcn/python)?
And have you tried using this caffe_lrcn and running example 1?
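One reason the expanded path can matter: Python does not expand `~` in `sys.path` entries, so a literal `~/caffe_lrcn/python` would not be searched. A minimal sketch, assuming `settings.py` holds the path in a `caffe_root` string and that it ends up on `sys.path` (the helper name `resolve_caffe_root` is hypothetical, not part of ppgn):

```python
import os
import sys

def resolve_caffe_root(caffe_root):
    """Expand '~' and return an absolute path (hypothetical helper)."""
    return os.path.abspath(os.path.expanduser(caffe_root))

caffe_root = resolve_caffe_root("~/caffe_lrcn/python")
if not os.path.isdir(caffe_root):
    # A missing directory is silently ignored by the import system,
    # so warn explicitly instead.
    print("warning: %s does not exist" % caffe_root)

# Put this Caffe first so it wins over any other copy on sys.path.
sys.path.insert(0, caffe_root)
```

If the warning prints, any `import caffe` that still succeeds is coming from some other location.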
@anguyen8 I tried using both ~/caffe_lrcn/python and the actual path /root/caffe_lrcn/python (and I did the same for my other caffe directory as well). In all cases, I could get examples 1 through 4 to work, but I couldn't get example 5 to work.
@thommiano : Thanks for letting me know! How about this version of caffe_lrcn? http://www.cs.uwyo.edu/~anguyen8/share/caffe_lrcn_joel.tar.gz
I had someone try and report that this works.
@anguyen8 The one you linked to works with examples 1 through 4, but not 5. This makes me think that there could be an issue with my version of CUDA or CuDNN, or that there could be a conflict with some other file path in another file. This other issue appears to be closely related.
@thommiano : yes, could be one of those reasons. I have never tried CUDA 8 with Caffe and I don't seem to be able to reproduce your problem. My setup is cuda 7.5/cudnn 4.0 here.
Perhaps try CPU mode?
CPU mode on caffe and caffe_lrcn works for example 1 through 4, but not 5.
I believe I found the source of the error. The caffe cmake file has set(Caffe_known_gpu_archs "20 21(20) 30 35 50"), which does not include architectures 60 and 61, and on a GTX1070 with CUDA 8 I'd need 61.
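As a rough illustration of that check, here is a small sketch that parses an architecture string in the format quoted above and reports which required architectures are absent. The function names are hypothetical, and treating `21(20)` as architecture 21 is an assumption about the format:

```python
import re

def known_archs(arch_string):
    """Parse a Caffe_known_gpu_archs-style string into a set of ints."""
    # Entries look like "35" or "21(20)"; keep only the leading number.
    return {int(re.match(r"\d+", tok).group()) for tok in arch_string.split()}

def missing_archs(arch_string, required):
    """Return the required architectures not listed in arch_string."""
    return sorted(set(required) - known_archs(arch_string))

print(missing_archs("20 21(20) 30 35 50", [60, 61]))  # prints [60, 61]
```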
I tried rebuilding the caffe_lrcn with the two new architectures to make it available for CUDA 8, but ran into the following error:
brain@machine ~/p/d/b/c/build> cmake ..
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
cmake/Dependencies.cmake:5 (find_package)
CMakeLists.txt:28 (include)
CMake Warning at /usr/share/cmake-3.5/Modules/FindBoost.cmake:725 (message):
Imported targets not available for Boost version
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindBoost.cmake:763 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.5/Modules/FindBoost.cmake:1332 (_Boost_MISSING_DEPENDENCIES)
cmake/Dependencies.cmake:5 (find_package)
CMakeLists.txt:28 (include)
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
CMake Error at /usr/share/cmake-3.5/Modules/FindBoost.cmake:1677 (message):
Unable to find the requested Boost libraries.
Unable to find the Boost header files. Please set BOOST_ROOT to the root
directory containing Boost or BOOST_INCLUDEDIR to the directory containing
Boost's headers.
Call Stack (most recent call first):
cmake/Dependencies.cmake:5 (find_package)
CMakeLists.txt:28 (include)
-- Could NOT find GFlags (missing: GFLAGS_INCLUDE_DIR GFLAGS_LIBRARY)
-- Could NOT find Glog (missing: GLOG_INCLUDE_DIR GLOG_LIBRARY)
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find Protobuf (missing: PROTOBUF_LIBRARY PROTOBUF_INCLUDE_DIR)
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.5/Modules/FindProtobuf.cmake:308 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
cmake/ProtoBuf.cmake:4 (find_package)
cmake/Dependencies.cmake:24 (include)
CMakeLists.txt:28 (include)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Boost_INCLUDE_DIR (ADVANCED)
used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
used as include directory in directory /home/brain/projects/docker_projects/builds/caffe_lrcn_c8
-- Configuring incomplete, errors occurred!
See also "/home/brain/projects/docker_projects/builds/caffe_lrcn_c8/build/CMakeFiles/CMakeOutput.log".
See also "/home/brain/projects/docker_projects/builds/caffe_lrcn_c8/build/CMakeFiles/CMakeError.log".
In the meantime, I've tried rebuilding an image with CUDA 7.0 and CUDA 7.5, but I'm getting incompatibility issues, i.e., error == cudaSuccess (10 vs. 0) and error == cudaSuccess (8 vs. 0) after I changed the settings.py file to caffe.set_device(0) to account for the first error. I think this error comes from an incompatibility between Caffe+CUDA <8 and my display driver 367.57.
I think this is still a Caffe issue, though, because I can use this display driver to run CUDA 7.0 with Torch.
@thommiano Btw, you could comment out this line entirely: caffe.set_device(0) (not needed in this case)
@anguyen8 Ah, ok! I was wondering about that . . . thanks for letting me know.
@anguyen8 I forked your version of caffe_lrcn, updated the cmake file to include support for CUDA 8, and successfully (I think?) rebuilt caffe_lrcn: https://github.com/thommiano/caffe_lrcn
I just noticed that despite what I set the settings.py caffe file path to (even if it's not a real file path), examples 1-4 still run... This makes me think that somehow it's failing to see the actual caffe_lrcn directory that I'm pointing it to. Any thoughts? Do I need to update my environment if the path in settings.py has changed?
@thommiano : oh, ok good finding. It seems you've been accidentally using a different Caffe.
Inside the Python code, can you try printing out the path with
import sys
print(sys.path)
and see if there is any path to some Caffe version there? You should remove it to not conflict with this Caffe we're using.
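A slightly more targeted version of that check, as a sketch: it filters `sys.path` for entries that mention Caffe and flags any that do not exist on disk. The helper names are hypothetical, and matching on the substring "caffe" is only a heuristic:

```python
import os
import sys

def caffe_like_entries(paths):
    """Return path entries that mention 'caffe' (case-insensitive)."""
    return [p for p in paths if "caffe" in p.lower()]

def report(paths):
    """Print each Caffe-looking entry and whether the directory exists."""
    for p in caffe_like_entries(paths):
        status = "exists" if os.path.isdir(os.path.expanduser(p)) else "MISSING"
        print("%-8s %s" % (status, p))

report(sys.path)
# After `import caffe`, the standard module attribute `caffe.__file__`
# shows which copy was actually loaded.
```

Entries outside `sys.path` that look like Caffe installs (for example under `dist-packages`) will not show up here, so checking `caffe.__file__` is the more direct test.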
I've been facing a similar issue. I compiled caffe_lrcn and pointed to it in settings.py, and confirmed that I'm using it by printing sys.path, yet I can't get example 5 to work.
root@b9eb2299514a:~/ppgn# ./5_caption_conditional_sampling.sh a_church_steeple_that_has_a_clock_on_it
['/root/caffe_lrcn/python', '/root/ppgn', '/root/caffe_lrcn/python', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']
-------------
sentence: a_church_steeple_that_has_a_clock_on_it
n_iters: 200
reset_every: 0
save_every: 0
threshold: 0.0
epsilon1: 0.001
epsilon2: 1.0
epsilon3: 1e-17
start learning rate: 1.0
end learning rate: 1e-10
seed: 0
opt_layer: fc6
act_layer: fc8
init_file: None
-------------
output dir: output/fc8_eps1_1e-3_eps3_1e-17/a_church_steeple_that_has_a_clock_on_it
net weights: nets/lrcn/lrcn_caffenet_iter_110000.caffemodel
net definition: nets/caffenet/caffenet.prototxt
captioner definition: nets/lrcn/lrcn_word_to_preds.deploy.prototxt
-------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0125 21:19:50.882262 59 embed_layer.cpp:91] Check failed: !propagate_down[0] Can't backpropagate to EmbedLayer input.
@agermanidis What version of CUDA and nvidia display driver are you using?
@thommiano I'm using CPU mode above. I also repeated the process on an EC2 GPU instance with CUDA 7.5, ran into the same issue.
@agermanidis When you ran with CUDA 7.5 on the EC2 instance did the other examples work? You said you compiled caffe_lrcn, did you manually compile it (i.e., using make or cmake), or did you download it from the download.sh file provided in the ppgn repo?
I manually compiled with make. I was under the impression that the download.sh scripts are for downloading the pretrained models, not caffe.
@agermanidis Oops, yep you're right. I should have asked, did you download from the provided caffe_lrcn source. Did you compile from this source or from another source?
@thommiano Compiled from @anguyen8's repo
@agermanidis : just double-checking, does this repo work for you? (it's the exact copy on my local) http://www.cs.uwyo.edu/~anguyen8/share/caffe_lrcn_joel.tar.gz
@anguyen8: Still facing the same issue with your copy... I ended up making the script work by deleting the line with the check in embed_layer.cpp and recompiling.
Ok, something crazy just happened to me. I was having all the issues you have and struggling to find an answer. I tried everything I found online and everything you are saying here, and nothing worked. As a last hope, I downloaded the anh repo in order to build it again WHILE building caffe again with standard settings for GPU... and now it works. It even works when I recompile with exactly the same settings that were giving me issues before. So no big help here; maybe just download the repo, place it in the same folder as the other repo, and cross your fingers. Sorry for the unscientific nature of this answer, but it just happened to me now, so I'm sharing.
@kenkroft : Thanks for letting me know! At least a few people got it working (while some others didn't). I guess it has something to do with the compatibility of that old Caffe and different platforms.
@agermanidis Could you provide me with some more info? I found the embed_layer.cpp file but don't know what to do with it. And once I've changed it, do I just need to build caffe again with the make command?
@jannesss delete the line CHECK(!propagate_down[0]) << "Can't backpropagate to EmbedLayer input." in embed_layer.cpp. It worked for me.
@anguyen8 I'm trying to run the fifth example, but I encounter the following error:
Which is followed by:
And then:
I cloned caffe_lrcn from the source you linked, and I updated the settings.py file accordingly:
caffe_root = "~/caffe_lrcn/python"
where I have my tree structured as the following:
I don't have any problems running the other four examples. Also, I'm getting the "Failed to initialize libdc1394" error because I'm running this in a Docker container. Others have reported a similar problem, though it usually doesn't appear to cause any problems.
Any thoughts? Thanks!