dividiti / ck-caffe

Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs):
http://cKnowledge.org
BSD 3-Clause "New" or "Revised" License

"ck compile program:caffe-time-cuda" fails #98

Open psyhtest opened 7 years ago

psyhtest commented 7 years ago
nvcc -ccbin /usr/bin/g++-5 -c    -I../ -DANDROID_USE_OPENMP=ON -DBLAS=Open -DCK_HOST_OS_NAME2_LINUX=1 -DCK_HOST_OS_NAME_LINUX=1 -DCK_TARGET_OS_NAME2_LINUX=1 -DCK_TARGET_OS_NAME_LINUX=1 -DUSE_LMDB=OFF -DUSE_OPENCV=ON    -I/home/anton/CK_TOOLS/lib-caffe-bvlc-master-cudnn-trunk-gcc-5.4.0-linux-64/install/include -I/home/anton/CK_TOOLS/lib-gflags-2.2.0-gcc-5.4.0-linux-64/install/include -I/home/anton/CK_TOOLS/lib-glog-development-gcc-5.4.0-linux-64/install/include -I/home/anton/CK_TOOLS/lib-openblas-0.2.19-gcc-5.4.0-linux-64/install/include -I/home/anton/CK_TOOLS/lib-protobuf-host-3.1.0-linux-64/install/include -I/usr/include -I/usr/include -I/usr/include/hdf5/serial -I/usr/include -I/home/anton/CK_TOOLS/lib-caffe-bvlc-master-cudnn-trunk-gcc-5.4.0-linux-64/install/.build_release/src  ../caffe.cpp  -o caffe.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
../caffe.cpp: In function ‘int train(std::__cxx11::string, std::__cxx11::string, std::__cxx11::string, std::__cxx11::string, int, std::__cxx11::string, std::__cxx11::string, std::__cxx11::string)’:
../caffe.cpp:171:5: error: ‘P2PSync’ is not a member of ‘caffe’
     caffe::P2PSync<float> sync(solver, NULL, solver->param());
     ^
../caffe.cpp:171:20: error: expected primary-expression before ‘float’
     caffe::P2PSync<float> sync(solver, NULL, solver->param());
                    ^
../caffe.cpp:172:10: error: request for member ‘Run’ in ‘sync’, which is of non-class type ‘void() throw ()’
     sync.Run(gpus);
          ^

This is with:

$ ck show env --tags=lib,caffe,vcuda
Env UID:         Target OS: Bits: Name:                        Version:      Tags:

17a825aef037ab15   linux-64    64 BVLC Caffe framework (cudnn) trunk-8007349 64bits,bvlc,caffe,host-os-linux-64,lib,target-os-linux-64,v0,v0.8007349,vcuda,vcudnn,vmaster
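To check which Caffe variant an environment entry is without reading the table by eye, a data row can be split into its columns. A minimal Python sketch, assuming the column layout shown above (UID, target OS, bits, a name that may contain spaces, version, comma-separated tags); the helper `parse_env_row` is hypothetical and not part of CK:

```python
def parse_env_row(row):
    """Split one data row of `ck show env` output into its columns.

    Assumption: the layout matches the table above, so the name is
    everything between the bits column and the last two fields.
    """
    fields = row.split()
    if len(fields) < 6:
        raise ValueError("unexpected row: %r" % row)
    uid, target_os, bits = fields[0], fields[1], int(fields[2])
    version, tags = fields[-2], fields[-1].split(",")
    name = " ".join(fields[3:-2])  # the name column may contain spaces
    return {"uid": uid, "target_os": target_os, "bits": bits,
            "name": name, "version": version, "tags": tags}


row = ("17a825aef037ab15   linux-64    64 BVLC Caffe framework (cudnn) "
       "trunk-8007349 64bits,bvlc,caffe,host-os-linux-64,lib,"
       "target-os-linux-64,v0,v0.8007349,vcuda,vcudnn,vmaster")
env = parse_env_row(row)
print(env["name"], env["version"])
print("vcudnn" in env["tags"])
```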
psyhtest commented 7 years ago

caffe::P2PSync has been removed...

psyhtest commented 7 years ago

This is not terribly important as we can always use the time_gpu command of program:caffe...

DVEfremov commented 7 years ago

Yes, it looks like caffe::P2PSync has been removed because it was not stable. At least, it was not stable for the OpenCL version, as mentioned in some BVLC/caffe issues.

gfursin commented 7 years ago

Actually, it is important since I use it for crowd-benchmarking, but it seems like the latest versions work fine?

uriv commented 7 years ago

Hi, trying to benchmark a GPU workstation with package:lib-caffe-bvlc-master-cuda-universal using explore-batch-size-libs-models-benchmarking.py. As is, it proposes only CPU packages. I changed program='caffe-time' to program='caffe-time-cuda', but now it fails with the error above.

Is there a workaround to get the benchmark running?

Edit: it might be a problem with lib-caffe: the tags it reads are u'lib,caffe,vcpu'

Thx

psyhtest commented 7 years ago

Hi @uriv, I've committed a new script just for benchmarking Caffe with CUDA and cuDNN. The key difference is this:

<     program='caffe-time'
<     cmd_key='default'
---
>     program='caffe'
>     cmd_key='time_gpu'

Please give it a try!
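The change in the diff above amounts to swapping the program and its command key depending on the target. A minimal sketch under stated assumptions: `benchmark_request` is a hypothetical helper, and the dictionary keys follow CK's `ck.access()` conventions (roughly `ck run program:caffe --cmd_key=time_gpu`), which is an assumption rather than the script's exact code.

```python
def benchmark_request(use_gpu):
    """Build a CK request for the Caffe timing benchmark (hypothetical helper).

    CPU: program:caffe-time with its 'default' command.
    GPU: program:caffe with its 'time_gpu' command, as in the new CUDA script.
    The key names are assumed to mirror ck.access() conventions.
    """
    if use_gpu:
        program, cmd_key = "caffe", "time_gpu"
    else:
        program, cmd_key = "caffe-time", "default"
    return {
        "action": "run",
        "module_uoa": "program",
        "data_uoa": program,
        "cmd_key": cmd_key,
    }


print(benchmark_request(use_gpu=True))
```

With CK installed, such a dictionary would be passed to `ck.access()` from `ck.kernel`, checking `r['return'] > 0` for errors. Keeping the choice in one place avoids the mismatch reported above, where the CUDA program was paired with the CPU script's command.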

psyhtest commented 7 years ago

I've checked tags for package:lib-caffe-bvlc-master-cuda-universal - they seem to be fine. Did you mean a different package? (I do recall fixing the vcpu tag in a non-CPU package lately in... CK-TensorFlow:-))

uriv commented 7 years ago

Thanks, @psyhtest. I'll give the patch a try when I get the chance. I've installed package:lib-caffe-bvlc-master-cuda-universal. If there is a command that prints more info, I can reply with the output.

psyhtest commented 7 years ago

@uriv, you don't need to patch anything. Just use the new explore-batch-size-libs-models-benchmarking-cuda.py script.

Please post the output of:

$ ck show env --tags=lib,caffe
uriv commented 7 years ago

Hi @psyhtest,

The benchmark worked with program='caffe', cmd_key='time_gpu', after fixing a non-ASCII character '\xc2' error by adding # -*- coding: utf-8 -*- at the top of ck-analytics/module/math.variation/module.py (ref).
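Python 2 raises this error when a source file contains non-ASCII bytes but has no PEP 263 encoding declaration, which must sit on the first or second line of the file. A minimal sketch for locating the offending bytes and checking for a declaration before patching a module; the function names are hypothetical and nothing here depends on CK:

```python
import re

# PEP 263 pattern; the declaration only counts on line 1 or 2.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def non_ascii_positions(source):
    """Yield (line, column, byte) for every non-ASCII byte in `source` (bytes)."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, byte in enumerate(bytearray(line), start=1):
            if byte > 0x7F:
                yield lineno, col, hex(byte)


def needs_coding_declaration(source):
    """True if `source` has non-ASCII bytes but no declaration on lines 1-2."""
    has_non_ascii = any(b > 0x7F for b in bytearray(source))
    declared = any(CODING_RE.match(line) for line in source.splitlines()[:2])
    return has_non_ascii and not declared


src = b"import math\nx = 1  # \xc2\xb1 tolerance\n"
print(list(non_ascii_positions(src)))
print(needs_coding_declaration(src))
```

Either adding the declaration or replacing the character with plain ASCII resolves the error.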

Here is the output of ck show env --tags=lib,caffe:

Env UID:         Target OS: Bits: Name:                       Version:        Tags:

21aeea91f24ce4a9   linux-64    64 BVLC Caffe framework (cuda) master-4efdf7ee 64bits,bvlc,caffe,host-os-linux-64,lib,target-os-linux-64,v0,v0.0,vcuda,vmaster,vno-cudnn

The vno-cudnn tag probably implies that it runs without cuDNN, although cuDNN was detected during the Caffe installation. Any suggestions on how to troubleshoot this?

Thanks

psyhtest commented 7 years ago

@uriv The vno-cudnn tag means that package:lib-caffe-bvlc-master-cuda-universal doesn't use cuDNN. In contrast, package:lib-caffe-bvlc-master-cudnn-universal is tagged with 'vcudnn'. Basically, this is to distinguish the two packages when needed, as both are tagged with vcuda.

We used to tag the former with vcuda only and the latter with vcudnn only. But as this particular program:caffe-time-cuda can work with both, we added the vcuda tag also to the latter. So then we needed to add the vno-cudnn and vcudnn tags just in case. Hope this answers your question.
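The tagging scheme described above can be sketched as a small classifier. Assumptions: a CK tag query such as `--tags=lib,caffe,vcuda` is treated as conjunctive (every listed tag must be present), and `matches` and `caffe_gpu_backend` are hypothetical helpers, not CK functions. The check for `vcudnn` comes first because older cuDNN entries carried `vcudnn` without `vcuda`.

```python
def matches(required, tags):
    """Assumed CK-style query: every required tag must be present."""
    return set(required) <= set(tags)


def caffe_gpu_backend(tags):
    """Classify a Caffe env entry by its tags (hypothetical helper)."""
    tags = set(tags)
    if "vcudnn" in tags:
        return "cuda+cudnn"        # cudnn package, old or new tagging
    if "vno-cudnn" in tags:
        return "cuda"              # cuda package, explicitly without cuDNN
    if "vcuda" in tags:
        return "cuda (cuDNN unknown)"
    return "other"


cuda_pkg = "64bits,bvlc,caffe,lib,v0,vcuda,vmaster,vno-cudnn".split(",")
cudnn_pkg = "64bits,bvlc,caffe,lib,v0,vcuda,vcudnn,vmaster".split(",")
query = ["lib", "caffe", "vcuda"]
print(matches(query, cuda_pkg), matches(query, cudnn_pkg))
print(caffe_gpu_backend(cuda_pkg), caffe_gpu_backend(cudnn_pkg))
```

Both packages satisfy the `vcuda` query, so program:caffe-time-cuda can use either; only the `vcudnn`/`vno-cudnn` tags tell them apart.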

psyhtest commented 7 years ago

@gfursin Could you please check @uriv's comment about fixing a non-ASCII character?

uriv commented 7 years ago

> @uriv The vno-cudnn tag means that package:lib-caffe-bvlc-master-cuda-universal doesn't use cuDNN. In contrast, package:lib-caffe-bvlc-master-cudnn-universal is tagged with 'vcudnn'. Basically, this is to distinguish the two packages when needed, as both are tagged with vcuda.

@psyhtest Thanks for the clarification on this.

gfursin commented 7 years ago

Yes, I fixed the non-ASCII character: I removed it entirely and replaced it with ASCII. Thanks for noticing!