dividiti / ck-caffe

Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs):
http://cKnowledge.org
BSD 3-Clause "New" or "Revised" License

CUDA/cuDNN FP16 packages fail with linking errors on Tegra TX1 #57

Closed: psyhtest closed this issue 7 years ago

psyhtest commented 7 years ago

Installing CUDA/cuDNN FP16 packages on Tegra TX1 fails with linking errors against ProtoBuf and GFlags.

$ ck list package:*fp16*
lib-caffe-nvidia-fp16-cuda
lib-caffe-nvidia-fp16-cudnn
gfursin commented 7 years ago

I believe I fixed that ... Can you please try again with a clean environment ...

psyhtest commented 7 years ago

Now package:lib-caffe-nvidia-fp16-cuda works but package:lib-caffe-nvidia-fp16-cudnn still fails.

collect2: error: ld returned 1 exit status
Makefile:646: recipe for target '/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src/.build_release/tools/compute_image_mean.bin' failed
make: *** [/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src/.build_release/tools/compute_image_mean.bin] Error 1
Error: Building Caffe in '/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src' failed!
CK error: [package] package installation failed!

I also tried removing protobuf-host v3.1.0 (which worked for the CUDA package) and selecting it again during installation:

ck show env --tags=protobuf-host
Env UID:         Target OS: Bits: Name:                 Version: Tags:

780d134f56b5dcc5   linux-64    64 ProtoBuf host library 3.1.0    64bits,host-os-linux-64,lib,protobuf-host,target-os-linux-64,v3,v3.1,v3.1.0

The same happened with protobuf-host v3.0.0 ...

gfursin commented 7 years ago

It's not related to protobuf but to gflags - looks like this branch is a bit outdated and has some conflicts with new glog/gflags ... I am trying to check if it will work with native one ...

gfursin commented 7 years ago

I think I fixed that - I just removed the key that forced the use of the gflags package, so the gflags already installed on the system can be detected and used with this package. If you have already installed gflags, you can simply re-detect the native version for this package (other packages seem to work fine, so I think the problem is with this branch, which doesn't support the newest versions of gflags), i.e.:

$ ck detect soft:lib.gflags

and then use the one that is native on Tegra ... Would you mind checking that, please? Thanks!

gfursin commented 7 years ago

Hi Anton, just curious if it works now?

psyhtest commented 7 years ago

Sadly, no. CK detected gflags via ck detect soft:lib.gflags, which I then specified during the installation of package:lib-caffe-nvidia-fp16-cudnn. The problem again seems to be with linking protobuf:

o: undefined reference to `google::protobuf::internal::RepeatedPtrFieldBase::InternalExtend(int)'
/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src/.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::ArenaStringPtr::AssignWithDefault(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, google::protobuf::internal::ArenaStringPtr)'
/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src/.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::io::CodedInputStream::DecrementRecursionDepthAndPopLimit(int)'
/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src/.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Arena::AddListNode(void*, void (*)(void*))'
/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src/.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::io::CodedInputStream::ReadLengthAndPushLimit()'
/home/anton/CK_TOOLS/lib-caffe-nvidia-cudnn-fp16-gcc-5.4.0-lib.cudnn-api-5.1.5-lib.openblas-0.2.19-85636ff-linux-64/src/.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::io::CodedInputStream::ReadTagFallback(unsigned int)'
collect2: error: ld returned 1 exit status
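
Since it was unclear at this point whether gflags or protobuf was to blame, a quick way to see which library the missing symbols belong to is to group the "undefined reference" lines by their leading C++ namespace. This is only a sketch: the function name `summarize_undefined_refs` is made up for illustration, and it assumes the build output has been saved to a file without terminal line wrapping.

```shell
#!/bin/sh
# Hypothetical helper (not part of CK or Caffe): reads linker output on stdin
# and prints a count of undefined symbols per leading "outer::inner" namespace.
summarize_undefined_refs() {
    # Keep only the quoted symbol after "undefined reference to".
    grep -o "undefined reference to \`[^']*'" \
        | sed -e "s/^undefined reference to \`//" -e "s/'\$//" \
        | sed -E 's/^([A-Za-z_0-9]+::[A-Za-z_0-9]+).*/\1/' \
        | sort | uniq -c | sort -rn
}
```

For example, with the build output captured in a file (here called `build.log`, an assumed name), `summarize_undefined_refs < build.log` would list `google::protobuf` with the number of missing symbols, which points at the protobuf library rather than gflags.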
gfursin commented 7 years ago

But did you make a clean installation? Just try to remove at least all protobuf environments:

$ ck rm env:* --tags=protobuf
$ ck rm env:* --tags=protobuf-host

And then try again - I added support for picking up the native version of protobuf rather than installing it - I think this is the problem ...

psyhtest commented 7 years ago

But you said it was a problem with gflags? So I only removed that... Let me try again.

gfursin commented 7 years ago

It looks like a different error - I did a clean install (removed all env entries and CK_TOOLS) and it seemed to work ... BTW, this problem is not related to CK but to the mess with gflags/protobuf in Caffe - we should normally pin the versions of all software dependencies to avoid such problems ...
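
One possible way to spot this kind of version mess is to compare the protobuf compiler version with the version of the library the linker would find. The sketch below is an assumption-laden illustration, not something CK does: the function names are hypothetical, and it assumes that the `protoc` and `pkg-config` on your PATH are the ones the Caffe build actually picks up, which may not be the case when CK sets up its own environment.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if two version strings share major.minor.
same_major_minor() {
    a=$(printf '%s' "$1" | cut -d. -f1,2)
    b=$(printf '%s' "$2" | cut -d. -f1,2)
    [ "$a" = "$b" ]
}

# Hypothetical check: compares the protoc version with the protobuf library
# version reported by pkg-config (both must be installed and on PATH).
check_protobuf_versions() {
    compiler_ver=$(protoc --version | awk '{print $2}')
    library_ver=$(pkg-config --modversion protobuf)
    if same_major_minor "$compiler_ver" "$library_ver"; then
        echo "protoc $compiler_ver matches libprotobuf $library_ver"
    else
        echo "MISMATCH: protoc $compiler_ver vs libprotobuf $library_ver"
        return 1
    fi
}
```

Calling `check_protobuf_versions` before a build gives an early hint when code generated by one protobuf release is about to be linked against another.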

psyhtest commented 7 years ago

Success at last!

First, I had to clear all the environments as you suggested:

$ ck rm env:* --tags=gflags
$ ck rm env:* --tags=protobuf
$ ck rm env:* --tags=protobuf-host

then, detect the libs installed on the platform:

$ ck detect soft:lib.gflags
$ ck detect soft:lib.protobuf.host

(lib.protobuf.host was critical here!) and, finally, install:

$ ck install package:lib-caffe-nvidia-fp16-cudnn
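
For convenience, the three steps above could be collected into one small script. This is a minimal sketch using only the `ck` commands shown in this thread; the `run` wrapper, the function name and the `CK_DRY_RUN` variable are hypothetical additions for illustration, and `ck` may still ask for confirmation or for a choice between detected versions when run for real.

```shell
#!/bin/sh
# CK_DRY_RUN is a hypothetical variable introduced for this sketch:
# when set to 1, commands are printed instead of executed.
run() {
    if [ "${CK_DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_fp16_cudnn() {
    # 1. Remove stale environment entries. The exit status is not checked
    #    here, since there may be nothing to remove.
    for tag in gflags protobuf protobuf-host; do
        run ck rm 'env:*' --tags="$tag"
    done
    # 2. Re-detect the libraries already installed on the platform.
    run ck detect soft:lib.gflags || return 1
    run ck detect soft:lib.protobuf.host || return 1
    # 3. Install the cuDNN FP16 package.
    run ck install package:lib-caffe-nvidia-fp16-cudnn
}
```

Running `CK_DRY_RUN=1; reinstall_fp16_cudnn` prints the six commands in order without touching the CK environment, which is a cheap way to review them before running the function for real.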