dividiti / ck-caffe

Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs):
http://cKnowledge.org
BSD 3-Clause "New" or "Revised" License

classification segmentation fault when calling Caffe::SetDevices #99

Closed by kindloaf 7 years ago

kindloaf commented 7 years ago

I'm testing Caffe with OpenCL on an Android device. When I run the classification program, it crashes with a segmentation fault. Here is how I compiled and ran the program:

ck compile program:caffe-classification-opencl --target_os=android21-arm64
ck run program:caffe-classification-opencl --target_os=android21-arm64

Here is the information of the segmentation fault:

Stack frame #03 pc 00000000007ecb50  /data/local/tmp/libOpenCL.so (clCreateProgramWithBinary+268)
Stack frame #04 pc 00000000004a4c30  /data/local/tmp/tmp/libcaffe.so (viennacl::ocl::context::add_program(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+5040)
Stack frame #05 pc 000000000049dd2c  /data/local/tmp/tmp/libcaffe.so (caffe::RegisterKernels(viennacl::ocl::context*)+2032)
Stack frame #06 pc 000000000049c130  /data/local/tmp/tmp/libcaffe.so (caffe::device::SetProgram()+28)
Stack frame #07 pc 000000000049c290  /data/local/tmp/tmp/libcaffe.so (caffe::device::Init()+256)
Stack frame #08 pc 000000000048d240  /data/local/tmp/tmp/libcaffe.so (caffe::Caffe::SetDevices(std::vector<int, std::allocator<int> >)+3264)
Stack frame #09 pc 000000000001c81c  /data/local/tmp/tmp/classification (Classifier::Classifier(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+404)
Stack frame #10 pc 000000000001ec4c  /data/local/tmp/tmp/classification (main+728)

Any advice?

DVEfremov commented 7 years ago

Have you tried running it under valgrind?

gfursin commented 7 years ago

By the way, what is the device?

gfursin commented 7 years ago

And Android NDK version?

DVEfremov commented 7 years ago

I use valgrind compiled for my device (ARM-v7a) as described here: http://valgrind.org/docs/manual/dist.readme-android.html

gfursin commented 7 years ago

I will try to rebuild and run a clean ck-caffe version soon on my Samsung S7 ...

kindloaf commented 7 years ago

@DVEfremov I have not tried valgrind. I will update this thread after I use valgrind.

kindloaf commented 7 years ago

@gfursin The NDK version is r14b. By the way, libOpenCL.so is provided by the device. Calling clGetPlatformInfo with CL_PLATFORM_VERSION suggests that the .so file was built for OpenCL 1.2. From the ck-caffe log, I assume the version is consistent:

Found OpenCL include: /home/.../CK-TOOLS/lib-opencl-stubs-1.2-android-ndk-4.9.x-android21-arm64/include
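
For anyone repeating this check, here is a minimal sketch of pulling the version number out of a CL_PLATFORM_VERSION string so it can be compared with the stub version in the include path. The OpenCL specification defines the string as "OpenCL <major.minor> <platform-specific information>"; the function name opencl_version is hypothetical and not part of CK or Caffe.

```shell
# Hypothetical helper: print the "major.minor" part of a CL_PLATFORM_VERSION string.
# Expected input format (per the OpenCL specification):
#   "OpenCL <major.minor> <platform-specific information>"
opencl_version() {
    # Print the second whitespace-separated field if the first one is "OpenCL";
    # exit with a non-zero status when the string does not match the format.
    printf '%s\n' "$1" | awk '$1 == "OpenCL" { print $2; found = 1 } END { exit !found }'
}
```

The string to pass in is whatever clGetPlatformInfo returned on the device; a result of 1.2 would match the lib-opencl-stubs-1.2 directory in the log line above.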

kindloaf commented 7 years ago

@gfursin By the way, regarding the thread at https://github.com/sh1r0/caffe-android-lib/issues/23: in the command lines for compiling and running the program, did you mean program:caffe-classification-opencl instead of program:caffe-classification?

kindloaf commented 7 years ago

I just found the issue: a file /data/local/tmp/viennacl_cache_0f45121d68e15d6052d1a913db3647b1fc0fc609 was generated after running classification. When I removed the file, the segmentation fault went away. If the file is present, the program crashes.

I'm not sure what the real culprit is, but this solves my problem for now.
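
The workaround can be sketched as a small POSIX shell function that deletes any ViennaCL cache files from a directory. The default directory and the viennacl_cache_ prefix are taken from the file name above; the function name clear_viennacl_cache is hypothetical, and this is only a sketch of the manual step, not something CK provides.

```shell
# Hypothetical helper: delete ViennaCL kernel cache files from a directory.
# Default directory and file prefix come from the path reported in this thread.
clear_viennacl_cache() {
    cache_dir="${1:-/data/local/tmp}"
    for f in "$cache_dir"/viennacl_cache_*; do
        # When nothing matches, the glob stays literal, so test before deleting.
        if [ -f "$f" ]; then
            rm -f "$f"
        fi
    done
    return 0
}
```

The function can be pasted into an adb shell session and called without arguments. From the host, a one-off equivalent would be adb shell 'rm -f /data/local/tmp/viennacl_cache_*', with the quotes so that the glob is expanded on the device.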

Thanks for quick reply - I will close the issue.

gfursin commented 7 years ago

This file is related to OpenCL kernel caching via ViennaCL. After deleting this file and running classification again, do you see a newly generated viennacl_cache_{some hash} file? If you run classification several times now, does it still work? I am asking because it looks like the kernels have changed but were not recompiled on your system. I have CC'd @psyhtest since he was trying to improve the ViennaCL caching mechanism ...

gfursin commented 7 years ago

Also, we may need to add an option to CK to clear such cache files (or maybe remove them automatically during ck-caffe reinstallation for Android) - I need to think about that ...

kindloaf commented 7 years ago

@gfursin and @psyhtest What I did was the following: (1) I used ck run program:caffe-classification-opencl --target_os=android21-arm64 to run the test program. The very first run after compilation was successful. (2) When I invoked ck run again, or used adb shell ... to launch the test program, it crashed with a segmentation fault. (3) Then I removed the cache file and used adb shell ... to launch the test program; it succeeded, and a new cache file was generated. (4) I repeated (3) a few times. My observation is that if I don't remove the cache file before launching the test program, it always crashes with a segmentation fault.
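
To make the repeated-run check in steps (2) to (4) less manual, here is a minimal sketch that runs a command several times and prints how many runs failed. The function name count_failures is hypothetical. It assumes the command's exit status reflects the crash (a segmentation fault usually shows up as status 139, i.e. 128 + SIGSEGV); whether ck run or a given adb version forwards that status from the device is an assumption to verify first.

```shell
# Hypothetical helper: run a command N times and print the number of failed runs.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    runs="$1"
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Any non-zero exit status (including 139 for a segfault) counts as a failure.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, count_failures 5 ck run program:caffe-classification-opencl --target_os=android21-arm64 would print the number of failed runs out of five, under the exit-status assumption above. Comparing that number with and without the cache file present reproduces the observation in (4).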

gfursin commented 7 years ago

Thanks a lot for reporting - that's quite strange, though. I will test it tonight on my S7. By the way, if you happen to have some time, you could try this app: https://play.google.com/store/apps/details?id=openscience.crowdsource.video.experiments There, you can select Caffe OpenCL and run it several times, for example using GoogleNet - I am curious whether you will see the same issue. (Note that this app sends anonymous information about your platform and classification time to cknowledge.org/repo.) The scenarios for this app were prepared in exactly the same way as you did above, but on my desktop machine. However, they were prepared a few months ago, so if they work, it's likely that some recent change in Caffe/ViennaCL/CLBlast is causing the problem ... Thanks again!

kindloaf commented 7 years ago

I will give it a try.