Closed hansbogert closed 10 years ago
The example works on my machine with both Intel and NVIDIA SDKs:
./sort
input: [ 83, 86, 77, 15, 93, 35, 86, 92, 49, 21, 62, 27, 90, 59, 63, 26, 40, 26, 72, 36, 11, 68, 67, 29, 82, 30, 62, 23, 67, 35, 29, 2, 22 ]
output: [ 2, 11, 15, 21, 22, 23, 26, 26, 27, 29, 29, 30, 35, 35, 36, 40, 49, 59, 62, 62, 63, 67, 67, 68, 72, 77, 82, 83, 86, 86, 90, 92, 93 ]
compute::sort also works for me with much larger inputs (a couple of million). It seems that your version of OpenCL has problems with kernel compilation. Can you successfully run other examples from Boost.Compute, or even something as simple as https://gist.github.com/ddemidov/2925717?
Yes, I can run other examples; e.g. changing the input to 32 items does not fail, so the problem is somewhere in radix_sort. I can't try your example because 1) it is OpenCL 1.2 (correct me if I'm wrong) and I don't have the NVIDIA OpenCL 1.2 headers, i.e. CL/cl.hpp, and 2) it does not compile on OS X.
My mistake, I needed cl.hpp.
I can run it, but both my setups do not contain double precision capable hardware:
$ ./hello
GPUs with double precision not found.
It should be enough to delete lines 10-16 and replace double with float throughout the example. Sorry for the inconvenience.
as expected, does not fail:
./hello
GeForce 9600 GT
3
The 9600 GT is compute capability 1.0, which does not support atomic operations, and the kernels generated for the example do use atomics.
And the error message for the HD 4000 says device not available. Are you able to run the example there?
Yes,
./hello.osx
HD Graphics 4000
3
I am able to reproduce this as well... The problem is with radix_sort. The CL_DEVICE_NOT_AVAILABLE error usually occurs when the program source for the kernel fails to compile.
Ya, like Denis said, radix_sort() uses atomics, which are not supported by your hardware. The reason you see a difference when running different numbers of values is that internally sort() will use an insertion sort algorithm (which doesn't require atomics) for tiny inputs (32 or fewer values) and the radix sort algorithm for anything larger.
Just to verify: my NVIDIA 9600 GT as well as the Intel HD 4000 do not have support for atomics?
Can you compile and run the following code? It should list your devices with their supported extensions. If a device supports atomics, it should have cl_khr_local_int32_base_atomics and cl_khr_local_int32_extended_atomics in the output.
#include <iostream>
#include <vector>
#include <string>
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
int main() {
    try {
        // Get list of OpenCL platforms.
        std::vector<cl::Platform> platform;
        cl::Platform::get(&platform);
        if (platform.empty()) {
            std::cerr << "OpenCL platforms not found." << std::endl;
            return 1;
        }
        // Print each device's name followed by its extension string.
        for(auto p = platform.begin(); p != platform.end(); p++) {
            std::vector<cl::Device> pldev;
            try {
                p->getDevices(CL_DEVICE_TYPE_ALL, &pldev);
                for(auto d = pldev.begin(); d != pldev.end(); d++) {
                    std::cout << d->getInfo<CL_DEVICE_NAME>() << ":\n\t"
                              << d->getInfo<CL_DEVICE_EXTENSIONS>() << std::endl;
                }
            } catch(...) {
                // Skip platforms whose devices cannot be queried.
            }
        }
    } catch (const cl::Error &err) {
        std::cerr
            << "OpenCL error: "
            << err.what() << "(" << err.err() << ")"
            << std::endl;
        return 1;
    }
}
Seems it's supported
Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz:
cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority
HD Graphics 4000:
cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images
Those should work. Can you add #define BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION to the top of your file (before any <boost/compute/*> includes)? This will then print out information indicating why the kernel failed to build.
added the line to the sort_vector example:
//---------------------------------------------------------------------------//
// Copyright (c) 2013 Kyle Lutz <kyle.r.lutz@gmail.com>
//
// Distributed under the Boost Software License, Version 1.0
// See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt
//
// See http://kylelutz.github.com/compute for more information.
//---------------------------------------------------------------------------//
#include <algorithm>
#include <iostream>
#include <vector>
#define BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION
#include <boost/compute/system.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/sort.hpp>
#include <boost/compute/container/vector.hpp>
...
Output is the same as without the #define
./sort_vector.osx
input: [ 7, 49, 73, 58, 30, 72, 44, 78, 23, 9, 40, 65, 92, 42, 87, 3, 27, 29, 40, 12, 3, 69, 9, 57, 60, 33, 99, 78, 16, 35, 97, 26, 12 ]
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::context_error> >'
what(): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[1] 19241 abort ./sort_vector.osx
That's strange. Can you add the following two lines to the top of main:
const boost::compute::device device = boost::compute::system::default_device();
std::cout << "device: " << device.name() << std::endl;
This should find the default OpenCL device (or any OpenCL device) and print its name to stdout. If that doesn't work then there is something wrong with the OpenCL implementation installed on your system. Ensure that the correct OpenCL library/framework is being linked.
printing the device works:
./sort_vector.osx
device: HD Graphics 4000
input: [ 7, 49, 73, 58, 30, 72, 44, 78, 23, 9, 40, 65, 92, 42, 87, 3, 27, 29, 40, 12, 3, 69, 9, 57, 60, 33, 99, 78, 16, 35, 97, 26, 12 ]
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::compute::context_error> >'
what(): [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[1] 42740 abort ./sort_vector.osx
This is very strange. I found this post on the Intel forums (https://software.intel.com/en-us/forums/topic/505677) but with no resolution (and the error message from the OpenCL implementation isn't very helpful).
Commented on the Intel forum to ask the topic starter whether there was any follow-up.
@hansbogert I'm no longer able to reproduce this (tried with 33 and 50 items) on either my Intel HD 4000 or my GeForce 650M. Perhaps an OS update fixed it (running Yosemite DP 8 now). Check if it is fixed for you as well.
Indeed solved. I bisected it and it came down to commit a78212fdde9254c18cabacd4388ca7106dbbcdbd:
Rename K to K_BITS in radix_sort()
This should fix the following error seen on the Apple OpenCL
implementation when compiling the radix_sort program: "error:
definition of macro 'K' conflicts with an identifier used in
the precompiled header".
So solved long ago.
Using https://github.com/kylelutz/compute/blob/master/example/sort_vector.cpp, we can make it fail by using a host_vector initialized with more than 32 items. Tested on Mac OS X with an HD 4000 and on Linux using an NVIDIA card.
On mac osx:
On linux/nvidia:
Easy copy-and-paste reproduction: https://gist.github.com/hansbogert/10975461