gicmo closed this issue 11 years ago.
Could you post some system information, possibly the output from clinfo?
The device is a Retina MacBook Pro (Mid 2012) running OS X 10.9 (see below). It has two GPUs: the built-in Intel HD Graphics 4000 and the dedicated NVIDIA GeForce GT 650M. There didn't seem to be a clinfo program installed, so I took the source of the Debian package and fixed the includes (and some other minor things) so that it would compile on Mac. The output is below. I am not sure what other information would be helpful; don't hesitate to ask if you can think of anything.
% sw_vers
ProductName: Mac OS X
ProductVersion: 10.9
BuildVersion: 13A598
% ./clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 (Aug 24 2013 21:03:27)
Platform Name: Apple
Platform Vendor: Apple
Platform Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
Platform Name: Apple
Number of devices: 3
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4294967295
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1
Max work items[2]: 1
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 2
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 2700Mhz
Address bits: 64
Max memory allocation: 4294967296
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 8388608
Cache size: 64
Global memory size: 17179869184
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7fff0000
Name: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
Vendor: Intel
Device OpenCL C version: OpenCL C 1.2
Driver version: 1.1
Profile: FULL_PROFILE
Version: OpenCL 1.2
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 16918272
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 900Mhz
Address bits: 32
Max memory allocation: 268435456
Image support: Yes
Max number of images read arguments: 256
Max number of images write arguments: 16
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 32
Max size of kernel argument: 4352
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Local
Local memory size: 49152
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7fff0000
Name: GeForce GT 650M
Vendor: NVIDIA
Device OpenCL C version: OpenCL C 1.2
Driver version: 8.18.22 310.40.05f01
Profile: FULL_PROFILE
Version: OpenCL 1.2
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 16925696
Max compute units: 16
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 512
Max work group size: 512
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 0
Max clock frequency: 1200Mhz
Address bits: 64
Max memory allocation: 268435456
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Local
Local memory size: 65536
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 80
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7fff0000
Name: HD Graphics 4000
Vendor: Intel
Device OpenCL C version: OpenCL C 1.2
Driver version: 1.2(Sep 19 2013 22:31:23)
Profile: FULL_PROFILE
Version: OpenCL 1.2
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images
OpenCL:
Version: 2.3.57
Obtained from: Apple
Last Modified: 22/10/2013 23:40
Kind: Intel
64-Bit (Intel): Yes
Signed by: Software Signing, Apple Code Signing Certification Authority, Apple Root CA
Get Info String: 2.3.57, Copyright 2008-2013 Apple Inc.
Location: /System/Library/Frameworks/OpenCL.framework
Private: No
@abergeron, you told me you had problems with this on Mac, and you gave me more details. I think they would help them. Can you write down what you did? From memory, you said that clMath didn't check the size of something reported by the OpenCL driver.
Are you referring to the maximum local group size problem? Because, if yes, it has absolutely nothing to do with this problem.
What would be needed for proper Mac support is to get the test suite to build and run without errors. I haven't managed the building part yet, since ACML is not available on Mac and building netlib correctly is a huge pain because of the Fortran code. If there were a way to link with a C BLAS implementation (like ATLAS, GotoBLAS or the Accelerate framework), it would be much easier to try the test suite.
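The test suite needs a host BLAS presumably because it checks clBLAS results against a reference implementation. As an illustration only (not the clBLAS test harness, and no substitute for ACML, netlib or Accelerate), the sketch below shows what such a check involves for SGEMM: a naive column-major reference `C = alpha*A*B + beta*C` without transposes, plus a relative-tolerance comparison. The names `refSgemm` and `nearlyEqual` are hypothetical.

```c
#include <stddef.h>

/* Naive column-major reference SGEMM (no transposes):
   C = alpha * A * B + beta * C
   A is M x K (lda >= M), B is K x N (ldb >= K), C is M x N (ldc >= M). */
static void refSgemm(size_t M, size_t N, size_t K, float alpha,
                     const float *A, size_t lda,
                     const float *B, size_t ldb,
                     float beta, float *C, size_t ldc)
{
    for (size_t j = 0; j < N; ++j) {
        for (size_t i = 0; i < M; ++i) {
            float acc = 0.0f;
            for (size_t p = 0; p < K; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            /* As in BLAS, C is not read when beta == 0. */
            C[i + j * ldc] = (beta == 0.0f)
                ? alpha * acc
                : alpha * acc + beta * C[i + j * ldc];
        }
    }
}

/* Returns 1 if every element of got is within a relative tolerance tol of
   want (absolute tolerance for values smaller than 1), 0 otherwise.
   NaNs always compare as unequal. */
static int nearlyEqual(const float *got, const float *want, size_t n, float tol)
{
    for (size_t i = 0; i < n; ++i) {
        float diff = got[i] - want[i];
        float mag = want[i];
        if (diff < 0.0f) diff = -diff;
        if (mag < 0.0f) mag = -mag;
        if (mag < 1.0f) mag = 1.0f;
        if (!(diff <= tol * mag))
            return 0;
    }
    return 1;
}
```

In a real test, `C` would come from the clBLAS call under test and `want` from the reference routine; a tuned C BLAS such as Accelerate's `cblas_sgemm` would replace the naive loops.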
So I was completely wrong! Sorry for the trouble.
I did some more investigation and followed my gut feeling that this crash occurs when we no longer have the kernel source but still call clGetProgramInfo(CL_PROGRAM_SOURCE). I checked how this could happen and found the call to dropProgramSource (see below). So I quickly added a flag (noSource) to the Kernel structure that indicates whether we still have the kernel source, and made the clGetProgramInfo(CL_PROGRAM_SOURCE) call in fullKernelSize conditional on that flag.
This fixes the crash for me.
[==========] Running 9808 tests from 124 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from TRSM_extratest
[ RUN ] TRSM_extratest.strsm
Process 5668 stopped
* thread #1: tid = 0x5658, 0x0000000100aa388f libclBLAS.2.dylib`makeKernel(device=0x0000000001022700, context=0x0000000101b1e790, kernelGenerator=0x0000000100acd310, dims=0x0000000102e072e0, pgran=0x0000000102e07358, extra=0x00007fff5fbfde58, buildOpts=0x00007fff5fbfdc60, error=0x00007fff5fbfdfa4) + 559 at common.c:494, queue = 'com.apple.main-thread, stop reason = breakpoint 2.1
frame #0: 0x0000000100aa388f libclBLAS.2.dylib`makeKernel(device=0x0000000001022700, context=0x0000000101b1e790, kernelGenerator=0x0000000100acd310, dims=0x0000000102e072e0, pgran=0x0000000102e07358, extra=0x00007fff5fbfde58, buildOpts=0x00007fff5fbfdc60, error=0x00007fff5fbfdfa4) + 559 at common.c:494
491
492 #if !defined(KEEP_CLBLAS_KERNEL_SOURCES)
493 if (err == CL_SUCCESS) {
-> 494 err = dropProgramSource(&kernel->program, context, device);
495 kernel->noSource = 1;
496 }
497 #endif /* !DUMP_CLBLAS_KERNELS */
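The fix described above can be sketched as a simplified, self-contained model. This is not the code from the pull request: the `Kernel` fields, `programSourceSize()`, `dropSource()` and the size accounting in `fullKernelSize()` are hypothetical stand-ins, and the driver query (clGetProgramInfo with CL_PROGRAM_SOURCE) is replaced by a plain `strlen()` so the sketch compiles without OpenCL headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the clBLAS Kernel structure.
   The real structure holds a cl_program; here the program source is modeled
   as a plain string that becomes NULL once it has been dropped. */
typedef struct Kernel {
    const char *source;   /* NULL after the source was dropped */
    size_t binarySize;    /* size of the compiled program binary */
    int noSource;         /* 1 = source was dropped, do not query it */
} Kernel;

/* Hypothetical stand-in for asking the driver for the program source size.
   As reported in this issue, the OS X driver ends up calling strlen() on
   NULL when no source is attached, so this must not be reached then. */
static size_t programSourceSize(const Kernel *kernel)
{
    return strlen(kernel->source) + 1; /* include the terminating NUL */
}

/* Model of dropping the source after a successful build
   (cf. the dropProgramSource() call shown above). */
static void dropSource(Kernel *kernel)
{
    kernel->source = NULL;
    kernel->noSource = 1;
}

/* Assumed size accounting, with the source query conditioned on the flag. */
static size_t fullKernelSize(const Kernel *kernel)
{
    size_t size = sizeof(Kernel) + kernel->binarySize;

    if (!kernel->noSource)
        size += programSourceSize(kernel);
    return size;
}
```

Without the `noSource` check, calling `fullKernelSize()` after `dropSource()` would pass a NULL pointer to `strlen()`, which mirrors the crash reported in this issue.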
Pull request #24 closes this issue.
Using the vanilla example program (sgemm) as a test case, I get a crash on OS X (10.9) inside fullKernelSize(). Maybe this is a bug in the OpenCL implementation on OS X (similar to [1]), because it chokes on a strlen called with NULL inside clGetProgramInfo(). Commenting out that line makes the test program work. The stack trace is attached.
[1] http://www.mail-archive.com/pocl-devel@lists.sourceforge.net/msg00414.html