clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

[OSX] crash in clGetProgramInfo called by fullKernelSize #21

Closed gicmo closed 11 years ago

gicmo commented 11 years ago

Using the vanilla example program (sgemm) as a test case, I get a crash on OS X (10.9) inside fullKernelSize(). Maybe this is a bug in the OpenCL implementation on OS X (similar to [1]), because it chokes on a strlen called with NULL inside clGetProgramInfo(). Commenting out that line makes the test program work. The stack trace is attached.

[1] http://www.mail-archive.com/pocl-devel@lists.sourceforge.net/msg00414.html

* thread #1: tid = 0x5463cc, 0x00007fff908c0812 libsystem_c.dylib`strlen + 18, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fff908c0812 libsystem_c.dylib`strlen + 18
    frame #1: 0x00007fff85134c60 OpenCL`clGetProgramInfo + 625
    frame #2: 0x00000001000075cc libclBLAS.2.dylib`fullKernelSize(kern=0x000000010054ed60) + 236 at kern_cache.c:428
    frame #3: 0x00000001000072b6 libclBLAS.2.dylib`addKernelToCache(kcache=0x00000001002177f0, sid=18, kern=0x000000010054ed60, key=0x00007fff5fbff000, extraCmp=0x000000010003c5e0) + 198 at kern_cache.c:311
    frame #4: 0x0000000100038c73 libclBLAS.2.dylib`makeSolutionSeq(funcID=CLBLAS_GEMM, args=0x00007fff5fbff4d0, numCommandQueues=1, commandQueues=0x00007fff5fbff830, numEventsInWaitList=0, eventWaitList=0x0000000000000000, events=0x00007fff5fbff810, seq=0x00007fff5fbff2d8) + 2915 at solution_seq_make.c:599
    frame #5: 0x000000010001503e libclBLAS.2.dylib`doGemm(kargs=0x00007fff5fbff4d0, order=clblasRowMajor, transA=clblasNoTrans, transB=clblasNoTrans, M=3, N=2, K=4, A=0x000000010021a190, offA=6, lda=5, B=0x000000010021a410, offB=4, ldb=3, C=0x000000010021a690, offC=4, ldc=3, numCommandQueues=1, commandQueues=0x00007fff5fbff830, numEventsInWaitList=0, eventWaitList=0x0000000000000000, events=0x00007fff5fbff810) + 1118 at xgemm.c:102
    frame #6: 0x0000000100014bc8 libclBLAS.2.dylib`clblasSgemm(order=clblasRowMajor, transA=clblasNoTrans, transB=clblasNoTrans, M=3, N=2, K=4, alpha=10, A=0x000000010021a190, offA=6, lda=5, B=0x000000010021a410, offB=4, ldb=3, beta=20, C=0x000000010021a690, offC=4, ldc=3, numCommandQueues=1, commandQueues=0x00007fff5fbff830, numEventsInWaitList=0, eventWaitList=0x0000000000000000, events=0x00007fff5fbff810) + 680 at xgemm.c:145
    frame #7: 0x0000000100001aaf clblastest`main + 1295 at main.cpp:156
    frame #8: 0x00007fff908765fd libdyld.dylib`start + 1
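
The trace above shows strlen faulting at address 0x0 inside clGetProgramInfo, which is consistent with a NULL source pointer being measured. As a minimal sketch of the kind of guard that avoids this, assuming a hypothetical helper named source_size_bytes (it is not part of OpenCL or clBLAS):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of OpenCL or clBLAS: reports how many
 * bytes a program source string occupies, including the terminating NUL.
 * A missing source (NULL) is reported as size 0 instead of being passed
 * to strlen(), which would dereference the null pointer. */
size_t source_size_bytes(const char *source)
{
    if (source == NULL) {
        return 0;
    }
    return strlen(source) + 1;
}
```

With this guard, a program whose source is absent simply reports a size of 0, while a normal string reports its length plus one byte for the terminator.
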
kknox commented 11 years ago

Could you post some system information, possibly the output from clinfo?

gicmo commented 11 years ago

The device is a Retina MacBook Pro (Mid 2012) running OS X 10.9 (see below). It has two GPUs, the built-in Intel HD Graphics 4000 and the dedicated NVIDIA GeForce GT 650M. There didn't seem to be a clinfo program installed, so I took the source of the Debian package and fixed the includes (and some other minor things) so it would compile on the Mac. Output is below. I am not sure what other information would be helpful. Don't hesitate to ask if you can think of anything.

% sw_vers
ProductName:    Mac OS X
ProductVersion: 10.9
BuildVersion:   13A598
% ./clinfo
Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 1.2 (Aug 24 2013 21:03:27)
  Platform Name:                 Apple
  Platform Vendor:               Apple
  Platform Extensions:               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

  Platform Name:                 Apple
Number of devices:               3
  Device Type:                   CL_DEVICE_TYPE_CPU
  Device ID:                     4294967295
  Max compute units:                 8
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1
    Max work items[2]:               1
  Max work group size:               1024
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          4
  Preferred vector width double:         2
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             4
  Native vector width double:            2
  Max clock frequency:               2700Mhz
  Address bits:                  64
  Max memory allocation:             4294967296
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                8192
  Max image 2D height:               8192
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4096
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               8388608
  Cache size:                    64
  Global memory size:                17179869184
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Global
  Local memory size:                 32768
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:
    Execute OpenCL kernels:          Yes
    Execute native function:             Yes
  Queue properties:
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x7fff0000
  Name:                      Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
  Vendor:                    Intel
  Device OpenCL C version:           OpenCL C 1.2
  Driver version:                1.1
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2
  Extensions:                    cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority

  Device Type:                   CL_DEVICE_TYPE_GPU
  Device ID:                     16918272
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               64
  Max work group size:               1024
  Preferred vector width char:           1
  Preferred vector width short:          1
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          1
  Native vector width short:             1
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               900Mhz
  Address bits:                  32
  Max memory allocation:             268435456
  Image support:                 Yes
  Max number of images read arguments:       256
  Max number of images write arguments:      16
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            32
  Max size of kernel argument:           4352
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         No
  Cache type:                    None
  Cache line size:               0
  Cache size:                    0
  Global memory size:                1073741824
  Constant buffer size:              65536
  Max number of constant args:           9
  Local memory type:                 Local
  Local memory size:                 49152
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1000
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue properties:
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x7fff0000
  Name:                      GeForce GT 650M
  Vendor:                    NVIDIA
  Device OpenCL C version:           OpenCL C 1.2
  Driver version:                8.18.22 310.40.05f01
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2
  Extensions:                    cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer

  Device Type:                   CL_DEVICE_TYPE_GPU
  Device ID:                     16925696
  Max compute units:                 16
  Max work items dimensions:             3
    Max work items[0]:               512
    Max work items[1]:               512
    Max work items[2]:               512
  Max work group size:               512
  Preferred vector width char:           1
  Preferred vector width short:          1
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         0
  Native vector width char:          1
  Native vector width short:             1
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            0
  Max clock frequency:               1200Mhz
  Address bits:                  64
  Max memory allocation:             268435456
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         No
  Cache type:                    None
  Cache line size:               0
  Cache size:                    0
  Global memory size:                1073741824
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Local
  Local memory size:                 65536
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            80
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue properties:
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x7fff0000
  Name:                      HD Graphics 4000
  Vendor:                    Intel
  Device OpenCL C version:           OpenCL C 1.2
  Driver version:                1.2(Sep 19 2013 22:31:23)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2
  Extensions:                    cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images
OpenCL:

  Version:  2.3.57
  Obtained from:    Apple
  Last Modified:    22/10/2013 23:40
  Kind: Intel
  64-Bit (Intel):   Yes
  Signed by:    Software Signing, Apple Code Signing Certification Authority, Apple Root CA
  Get Info String:  2.3.57, Copyright 2008-2013 Apple Inc.
  Location: /System/Library/Frameworks/OpenCL.framework
  Private:  No
nouiz commented 11 years ago

@abergeron you told me you had a problem on Mac with this, and you gave me more detail. I think it will help them. Can you write up what you did? From memory, you described that clMath didn't check the size of something in the OpenCL driver.


abergeron commented 11 years ago

Are you referring to the maximum local group size problem? Because, if yes, it has absolutely nothing to do with this problem.

What would be needed for proper Mac support is to get the test suite to build and run without errors. I haven't managed the building part yet, since ACML is not available on Mac and building netlib correctly is a huge pain because of the Fortran code. If there were a way to link with a C BLAS implementation (like ATLAS, GotoBLAS or the Accelerate framework), it would be much easier to try the test suite.

nouiz commented 11 years ago

So I was completely wrong! Sorry for the trouble.


gicmo commented 11 years ago

I did some more investigation and followed my gut feeling that this crash occurs when we no longer have the kernel source but still call clGetProgramInfo(CL_PROGRAM_SOURCE). I looked for how this could happen and found the call to dropProgramSource (see below). So I quickly added a flag (noSource) to the Kernel structure that records whether the kernel source has been dropped, and in fullKernelSize I made the call to clGetProgramInfo(CL_PROGRAM_SOURCE) conditional on that flag.

This fixes the crash for me.

[==========] Running 9808 tests from 124 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from TRSM_extratest
[ RUN      ] TRSM_extratest.strsm
Process 5668 stopped
* thread #1: tid = 0x5658, 0x0000000100aa388f libclBLAS.2.dylib`makeKernel(device=0x0000000001022700, context=0x0000000101b1e790, kernelGenerator=0x0000000100acd310, dims=0x0000000102e072e0, pgran=0x0000000102e07358, extra=0x00007fff5fbfde58, buildOpts=0x00007fff5fbfdc60, error=0x00007fff5fbfdfa4) + 559 at common.c:494, queue = 'com.apple.main-thread, stop reason = breakpoint 2.1
    frame #0: 0x0000000100aa388f libclBLAS.2.dylib`makeKernel(device=0x0000000001022700, context=0x0000000101b1e790, kernelGenerator=0x0000000100acd310, dims=0x0000000102e072e0, pgran=0x0000000102e07358, extra=0x00007fff5fbfde58, buildOpts=0x00007fff5fbfdc60, error=0x00007fff5fbfdfa4) + 559 at common.c:494
   491
   492  #if !defined(KEEP_CLBLAS_KERNEL_SOURCES)
   493      if (err == CL_SUCCESS) {
-> 494          err = dropProgramSource(&kernel->program, context, device);
   495          kernel->noSource = 1;
   496      }
   497  #endif  /* !DUMP_CLBLAS_KERNELS */
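
The flag-based guard described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Kernel layout, querySourceSize (a stand-in for the real clGetProgramInfo(CL_PROGRAM_SOURCE) query) and fullKernelSizeSketch are hypothetical names, and only the noSource flag comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for clBLAS's Kernel structure; only the noSource
 * flag is taken from the thread, the other fields are illustrative. */
typedef struct Kernel {
    size_t binarySize;   /* illustrative: bytes of the compiled program */
    size_t sourceSize;   /* illustrative: bytes of source, if retained */
    int noSource;        /* 1 once the program source has been dropped */
} Kernel;

/* Counts how often the source query runs, so a caller can verify that
 * it was skipped for kernels without source. */
int sourceQueries = 0;

/* Hypothetical stand-in for the clGetProgramInfo(CL_PROGRAM_SOURCE)
 * query; the real call is the one that crashed in the trace. */
size_t querySourceSize(const Kernel *kern)
{
    sourceQueries++;
    return kern->sourceSize;
}

/* Sketch of the guarded size computation: the source query only runs
 * when the kernel still has its source attached. */
size_t fullKernelSizeSketch(const Kernel *kern)
{
    size_t size;

    assert(kern != NULL);
    size = sizeof(*kern) + kern->binarySize;
    if (!kern->noSource) {
        size += querySourceSize(kern);
    }
    return size;
}
```

Once noSource is set after dropProgramSource succeeds, the size computation never reaches the source query, so the crashing path is skipped entirely rather than relying on the driver to handle a missing source.
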
kknox commented 11 years ago

pull request #24 closes this issue