clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Make GTest work on OS X #22

Closed gicmo closed 11 years ago

gicmo commented 11 years ago

I changed the CMake files and some of the source files to make the clBLAS test suite compile on OS X (using the Accelerate framework).

Without any modification to the actual clBLAS source, all tests fail due to the same issue as reported in issue #21:

gicmo@kaon clBLAS/build/tests % lldb ./test-short
Current executable set to './test-short' (x86_64).
(lldb) r
Process 16074 launched: './test-short' (x86_64)
Initialize OpenCL and clblas...
---- NVIDIA
---- Intel
SetUp: about to create command queues

Test environment:

Device name: GeForce GT 650M
Device vendor: NVIDIA
Platform (bit): Apple OS X
clblas version: 2.1.0
Driver version: 8.18.22 310.40.05f01
Device version: OpenCL 1.2
Global mem size: 1024 MB
---------------------------------------------------------

[==========] Running 9808 tests from 124 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from TRSM_extratest
[ RUN      ] TRSM_extratest.strsm
Process 16074 stopped
* thread #1: tid = 0x20c47, 0x00007fff9731c812 libsystem_c.dylib`strlen + 18, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fff9731c812 libsystem_c.dylib`strlen + 18
libsystem_c.dylib`strlen + 18:
-> 0x7fff9731c812:  pcmpeqb (%rdi), %xmm0
   0x7fff9731c816:  pmovmskb %xmm0, %esi
   0x7fff9731c81a:  andq   $15, %rcx
   0x7fff9731c81e:  orq    $-1, %rax

(lldb) bt
* thread #1: tid = 0x20c47, 0x00007fff9731c812 libsystem_c.dylib`strlen + 18, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fff9731c812 libsystem_c.dylib`strlen + 18
    frame #1: 0x00007fff8bb90c60 OpenCL`clGetProgramInfo + 625
    frame #2: 0x0000000100a785cc libclBLAS.2.dylib`fullKernelSize(kern=0x00000001098bce70) + 236 at kern_cache.c:428
    frame #3: 0x0000000100a782b6 libclBLAS.2.dylib`addKernelToCache(kcache=0x00000001098b84d0, sid=34, kern=0x00000001098bce70, key=0x00007fff5fbfddd0, extraCmp=0x0000000100aad5e0) + 198 at kern_cache.c:311
    frame #4: 0x0000000100aa9c73 libclBLAS.2.dylib`makeSolutionSeq(funcID=CLBLAS_TRSM, args=0x00007fff5fbfe250, numCommandQueues=1, commandQueues=0x000000010049e408, numEventsInWaitList=0, eventWaitList=0x0000000000000000, events=0x00000001089b14a0, seq=0x00007fff5fbfe0a0) + 2915 at solution_seq_make.c:599
    frame #5: 0x0000000100a87ccf libclBLAS.2.dylib`doTrsm(kargs=0x00007fff5fbfe250, order=clblasColumnMajor, side=clblasLeft, uplo=clblasUpper, transA=clblasNoTrans, diag=clblasNonUnit, M=5, N=2, A=0x00000001098badb0, offA=0, lda=32, B=0x0000000103427c20, offB=0, ldb=32, numCommandQueues=1, commandQueues=0x000000010049e408, numEventsInWaitList=0, eventWaitList=0x0000000000000000, events=0x00000001089b14a0) + 991 at xtrsm.c:105
    frame #6: 0x0000000100a878d3 libclBLAS.2.dylib`clblasStrsm(order=clblasColumnMajor, side=clblasLeft, uplo=clblasUpper, transA=clblasNoTrans, diag=clblasNonUnit, M=5, N=2, alpha=1, A=0x00000001098badb0, offA=0, lda=32, B=0x0000000103427c20, offB=0, ldb=32, numCommandQueues=1, commandQueues=0x000000010049e408, numEventsInWaitList=0, eventWaitList=0x0000000000000000, events=0x00000001089b14a0) + 547 at xtrsm.c:144
    frame #7: 0x0000000100371a78 test-short`clMath::clblas::trsm(order=clblasColumnMajor, side=clblasLeft, uplo=clblasUpper, transA=clblasNoTrans, diag=clblasNonUnit, M=5, N=2, alpha=1, A=0x00000001098badb0, offA=0, lda=32, B=0x0000000103427c20, offB=0, ldb=32, numCommandQueues=1, commandQueues=0x000000010049e408, numEventsInWaitList=0, eventWaitList=0x0000000000000000, events=0x00000001089b14a0) + 424 at clBLAS-wrapper.cpp:785
    frame #8: 0x0000000100045fa6 test-short`void Extratest<float>(M=5, N=2, lda=32, ldb=32, alpha=1, delta=0.0000099999997) + 2454 at corr-trsm.cpp:395
    frame #9: 0x000000010003678c test-short`TRSM_extratest_strsm_Test::TestBody(this=0x0000000109898dc0) + 76 at corr-trsm.cpp:434
    frame #10: 0x00000001003d7653 test-short`void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 131
    frame #11: 0x00000001003c2797 test-short`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 119
    frame #12: 0x000000010039d045 test-short`testing::Test::Run() + 197
    frame #13: 0x000000010039e2bb test-short`testing::TestInfo::Run() + 219
    frame #14: 0x000000010039ef97 test-short`testing::TestCase::Run() + 231
    frame #15: 0x00000001003ab538 test-short`testing::internal::UnitTestImpl::RunAllTests() + 952
    frame #16: 0x00000001003d47e3 test-short`bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 131
    frame #17: 0x00000001003c4f07 test-short`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 119
    frame #18: 0x00000001003ab0f6 test-short`testing::UnitTest::Run() + 422
    frame #19: 0x00000001002e9ae1 test-short`RUN_ALL_TESTS() + 17 at gtest.h:2288
    frame #20: 0x00000001002c9757 test-short`main(argc=1, argv=0x00007fff5fbff2c0) + 1015 at test-correctness.cpp:3397
    frame #21: 0x00007fff972d25fd libdyld.dylib`start + 1
kknox commented 11 years ago

Just curious: did you get the test programs to compile with 'Unix Makefiles', or did you manage to get it all to build with 'Xcode'?

I was having problems with unresolved symbols with the combination of boost/gtest/Xcode, which I assume is related to the clang compiler that Xcode natively uses.

gicmo commented 11 years ago

This is indeed with the 'Unix Makefiles' CMake generator. But the 'Unix Makefiles' generator also seems to be using clang:

-- The Fortran compiler identification is GNU
-- The C compiler identification is Clang 5.0.0
-- The CXX compiler identification is Clang 5.0.0

I haven't tried the Xcode generator yet, because I think CMake 2.8.12 produces Xcode project files that Xcode 5.0 can't parse: "Project /Users/gicmo/Coding/src/clBLAS/build-xcode/clBLAS.xcodeproj cannot be opened because the project file cannot be parsed."

gicmo commented 11 years ago

Ah, btw, I just realised you might be right with your suspicion about clang. If either of the other two libraries (boost, gtest) was compiled with gcc (e.g. via homebrew on a pre-10.9 machine), it is linked against libstdc++; on the other hand, I think clang will always default to its own libc++. The two are incompatible, and the symptom is often undefined symbols (e.g. for the string classes). I have run into that before. Upgrading to 10.9 results in homebrew using clang and libc++, hence no issue.

kknox commented 11 years ago

Thank you, sounds like a great reason to upgrade to 10.9.

My gut says that your issue and #21 are the result of a platform problem, and I am investigating.

I will finish a code review of your pull request in a day or two.

gicmo commented 11 years ago

I did some quick research and testing to see if we can provide better timing on OSX too. It seems the best approach is one based on mach_absolute_time[1]. Quick testing code: https://gist.github.com/gicmo/7335930

I have updated the branch. NB: the typedef is probably unnecessary, because "sizeof(unsigned long) == sizeof(uint64_t)" is most likely true on all modern Macs, but I think it is cleaner that way.

[1] https://developer.apple.com/library/mac/qa/qa1398/_index.html
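For readers unfamiliar with the approach from [1], here is a minimal sketch of a nanosecond timer built on `mach_absolute_time()` and `mach_timebase_info()`. The names `timer_ns_t` and `current_time_ns` are hypothetical and not necessarily what the branch uses; the `uint64_t` typedef mirrors the point about not relying on `sizeof(unsigned long)`. A `clock_gettime(CLOCK_MONOTONIC)` fallback is included only so the sketch also builds off OS X.

```c
#if !defined(__APPLE__)
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime() on glibc */
#endif

#include <stdint.h>

#ifdef __APPLE__
#include <mach/mach_time.h>
#else
#include <time.h>
#endif

/* Hypothetical name: a fixed-width type instead of assuming unsigned long is 64-bit. */
typedef uint64_t timer_ns_t;

/* Returns a monotonic timestamp in nanoseconds (hypothetical helper). */
static timer_ns_t current_time_ns(void)
{
#ifdef __APPLE__
    /* Query the tick-to-nanosecond timebase once. Not thread-safe on first
     * use, which is acceptable for a single-threaded test driver. */
    static mach_timebase_info_data_t tb = {0, 0};
    if (tb.denom == 0)
        mach_timebase_info(&tb);
    /* Apply numer/denom at conversion time instead of caching a ratio. */
    return (timer_ns_t)(mach_absolute_time() * tb.numer / tb.denom);
#else
    /* Non-Apple fallback so the sketch compiles elsewhere. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (timer_ns_t)ts.tv_sec * 1000000000u + (timer_ns_t)ts.tv_nsec;
#endif
}
```

Elapsed time is then the difference of two `current_time_ns()` calls; on machines where `numer` and `denom` are both 1, the conversion is a no-op.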

kknox commented 11 years ago

The good news is that I got your pull request to build on my laptop, but I am getting a completely different failure. Then again, my machine is quite different from yours: not just different hardware, but different OpenCL drivers. The output below indicates a failure to compile the kernel on this machine, but it looks like it got further than it did on your machine.

$ lldb ./test-short
Current executable set to './test-short' (x86_64).
(lldb) r
Process 83382 launched: './test-short' (x86_64)
Initialize OpenCL and clblas...
---- AMD
SetUp: about to create command queues

Test environment:

Device name: ATI Radeon HD 6750M
Device vendor: AMD
Platform (bit): Apple OS X
clblas version: 2.1.0
Driver version: 1.0
Device version: OpenCL 1.1 
Global mem size: 1024 MB
---------------------------------------------------------

[==========] Running 9808 tests from 124 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from TRSM_extratest
[ RUN      ] TRSM_extratest.strsm
Calling reference xTRSM routine... Calling clblas xTRSM routine... 
========================================================

AN INTERNAL KERNEL BUILD ERROR OCCURRED!
device name = ATI Radeon HD 6750M
error = -11
memory pattern = 2-staged cached global memory based block trsm, computing kernel generator
Subproblem dimensions: dims[0].itemY = SUBDIM_UNUSED, dims[0].itemX = 8, dims[0].y = 32, dims[0].x = 8, dims[0].bwidth = 32; ; dims[1].itemY = 4, dims[1].itemX = 1, dims[1].y = 4, dims[1].x = 1, dims[1].bwidth = 4; ; 
Parallelism granularity: pgran->wgDim = 1, pgran->wgSize[0] = 64, pgran->wgSize[1] = 1, pgran->wfSize = 64
Kernel extra flags: 671090480
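
The `error = -11` in this report is the OpenCL status code `CL_BUILD_PROGRAM_FAILURE`, i.e. the device compiler rejected the generated kernel. As a small illustration, here is a sketch that maps a few OpenCL 1.x status codes to their names; `cl_error_name` is a hypothetical helper, and real code should use the constants from `CL/cl.h` (`OpenCL/opencl.h` on OS X) and fetch the compiler's message with `clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_LOG, ...)`.

```c
/* Hypothetical helper: names for a subset of OpenCL 1.x status codes.
 * The numeric values match those defined in CL/cl.h. */
static const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE"; /* kernel failed to compile */
    case -30: return "CL_INVALID_VALUE";
    case -42: return "CL_INVALID_BINARY";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    case -44: return "CL_INVALID_PROGRAM";
    default:  return "UNKNOWN_CL_ERROR";
    }
}
```
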

gicmo commented 11 years ago

Good to see you got it to compile, so at least the patch works :-) Yep, the OpenCL version and driver are indeed different. Maybe I should modify the test so that it selects the Intel GPU, to see what problems arise then.
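The test drivers turn out to accept a `--device` option that picks an OpenCL device by name. A minimal sketch of name-based selection, assuming an exact, case-sensitive match (the thread does not say how the clBLAS test driver compares names); `find_device_by_name` is hypothetical, and in real code each name would come from `clGetDeviceInfo(device, CL_DEVICE_NAME, ...)` rather than a string array:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first device whose name
 * exactly equals `wanted`, or -1 if no device matches. */
static int find_device_by_name(const char *const *names, size_t count,
                               const char *wanted)
{
    for (size_t i = 0; i < count; i++) {
        if (names[i] != NULL && strcmp(names[i], wanted) == 0)
            return (int)i;
    }
    return -1;
}
```

With the device names printed in the next log ("Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz", "GeForce GT 650M", "HD Graphics 4000"), asking for "HD Graphics 4000" would select the third device.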

gicmo commented 11 years ago

Of course no modifications are needed, because the test programs already support that. Good news: when I specify the Intel HD 4000 GPU, the program runs for much longer (until it eventually crashes). Some test cases produce kernel compile errors.

gicmo@kaon clBLAS/build/tests % ./test-short --device "HD Graphics 4000"
Initialize OpenCL and clblas...
---- NVIDIA
---- Intel
SetUp: about to create command queues
---- Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
---- GeForce GT 650M
---- HD Graphics 4000
SetUp: about to create command queues

Test environment:

Device name: HD Graphics 4000
Device vendor: Intel
Platform (bit): Apple OS X
clblas version: 2.1.0
Driver version: 1.2(Sep 19 2013 22:31:23)
Device version: OpenCL 1.2
Global mem size: 1024 MB
---------------------------------------------------------

[==========] Running 9808 tests from 124 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from TRSM_extratest
[ RUN      ] TRSM_extratest.strsm
Calling reference xTRSM routine... Calling clblas xTRSM routine... Done
[       OK ] TRSM_extratest.strsm (305 ms)
[ RUN      ] TRSM_extratest.dtrsm
>> WARNING: The target device doesn't support native double precision floating point arithmetic
>> Test skipped
[       OK ] TRSM_extratest.dtrsm (0 ms)
[ RUN      ] TRSM_extratest.ctrsm
Calling reference xTRSM routine... Calling clblas xTRSM routine... Done
[       OK ] TRSM_extratest.ctrsm (328 ms)
[ RUN      ] TRSM_extratest.ztrsm
>> WARNING: The target device doesn't support native double precision floating point arithmetic
>> Test skipped
[       OK ] TRSM_extratest.ztrsm (0 ms)
[----------] 4 tests from TRSM_extratest (633 ms total)

[----------] 288 tests from ColumnMajor_SmallRange/GEMM
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/0
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/0 (316 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/1
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/1 (1 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/2
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/2 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/3
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/3 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/4
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/4 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/5
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/5 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/6
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/6 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/7
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/7 (61 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/8
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/8 (1216 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/9
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/9 (1 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/10
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/10 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/11
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/11 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/12
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/12 (1 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/13
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/13 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/14
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/14 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/15
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/15 (208 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/16
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/16 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/17
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/17 (1 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/18
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/18 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/19
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/19 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/20
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/20 (1 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/21
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/21 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/22
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/22 (0 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/23
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/23 (1 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/24
clblasColumnMajor, clblasTrans, clblasNoTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine...

[... lots of other tests ...]

Let me investigate the crash ...

gicmo commented 11 years ago

Mea culpa: the crash actually came from the fixes for the _dotu functions. I replaced the buggy code with correct calls to the corresponding cblas functions, as I should have done in the first place.
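The thread doesn't say what the original _dotu bug was. For reference, `dotu` is the unconjugated complex dot product, sum of x[i]·y[i], while `dotc` conjugates x first; CBLAS exposes them as `cblas_cdotu_sub` and `cblas_cdotc_sub`, which return the result through their last pointer argument. The sketch below is a plain reference implementation (hypothetical names `ref_cdotu`/`ref_cdotc`), not the code from this branch:

```c
#include <complex.h>
#include <stddef.h>

/* Reference single-precision complex dot products (hypothetical helpers).
 * Strides are assumed positive; BLAS also allows negative increments,
 * which this sketch does not handle. */

/* dotu: sum of x[i] * y[i], no conjugation. */
static float complex ref_cdotu(size_t n, const float complex *x, size_t incx,
                               const float complex *y, size_t incy)
{
    float complex acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i * incx] * y[i * incy];
    return acc;
}

/* dotc: sum of conj(x[i]) * y[i]; only x is conjugated. */
static float complex ref_cdotc(size_t n, const float complex *x, size_t incx,
                               const float complex *y, size_t incy)
{
    float complex acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += conjf(x[i * incx]) * y[i * incy];
    return acc;
}
```

For x = (1+2i, 3-i) and y = (2-i, 1+i), dotu gives 8+5i while dotc gives 2-i, so swapping the two is easy to catch in a test.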

The results for the Intel GPU are then:

[----------] Global test environment tear-down
[==========] 9808 tests from 124 test cases ran. (667661 ms total)
[  PASSED  ] 9288 tests.
[  FAILED  ] 520 tests, listed below:
      [... list of failed tests at: https://gist.github.com/gicmo/7355869 ...]
520 FAILED TESTS
./test-short --device "HD Graphics 4000"  14.43s user 4.16s system 2% cpu 11:08.36 total

BUT, I realised that I still had the other "patch" applied, in which I commented out the call to clGetProgramInfo in fullKernelSize. Without that "fix" I again get the crash in clGetProgramInfo, no matter whether I run it on the Nvidia GPU, the Intel GPU or even the CPU.

kknox commented 11 years ago

@gicmo If you could amend your last commit to include a one-line change, I would be ready to merge this pull request. Basically, linking the Accelerate library does not require a Fortran compiler to be present.

gicmo-gtest_osx *] $ git diff
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 8897150..76547fc 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -38,7 +38,7 @@ if( CMAKE_GENERATOR MATCHES "NMake" )
 endif( )

 # If we are on linux, and we wish to link with the netlib BLAS implementation, we need to have a valid fortran compiler
-if( NOT CORR_TEST_WITH_ACML AND NOT WIN32)
+if( NOT CORR_TEST_WITH_ACML AND NOT WIN32 AND NOT APPLE )
   project(clBLAS Fortran C CXX )
 else( )
   project(clBLAS C CXX)

gicmo commented 11 years ago

@kknox Thanks! Good catch, I totally missed that Fortran compiler section in the CMake file. I thought about the getCurrentTime() code I wrote again, and now I think it was a poor decision to calculate the mach_time_base ratio up front (see the long commit message for details). I hope you agree. (On my machine numer and denom are both 1, so it didn't matter at all.)

Btw, if you prefer a single squashed commit, I can also do that.
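The commit message with the full reasoning isn't reproduced here, but one well-known pitfall of computing the timebase ratio up front is integer truncation: with a timebase such as numer = 125, denom = 3, a cached integer ratio of 125/3 becomes 41 and every measurement is scaled too low. A sketch of converting ticks at call time instead (`ticks_to_ns` is a hypothetical helper):

```c
#include <stdint.h>

/* Hypothetical helper: ticks * numer / denom without precomputing
 * numer / denom. Splitting ticks into quotient and remainder by denom
 * keeps r * numer below 2^64 (r < denom), so only the final result
 * can overflow. */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    uint64_t q = ticks / denom;  /* whole multiples of denom */
    uint64_t r = ticks % denom;  /* leftover ticks */
    return q * numer + (r * numer) / denom;
}
```

For 10 ticks with a 125/3 timebase this yields 416 ns (the floor of 1250/3), whereas multiplying by a precomputed integer ratio of 41 would yield 410 ns. When numer and denom are both 1, as on gicmo's machine, both approaches agree.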

kknox commented 11 years ago

Hi @gicmo, I think there is value in keeping two commits: the first about making gtest work on Mac OS X, and the other mostly about the timer work that you did. So if you could squash f896445, 6ceb2f7, daaa650, 3af9051, 082350f, that might make the history easier to understand.

With regard to Mac OS X timers, I don't have personal experience with them yet, so I don't have significant feedback. The logic in your commit message sounded solid, and I certainly appreciate a nanosecond timer as an upgrade from the old clock.

Thank you for your willingness to dive into the code, much appreciated!

gicmo commented 11 years ago

Hey @kknox, very good idea. I followed your advice and squashed all the commits into two logically grouped ones. After rebasing, I force-pushed the squashed commits to the same branch; I hope that was OK. Thank you guys for providing the open source library in the first place!