Closed gicmo closed 11 years ago
Just curious, did you get the test programs to compile with 'unix makefiles' or did you manage to get it all to build with 'Xcode'?
I was having problems with the combination of boost/gtest/xcode and unresolved symbols, I assume with the clang compiler that xcode natively uses.
This is indeed with the 'Unix Makefiles' cmake generator. But the 'Unix Makefiles' generator also seems to be using clang:
-- The Fortran compiler identification is GNU
-- The C compiler identification is Clang 5.0.0
-- The CXX compiler identification is Clang 5.0.0
I haven't tried the Xcode generator yet because I think cmake 2.8.12 produces Xcode project files that Xcode 5.0 doesn't like: "Project /Users/gicmo/Coding/src/clBLAS/build-xcode/clBLAS.xcodeproj cannot be opened because the project file cannot be parsed."
Ah, btw, I just realised you might be right with your suspicion about clang. If either of the other two libraries (boost, gtest) was compiled with gcc (e.g. via homebrew on a pre-10.9 machine), then it is linked against libstdc++; on the other hand, I think that clang will always default to its own libc++. Of course the two are incompatible, and the symptom is often undefined symbols (e.g. for the string classes). I have had that before. Upgrading to 10.9 results in homebrew using clang and libc++, hence no issue.
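To check which standard library a given translation unit is actually built against, one option is to look at the library's own version macros. The sketch below is only an illustration and not part of the pull request; the function name `standardLibraryName` is hypothetical. It relies on `_LIBCPP_VERSION` (defined by libc++ headers) and `__GLIBCXX__` (defined by libstdc++ headers).

```cpp
#include <cassert>   // handy for quick checks on the result
#include <cstddef>   // any standard header pulls in the library's config macros
#include <string>

// Hypothetical helper: report which C++ standard library this
// translation unit was compiled against.
std::string standardLibraryName() {
#if defined(_LIBCPP_VERSION)
    return "libc++";       // LLVM's standard library (clang's own)
#elif defined(__GLIBCXX__)
    return "libstdc++";    // GNU's standard library (gcc's)
#else
    return "unknown";      // some other implementation
#endif
}
```

Printing this value from both the test program and a small program built with the same flags as boost or gtest would show whether the two sides agree; a mismatch would be consistent with the undefined-symbol errors described above.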
Thank you, sounds like a great reason to upgrade to 10.9.
My gut says that your issue and #21 are the result of a platform problem, and I am investigating.
I will finish a code review of your pull request in a day or two.
I did some quick research and testing to see if we can provide better timing on OSX too. It seems the best approach is one based on mach_absolute_time[1]. Quick testing code: https://gist.github.com/gicmo/7335930
I have updated the branch. NB: the typedef is probably unnecessary because "sizeof(unsigned long) == sizeof(uint64_t)" is most likely true on all modern Macs, but I think it is cleaner that way.
[1] https://developer.apple.com/library/mac/qa/qa1398/_index.html
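For readers who do not want to open the gist, here is a minimal sketch of the general technique described in QA1398: read `mach_absolute_time()` and scale it with the `numer`/`denom` pair from `mach_timebase_info()`. This is not the code from the pull request. The function name `monotonicNanos` is hypothetical, and the non-Apple branch using `std::chrono::steady_clock` is added here only so the sketch compiles on other platforms.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__APPLE__)
#include <mach/mach_time.h>
#else
#include <chrono>
#endif

// Hypothetical helper: monotonic timestamp in nanoseconds.
std::uint64_t monotonicNanos() {
#if defined(__APPLE__)
    // The timebase gives the tick -> nanosecond ratio as numer/denom.
    mach_timebase_info_data_t tb;
    (void)mach_timebase_info(&tb);
    assert(tb.denom != 0);
    // Simple form; ticks * numer could overflow for very large tick
    // counts when numer is large.
    return mach_absolute_time() * tb.numer / tb.denom;
#else
    // Fallback so the sketch also builds off OS X (an assumption of
    // this example, not something the pull request does).
    auto now = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
#endif
}
```

Elapsed time is then the difference between two calls, for example one before and one after the routine being measured.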
The good news is that I got your pull request to build on my laptop, but I am getting a completely different failure. However, it looks like my machine is almost completely different from your machine. Not just different hardware, but different opencl drivers. What I have below indicates a failure to compile the kernel on this machine, but it looks like it got farther than where you got on your machine.
$ lldb ./test-short
Current executable set to './test-short' (x86_64).
(lldb) r
Process 83382 launched: './test-short' (x86_64)
Initialize OpenCL and clblas...
---- AMD
SetUp: about to create command queues
Test environment:
Device name: ATI Radeon HD 6750M
Device vendor: AMD
Platform (bit): Apple OS X
clblas version: 2.1.0
Driver version: 1.0
Device version: OpenCL 1.1
Global mem size: 1024 MB
---------------------------------------------------------
[==========] Running 9808 tests from 124 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from TRSM_extratest
[ RUN ] TRSM_extratest.strsm
Calling reference xTRSM routine... Calling clblas xTRSM routine...
========================================================
AN INTERNAL KERNEL BUILD ERROR OCCURRED!
device name = ATI Radeon HD 6750M
error = -11
memory pattern = 2-staged cached global memory based block trsm, computing kernel generator
Subproblem dimensions: dims[0].itemY = SUBDIM_UNUSED, dims[0].itemX = 8, dims[0].y = 32, dims[0].x = 8, dims[0].bwidth = 32; ; dims[1].itemY = 4, dims[1].itemX = 1, dims[1].y = 4, dims[1].x = 1, dims[1].bwidth = 4; ;
Parallelism granularity: pgran->wgDim = 1, pgran->wgSize[0] = 64, pgran->wgSize[1] = 1, pgran->wfSize = 64
Kernel extra flags: 671090480
Good to see you got it to compile, so at least the patch works :-) Yep, indeed OpenCL version and driver are different. Maybe I should modify the test so it selects the Intel gpu, to see what problems arise then.
Of course no modifications are needed because the test programs already support that. Good news: when the Intel HD 4000 GPU is specified, the program runs for quite a bit longer (until it eventually crashes). There are some test cases that produce kernel compile errors.
gicmo@kaon clBLAS/build/tests % ./test-short --device "HD Graphics 4000"
Initialize OpenCL and clblas...
---- NVIDIA
---- Intel
SetUp: about to create command queues
---- Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
---- GeForce GT 650M
---- HD Graphics 4000
SetUp: about to create command queues
Test environment:
Device name: HD Graphics 4000
Device vendor: Intel
Platform (bit): Apple OS X
clblas version: 2.1.0
Driver version: 1.2(Sep 19 2013 22:31:23)
Device version: OpenCL 1.2
Global mem size: 1024 MB
---------------------------------------------------------
[==========] Running 9808 tests from 124 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from TRSM_extratest
[ RUN ] TRSM_extratest.strsm
Calling reference xTRSM routine... Calling clblas xTRSM routine... Done
[ OK ] TRSM_extratest.strsm (305 ms)
[ RUN ] TRSM_extratest.dtrsm
>> WARNING: The target device doesn't support native double precision floating point arithmetic
>> Test skipped
[ OK ] TRSM_extratest.dtrsm (0 ms)
[ RUN ] TRSM_extratest.ctrsm
Calling reference xTRSM routine... Calling clblas xTRSM routine... Done
[ OK ] TRSM_extratest.ctrsm (328 ms)
[ RUN ] TRSM_extratest.ztrsm
>> WARNING: The target device doesn't support native double precision floating point arithmetic
>> Test skipped
[ OK ] TRSM_extratest.ztrsm (0 ms)
[----------] 4 tests from TRSM_extratest (633 ms total)
[----------] 288 tests from ColumnMajor_SmallRange/GEMM
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/0
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/0 (316 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/1
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/1 (1 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/2
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/2 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/3
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 63, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/3 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/4
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/4 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/5
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/5 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/6
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/6 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/7
clblasColumnMajor, clblasNoTrans, clblasNoTrans
M = 128, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/7 (61 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/8
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/8 (1216 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/9
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/9 (1 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/10
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/10 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/11
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 63, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/11 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/12
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/12 (1 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/13
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/13 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/14
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/14 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/15
clblasColumnMajor, clblasNoTrans, clblasTrans
M = 128, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine... Done
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/15 (208 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/16
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/16 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/17
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/17 (1 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/18
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/18 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/19
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 63, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/19 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/20
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/20 (1 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/21
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 63, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/21 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/22
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 128, K = 63
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/22 (0 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/23
clblasColumnMajor, clblasNoTrans, clblasConjTrans
M = 128, N = 128, K = 128
offA = 0, offB = 0, offC = 0
lda = 128, ldb = 128, ldc = 128
seed = 12345
queues = 1
>> Test is skipped because it has no importance for this level of coverage
[ OK ] ColumnMajor_SmallRange/GEMM.sgemm/23 (1 ms)
[ RUN ] ColumnMajor_SmallRange/GEMM.sgemm/24
clblasColumnMajor, clblasTrans, clblasNoTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clblas xGEMM routine...
[... lots of other tests ...]
Let me investigate the crash ...
Mea culpa: the crash actually comes from the fixes for the _dotu functions. I replaced the buggy code with correct calls to the corresponding cblas functions, as I should have done in the first place.
The results for the Intel GPU are then:
[----------] Global test environment tear-down
[==========] 9808 tests from 124 test cases ran. (667661 ms total)
[ PASSED ] 9288 tests.
[ FAILED ] 520 tests, listed below:
[... list of failed tests at: https://gist.github.com/gicmo/7355869 ...]
520 FAILED TESTS
./test-short --device "HD Graphics 4000" 14.43s user 4.16s system 2% cpu 11:08.36 total
BUT, I realised that I had applied the other "patch", where I commented out the call to clGetProgramInfo in fullKernelSize. Without that "fix" I again get the crash in clGetProgramInfo, no matter whether I run it on the Nvidia GPU, the Intel GPU or even the CPU.
@gicmo If you could amend your last commit to include a one-line change, I would be ready to merge this pull request. Basically, linking the Accelerate library does not require a Fortran compiler to be present.
gicmo-gtest_osx *] $ git diff
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 8897150..76547fc 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -38,7 +38,7 @@ if( CMAKE_GENERATOR MATCHES "NMake" )
endif( )
# If we are on linux, and we wish to link with the netlib BLAS implementation, we need to have a valid fortran compiler
-if( NOT CORR_TEST_WITH_ACML AND NOT WIN32)
+if( NOT CORR_TEST_WITH_ACML AND NOT WIN32 AND NOT APPLE )
project(clBLAS Fortran C CXX )
else( )
project(clBLAS C CXX)
@kknox Thanks! Good catch, I totally missed that fortran compiler section in the CMake file. I thought about the getCurrentTime() code I wrote again and think it was a poor decision to calculate the mach_time_base ratio upfront (see the long commit message for details). I hope you agree. (On my machine numer and denom are both 1 so it didn't matter at all.)
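The commit message is not reproduced in this thread, so the following is only a sketch of what applying `numer` and `denom` per call, rather than through a precomputed ratio, could look like in integer arithmetic. It is not the code from the pull request. `Timebase` and `ticksToNanos` are hypothetical names, with `Timebase` standing in for `mach_timebase_info_data_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for mach_timebase_info_data_t.
struct Timebase {
    std::uint32_t numer;
    std::uint32_t denom;
};

// Convert raw ticks to nanoseconds: ns = ticks * numer / denom.
// Splitting ticks into quotient and remainder keeps the intermediate
// products small, so ticks * numer never has to fit in 64 bits.
std::uint64_t ticksToNanos(std::uint64_t ticks, Timebase tb) {
    assert(tb.denom != 0);
    std::uint64_t whole = (ticks / tb.denom) * tb.numer;
    std::uint64_t rest  = (ticks % tb.denom) * tb.numer / tb.denom;
    return whole + rest;
}
```

With `numer` and `denom` both equal to 1, as reported above, the function returns its input unchanged, which matches the observation that the choice made no difference on that machine.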
Btw, if you prefer a single squashed commit, I can also do that.
Hi @gicmo, I think there is value in keeping two commits: the first about making gtest work on Mac OS X, and the second mostly about the timer work that you did. So if you could squash f896445, 6ceb2f7, daaa650, 3af9051, 082350f, that might make the history easier to understand. With regard to Mac OS X timers, I have not developed personal experience with them yet, so I don't have significant feedback. Your logic in your commit message sounded solid, and I surely appreciate a nanosecond timer as an upgrade from the old clock.
Thank you for your willingness to dive into the code, much appreciated!
Hey @kknox, very good idea. I followed your advice and squashed all the commits into two logically grouped ones. After rebasing, I force-pushed the squashed commits to the same branch. I hope that was ok. Thank you guys for providing the open source library in the first place!
I changed the cmake files and some of the source files to make the clBLAS test suite compile on OS X (using the Accelerate framework).
Without any modification to the actual source of clBLAS all tests fail due to the same issue as reported in issue #21: