CNugteren / CLBlast

Tuned OpenCL BLAS
Apache License 2.0

Accuracy problem on Apple M1 and Intel(R) UHD Graphics 770 #542

Closed fengyuentau closed 5 months ago

fengyuentau commented 6 months ago

CLBlast on Apple M1 gives incorrect Sgemm results for square matrices of scale >= 1152.

macOS version: 14.4.1

Reproducer is available at https://github.com/fengyuentau/test-clblast.

Test results are shown at https://github.com/fengyuentau/test-clblast?tab=readme-ov-file#results, which are

scale: 1250, max_diff: 348.821442
sacle: 1125, OK
scale: 1187, max_diff: 330.725189
scale: 1156, max_diff: 320.881348
sacle: 1140, OK
sacle: 1148, OK
scale: 1152, max_diff: 323.137970
sacle: 1150, OK
sacle: 1151, OK

I tried commenting out the tuning results for Apple M1, and with that change the results are correct. Would you accept a patch to revert the tuning results for Apple M1?
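For readers who want to see what a max_diff figure like the ones above measures, here is a minimal sketch. It assumes the reproducer compares the GPU output against a plain CPU reference and reports the largest element-wise absolute difference; the thread does not show the reproducer's code, so `ReferenceSgemm` and `MaxAbsDiff` are hypothetical names, not functions from CLBlast or the linked repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive row-major reference: C = A * B for square n x n matrices.
// Hypothetical helper for illustration only.
std::vector<float> ReferenceSgemm(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n);
  std::vector<float> c(n * n, 0.0f);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < n; ++k) {
      const float a_ik = a[i * n + k];
      for (std::size_t j = 0; j < n; ++j) {
        c[i * n + j] += a_ik * b[k * n + j];
      }
    }
  }
  return c;
}

// Largest element-wise absolute difference between two result buffers.
float MaxAbsDiff(const std::vector<float>& expected,
                 const std::vector<float>& actual) {
  assert(expected.size() == actual.size());
  float max_diff = 0.0f;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    max_diff = std::max(max_diff, std::fabs(expected[i] - actual[i]));
  }
  return max_diff;
}
```

In such a setup, `actual` would be the buffer read back from the device after the Sgemm call, and a size would be reported as OK when the difference stays below a chosen threshold.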

fengyuentau commented 6 months ago

Update:

OS: Ubuntu 22.04.2 LTS


Cgemm results are also incorrect on Intel(R) UHD Graphics 770 with scale >= 256. The code and results have already been updated. See also below:

scale: 550, real_max_diff: 318.123413, imag_max_diff: 320.759399
scale: 325, real_max_diff: 196.814056, imag_max_diff: 191.424683
sacle: 212, OK
scale: 268, real_max_diff: 162.766602, imag_max_diff: 165.656494
sacle: 240, OK
sacle: 254, OK
scale: 261, real_max_diff: 162.882080, imag_max_diff: 161.465424
scale: 257, real_max_diff: 155.064240, imag_max_diff: 157.468262
sacle: 255, OK
scale: 256, real_max_diff: 159.027313, imag_max_diff: 166.023514

Note that reverting the tuning results for this platform gives accurate results again.
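The Cgemm output above reports the real and imaginary parts separately. A minimal sketch of how such a pair could be computed follows, assuming each part is compared independently against a reference buffer; `MaxComplexDiff` is a hypothetical name and is not taken from the reproducer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {real_max_diff, imag_max_diff}: the largest absolute difference
// of the real parts and of the imaginary parts, tracked separately.
std::pair<float, float> MaxComplexDiff(
    const std::vector<std::complex<float>>& expected,
    const std::vector<std::complex<float>>& actual) {
  assert(expected.size() == actual.size());
  float real_max = 0.0f;
  float imag_max = 0.0f;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    real_max = std::max(real_max,
                        std::fabs(expected[i].real() - actual[i].real()));
    imag_max = std::max(imag_max,
                        std::fabs(expected[i].imag() - actual[i].imag()));
  }
  return {real_max, imag_max};
}
```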

CNugteren commented 6 months ago

Thanks for reporting this.

However, I see you wrote your own tests, while CLBlast already contains a large and sophisticated test suite. Can you run the relevant (original) CLBlast tests on your hardware and see if they also fail? If they don't fail, can you modify them to include the large matrices from your own tests and re-run them?

fengyuentau commented 6 months ago

Three tests fail on the M1 in the SGEMM routine tests (see below). The other tests are fine. No tests fail after reverting the tuning results.

Original code without modification:

./clblast_test_xgemm

* Options given/available:
    -platform 0 [=default]
    -device 0 [=default]
    -full_test [false]
    -verbose [false]
    -cblas 1 [=default]

* Running on OpenCL device 'Apple M1'.
* Starting tests for the 'SGEMM' routine. Legend:
   : -> Test produced correct results
   . -> Test returned the correct error code
   X -> Test produced incorrect results
   / -> Test returned an incorrect error code
   \ -> Test not executed: OpenCL-kernel compilation error
   o -> Test not executed: Unsupported precision
   - -> Test not completed: Reference CBLAS doesn't output error codes
* Testing with error margins of 0.5% (relative) and 0.001 (absolute)
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 111 (regular)':
   ::::::::----::::---:---:-------:::::::::----::::---:---:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 112 (transposed)':
   ::::::::------::-:-:-:-:-------:::::::::------::-:-:-:-:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 111 (regular)':
   ::::::::::::::::---:---:---:---:----:::X----::::-------:-------:
   Error rate 78.09%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate  45.3%: 29 passed / 34 skipped / 1 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 112 (transposed)':
   ::::::::--::--::-:-:-:-:---:---:----::::------::-----:-:-------:
   Pass rate  42.2%: 27 passed / 37 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 111 (regular)':
   ::::::::--::--::::::::::--::--::-----:-:-------:-----:-:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 112 (transposed)':
   ::::::::::::::::--:X--::--::--::-----:-:-----:-:-------:-------:
   Error rate 77.74%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate  45.3%: 29 passed / 34 skipped / 1 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 111 (regular)':
   ::::::::------::::::::::------::-:-:-:-:-------:-:-:-:-:-------:
   Pass rate  46.9%: 30 passed / 34 skipped / 0 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 112 (transposed)':
   ::::::::----::::--::--::------::-:-:-:-:-----:-:---:---X-------:
   Error rate 96.95%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate  40.6%: 26 passed / 37 skipped / 1 failed
* Completed all test-cases for this routine. Results:
   231 test(s) passed
   278 test(s) skipped
   3 test(s) failed

...
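The log above states error margins of 0.5% (relative) and 0.001 (absolute). As a rough illustration of what a combined check of that kind can look like, here is a sketch that accepts a value if it is within either margin. This is an assumption about how the two margins interact; `WithinMargins` is a hypothetical helper and not the CLBlast tester's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Accepts `value` when it is close to `reference` in absolute terms,
// or otherwise when the difference is small relative to the larger
// magnitude of the two. Defaults mirror the margins printed in the log.
bool WithinMargins(float reference, float value,
                   float relative_margin = 0.005f,
                   float absolute_margin = 0.001f) {
  assert(relative_margin >= 0.0f && absolute_margin >= 0.0f);
  const float diff = std::fabs(reference - value);
  if (diff <= absolute_margin) {
    return true;  // Close enough in absolute terms (covers values near zero).
  }
  const float scale = std::max(std::fabs(reference), std::fabs(value));
  return diff <= relative_margin * scale;
}
```

Under these assumptions, differences in the hundreds, as in the max_diff figures reported earlier, would fail both margins by a wide distance.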
fengyuentau commented 6 months ago

Many CGEMM tests fail on Intel(R) UHD Graphics 770 (see below). The others are fine. Again, no tests fail after reverting the tuning results.

$ ./clblast_test_xgemm

...

* Running on OpenCL device 'Intel(R) UHD Graphics 770'.
* Starting tests for the 'CGEMM' routine. Legend:
   : -> Test produced correct results
   . -> Test returned the correct error code
   X -> Test produced incorrect results
   / -> Test returned an incorrect error code
   \ -> Test not executed: OpenCL-kernel compilation error
   o -> Test not executed: Unsupported precision
   - -> Test not completed: Reference CBLAS doesn't output error codes
* Testing with error margins of 0.5% (relative) and 0.001 (absolute)
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 111 (regular)':
   ::::::::----::::---X---X-------X::::::::----::::---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  37.5%: 24 passed / 34 skipped / 6 failed
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 112 (transposed)':
   ::::::::------::-X-X-X-X-------X::::::::------::-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 113 (conjugate)':
   ::::::::------::-X-X-X-X-------X::::::::------::-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 111 (regular)':
   ::::::::::::::::---X---X---X---X----::::----::::-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  37.5%: 24 passed / 34 skipped / 6 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 112 (transposed)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 113 (conjugate)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '101 (row-major) 113 (conjugate) 111 (regular)':
   ::::::::::::::::---X---X---X---X----::::----::::-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  37.5%: 24 passed / 34 skipped / 6 failed
* Testing 'regular behaviour' for '101 (row-major) 113 (conjugate) 112 (transposed)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '101 (row-major) 113 (conjugate) 113 (conjugate)':
   ::::::::--::--::-X-X-X-X---X---X----::::------::-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 73.53%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 111 (regular)':
   ::::::::--::--::XXXXXXXX--XX--XX-----:-:-------:-----X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  23.4%: 15 passed / 34 skipped / 15 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 112 (transposed)':
   ::::::::::::::::--XX--XX--XX--XX-----:-:-----:-:-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 113 (conjugate)':
   ::::::::::::::::--XX--XX--XX--XX-----:-:-----:-:-------X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  31.2%: 20 passed / 34 skipped / 10 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 111 (regular)':
   ::::::::------::XXXXXXXX------XX-:-:-:-:-------:-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  23.4%: 15 passed / 34 skipped / 15 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 112 (transposed)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 113 (conjugate)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 113 (conjugate) 111 (regular)':
   ::::::::------::XXXXXXXX------XX-:-:-:-:-------:-X-X-X-X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  23.4%: 15 passed / 34 skipped / 15 failed
* Testing 'regular behaviour' for '102 (col-major) 113 (conjugate) 112 (transposed)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Testing 'regular behaviour' for '102 (col-major) 113 (conjugate) 113 (conjugate)':
   ::::::::----::::--XX--XX------XX-:-:-:-:-----:-:---X---X-------X
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 58.61%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Error rate 72.77%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=2.42+3.14i beta=2.42+3.14i 
   Pass rate  28.1%: 18 passed / 37 skipped / 9 failed
* Completed all test-cases for this routine. Results:
   341 test(s) passed
   636 test(s) skipped
   175 test(s) failed

...
CNugteren commented 6 months ago

Thank you for running the tests. Perhaps this could be related to https://github.com/CNugteren/CLBlast/issues/533.

Since I don't have the same devices to test on as you do, I simply modified GetDeviceName and GetDeviceVendor in src/utilities/utilities.cpp to use the tuning parameters for the M1 and UHD 770 on my own device. I managed to reproduce the Intel UHD 770 issues locally, exactly as you reported them, but I did not manage to reproduce the Apple M1 issue. So I'll need to dig deeper for the M1.

But first I'll try to solve the Intel UHD 770 issue. If I simply use the Intel GPU default parameters (in src/database/kernels/xgemm/xgemm_3232.hpp) the issue is resolved, so it is related to those values. They might be illegal (and thus there is a bug in the CLBlast tuner) or there might be a bug in the CLBlast kernels. It might be the same as the old issue https://github.com/CNugteren/CLBlast/issues/340. I'll investigate and let you know if there is progress.

fengyuentau commented 6 months ago

Let me know if I can help with the M1 issue.

Do we have options to build and use this library without tuning results?

CNugteren commented 6 months ago

Some initial results: when I revert https://github.com/CNugteren/CLBlast/pull/341, the issue seems resolved, at least for the few tests I did. I'll investigate some more and re-read the original issue #340, and will keep you updated.

> Do we have options to build and use this library without tuning results?

Well, it depends on what you mean by 'without tuning results', because the library needs to use some set of parameters. What you can do is modify src/utilities/utilities.cpp as I mentioned above to mimic another device. You could then switch to the default Apple GPU parameters (if you name your device 'Apple Non Existing Device', for example) or use the default-default parameters if you also change your device vendor to something non-existent.
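The fallback order described here (device-specific parameters, then the vendor default, then the default-default) can be sketched as follows. This is a simplified, hypothetical model for illustration: `TuningDatabase`, `LookupParameters`, and the string labels are assumptions and do not reflect how CLBlast's database is actually implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a tuning database: maps (vendor, device name)
// to a label for a parameter set. The name "default" marks a fallback entry.
using TuningDatabase =
    std::map<std::pair<std::string, std::string>, std::string>;

const std::string kDefault = "default";

std::string LookupParameters(const TuningDatabase& db,
                             const std::string& vendor,
                             const std::string& device) {
  // The global fallback must always exist in this model.
  assert(db.count({kDefault, kDefault}) == 1);
  // 1. Exact match on vendor and device name.
  auto it = db.find({vendor, device});
  if (it != db.end()) return it->second;
  // 2. Vendor-wide default, used for unknown devices of a known vendor.
  it = db.find({vendor, kDefault});
  if (it != db.end()) return it->second;
  // 3. Global "default-default" entry for unknown vendors.
  return db.at({kDefault, kDefault});
}
```

In this model, reporting a made-up device name skips step 1, and reporting a made-up vendor as well skips step 2, which matches the workaround suggested above.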

fengyuentau commented 6 months ago

Thank you for the quick update!

> Well it depends on what you mean with 'without tuning results', because it needs to use some set of parameters.

We could add a compile definition (e.g. HAVE_TUNING_RESULTS, controlled via a CMake option that defaults to ON) and then guard every tuning result except the defaults with this macro. Below is an example. What do you think?

# CMakeLists.txt
option(WITH_TUNING_RESULTS "Compile in device-specific tuning results" ON)

if(WITH_TUNING_RESULTS)
  # add_compile_definitions() takes the bare macro name, without a -D prefix
  add_compile_definitions(HAVE_TUNING_RESULTS)
endif()

Then we can use #if HAVE_TUNING_RESULTS to guard all tuning results except the defaults.
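On the C++ side, the guard could look roughly like the sketch below. The `TuningEntry` struct, the `BuildDatabase` function, and the numeric values are placeholders invented for illustration; they are not CLBlast's real database layout or real tuning parameters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical database entry: a device name and one placeholder parameter.
struct TuningEntry {
  std::string device;
  int work_group_size;
};

std::vector<TuningEntry> BuildDatabase() {
  std::vector<TuningEntry> entries;
#if HAVE_TUNING_RESULTS
  // Device-specific results, compiled in only when the macro is defined.
  // An undefined macro evaluates to 0 inside #if, so these lines drop out.
  entries.push_back({"Apple M1", 64});
  entries.push_back({"Intel(R) UHD Graphics 770", 32});
#endif
  // The default entry is always present, so a lookup can always fall back.
  entries.push_back({"default", 16});
  assert(!entries.empty());
  return entries;
}
```

Building with the option turned off would then leave only the default entry, which is the behaviour asked for earlier in the thread.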

CNugteren commented 6 months ago

This PR https://github.com/CNugteren/CLBlast/pull/543 likely solves the issue you reported on the Intel UHD 770. If you could try it out to confirm, that would be great!

The issue with the Apple M1 seems unrelated, since that device doesn't use this GEMMK=1 kernel that caused the issue. I also can't reproduce the issue on my own machine (non-Apple) by simply using the M1's tuning parameters, so there seems to be something else going on here. I'll have a think soon to see how we can debug this further.

fengyuentau commented 6 months ago

I will try it later this week. Thank you for the quick fix!

fengyuentau commented 5 months ago

@CNugteren I can confirm that https://github.com/CNugteren/CLBlast/pull/543 fixes the accuracy problem on the Intel UHD 770.

As for the Apple M1 accuracy problem, let me extend the existing tests in this repository to give you more results.

fengyuentau commented 5 months ago

@CNugteren I can also confirm that https://github.com/CNugteren/CLBlast/pull/543 fixes the accuracy problem on the Apple M1.

I guess we are done with this issue. Thank you for the quick responses, updates, and patches!