codeplaysoftware / portBLAS

An implementation of BLAS using the SYCL open standard.
Apache License 2.0

`blas1_rotg_test` and `blas1_rotmg_test` fail #502

Closed fbarbari closed 4 months ago

fbarbari commented 6 months ago

Hello everyone, I was trying out your library, but the tests named in the title fail.

Steps to reproduce:

git clone --recursive https://github.com/codeplaysoftware/portBLAS.git
cd portBLAS
export CC=icx
export CXX=icpx
cmake -S . -B build -DSYCL_COMPILER=dpcpp
cd build
make all
make test

blas1_rotg_test output:

Device vendor: Intel(R) Corporation
Device name: Intel(R) UHD Graphics 620
Device type: gpu
...
[ RUN      ] Rotg/RotgFloat.test/alloc_usm__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
  Actual: false
Expected: true
[  FAILED  ] Rotg/RotgFloat.test/alloc_usm__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("usm", 4-byte object <00-00 00-00>, 3.40282e+38, -3.40282e+38) (0 ms)
...
[ RUN      ] Rotg/RotgFloat.test/alloc_usm__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
  Actual: false
Expected: true
[  FAILED  ] Rotg/RotgFloat.test/alloc_usm__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("usm", 4-byte object <01-00 00-00>, 3.40282e+38, -3.40282e+38) (0 ms)
...
[ RUN      ] Rotg/RotgFloat.test/alloc_buf__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
  Actual: false
Expected: true
[  FAILED  ] Rotg/RotgFloat.test/alloc_buf__api_async__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("buf", 4-byte object <00-00 00-00>, 3.40282e+38, -3.40282e+38) (1 ms)
...
[ RUN      ] Rotg/RotgFloat.test/alloc_buf__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440
/tmp/portBLAS/test/unittest/blas1/blas1_rotg_test.cpp:93: Failure
Value of: utils::almost_equal(a, a_ref)
  Actual: false
Expected: true
[  FAILED  ] Rotg/RotgFloat.test/alloc_buf__api_sync__a_340282346638528859811704183484516925440__b_minus_340282346638528859811704183484516925440, where GetParam() = ("buf", 4-byte object <01-00 00-00>, 3.40282e+38, -3.40282e+38) (1 ms)

blas1_rotmg_test output:

Device vendor: Intel(R) Corporation
Device name: Intel(R) UHD Graphics 620
Device type: gpu
...
[ RUN      ] Rotmg_Usm/Rotmg_UsmFloat.test/alloc_usm__d1_2p9__d2_27431224__x1_1p50__y1_0p0__will_overflow_0
/tmp/portBLAS/test/unittest/blas1/blas1_rotmg_test.cpp:134: Failure
Value of: isAlmostEqual
  Actual: false
Expected: true
[  FAILED  ] Rotmg_Usm/Rotmg_UsmFloat.test/alloc_usm__d1_2p9__d2_27431224__x1_1p50__y1_0p0__will_overflow_0, where GetParam() = ("usm", 2.1, 2.74312e+07, 1.5, 5.72622e-08, false) (0 ms)
...
[ RUN      ] Rotmg_Buffer/Rotmg_BufferFloat.test/alloc_buf__d1_2p9__d2_27095732__x1_1p50__y1_0p0__will_overflow_0
/tmp/portBLAS/test/unittest/blas1/blas1_rotmg_test.cpp:134: Failure
Value of: isAlmostEqual
  Actual: false
Expected: true
[  FAILED  ] Rotmg_Buffer/Rotmg_BufferFloat.test/alloc_buf__d1_2p9__d2_27095732__x1_1p50__y1_0p0__will_overflow_0, where GetParam() = ("buf", 2.1, 2.70957e+07, 1.5, 5.46859e-08, false) (1 ms)

Ubuntu version: 22.04.1
CMake version: 3.27.4
icpx version: Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
CPU: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz

muhammad-tanvir-1211 commented 6 months ago

Hello @fbarbari, thank you for your interest in portBLAS. Could you please share the name of the reference BLAS implementation you are using for these tests? There is a known issue with rotg and rotmg where some of the tests fail because of the use of the fast-math compiler option, documented here: https://github.com/codeplaysoftware/portBLAS/blob/fec888faae176bb2f86f3eaa9a4cd7739606052a/test/unittest/CMakeLists.txt#L106. We had to explicitly disable fast-math for these tests to make sure they pass. We will investigate further and get back to you soon. Thanks.

fbarbari commented 6 months ago

> Could you please share the name of the reference BLAS implementation you are using for these tests?

CMake tells me this:

...
-- Found SystemBLAS: BLAS_LIBRARIES
...

This is not very informative, I think. I have OpenBLAS 0.3.24 installed on this system.

Rbiessy commented 5 months ago

Hello @fbarbari,

Looking at this issue, I remember we had problems with rotg and rotmg: they have many edge cases that are not well defined, and libraries can give different outputs. In particular, we discussed this before for rotmg and OpenBLAS in https://github.com/codeplaysoftware/portBLAS/pull/376. In the end we decided to make our implementations match netlib blas and cuBLAS. I have added some documentation on this issue in https://github.com/codeplaysoftware/portBLAS/pull/506

I suggest using netlib blas if you want all tests to be green.

Rbiessy commented 5 months ago

I'm sorry, I was mistaken. Locally, OpenBLAS gives correct results for these tests, although I am using OpenBLAS 0.3.20 and my integrated GPU is a UHD Graphics 770. I see there have been some changes regarding rotg in the OpenBLAS release notes. We will revisit how we want to approach this issue.

s-Nick commented 5 months ago

Hi @fbarbari, we looked into this issue and on our side everything works fine with OpenBLAS 0.3.26. We were able to reproduce the test failures when the configuration uses a different BLAS library provided by the oneAPI toolkit. To fix this and make the setup clearer, PR #509 is open. If you don't want to wait for it to be merged, I suggest reconfiguring and recompiling portBLAS with two extra flags that specify the OpenBLAS path: -DOPENBLAS_LIBRARIES=/path/to/openblas/lib and -DOPENBLAS_INCLUDE_DIRS=/path/to/openBLAS/include.

s-Nick commented 4 months ago

Hello @fbarbari, I think this issue is solved so I am going to close it. If you feel your problem is not resolved, please reopen it or open another issue for us. Thank you!