LLNL / RAJA

RAJA Performance Portability Layer (C++)
BSD 3-Clause "New" or "Revised" License

tensor matrix col-major subtract tests fail #1485

Open rhornung67 opened 1 year ago

rhornung67 commented 1 year ago

Four tests fail with all or most CPU compilers. This should be investigated to determine whether it is a bug in the code or a compiler bug.

rchen20 commented 3 months ago

All tests pass when compiled in Debug with the latest intel/2023.2.1-magic compiler.

In Release, the following tests are failing with wrong answers (more than just subtraction tests):

        262 - test-launch-basic-param-expt-ReduceSum-OpenMP.exe (Failed)
        263 - test-launch-basic-param-expt-ReduceMin-OpenMP.exe (Failed)
        408 - test-tensor-matrix-int32_t-ColMajor-ET_Subtract.exe (Failed)
        420 - test-tensor-matrix-int64_t-RowMajor-ET_Add.exe (Failed)
        421 - test-tensor-matrix-int64_t-RowMajor-ET_Subtract.exe (Failed)
        433 - test-tensor-matrix-int64_t-ColMajor-ET_Add.exe (Failed)
        434 - test-tensor-matrix-int64_t-ColMajor-ET_Subtract.exe (Failed)
        446 - test-tensor-matrix-float-RowMajor-ET_Add.exe (Failed)
        447 - test-tensor-matrix-float-RowMajor-ET_Subtract.exe (Failed)
        449 - test-tensor-matrix-float-RowMajor-ET_MatrixVector.exe (Failed)
        459 - test-tensor-matrix-float-ColMajor-ET_Add.exe (Failed)
        460 - test-tensor-matrix-float-ColMajor-ET_Subtract.exe (Failed)
        461 - test-tensor-matrix-float-ColMajor-ET_Divide.exe (Failed)
        472 - test-tensor-matrix-double-RowMajor-ET_Add.exe (Failed)
        473 - test-tensor-matrix-double-RowMajor-ET_Subtract.exe (Failed)
        485 - test-tensor-matrix-double-ColMajor-ET_Add.exe (Failed)
        486 - test-tensor-matrix-double-ColMajor-ET_Subtract.exe (Failed)
        487 - test-tensor-matrix-double-ColMajor-ET_Divide.exe (Failed)

It might be helpful if we could find an intel/2024 compiler to try, which may have fixes we need.

rhornung67 commented 3 months ago

Yikes! We could request that a newer Intel compiler be installed on LC machines. It would help if some apps also wanted it.

rchen20 commented 3 months ago

Yeah, I'm trying with some other compilers to see if it's just an Intel problem. There was some talk at Sandia that the intel/2024 compiler had fixed many problems, but I don't see that compiler set anywhere on our machines.

rchen20 commented 3 months ago

Apart from the Intel compiler inaccuracies, there is definitely something wrong with test-tensor-matrix-*-ColMajor-ET_Subtract. When built with clang/14.0.6-magic, it looks like the tail of the vectorized computation is not being computed properly and may be optimized out by the compiler at -O3 (it works properly at -O0, as with Intel). I'm looking further into the cause.
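
For readers unfamiliar with the term, the "tail" is the leftover part of a loop when the problem size is not a multiple of the vector width. The sketch below is not RAJA's implementation; it is a minimal plain C++ illustration, assuming a hypothetical lane count `kLanes` and hypothetical helper names `subtract_col_major` and `matches_reference`, of a column-major subtract split into a full-chunk body and a scalar tail. If the tail loop were dropped or miscompiled, only the last `rows % kLanes` rows of each column would be wrong, which is the kind of symptom described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vector width, standing in for a SIMD register's lane count.
constexpr std::size_t kLanes = 4;

// Computes out = a - b for column-major matrices of size rows x cols.
// Element (r, c) is stored at index r + c * rows.
template <typename T>
std::vector<T> subtract_col_major(const std::vector<T>& a,
                                  const std::vector<T>& b,
                                  std::size_t rows, std::size_t cols)
{
  std::vector<T> out(rows * cols);
  // Number of rows per column that are covered by full chunks of kLanes.
  const std::size_t full = rows - rows % kLanes;

  for (std::size_t col = 0; col < cols; ++col) {
    const std::size_t base = col * rows;

    // Main body: whole chunks of kLanes rows (what a vector unit would do).
    for (std::size_t r = 0; r < full; r += kLanes) {
      for (std::size_t lane = 0; lane < kLanes; ++lane) {
        out[base + r + lane] = a[base + r + lane] - b[base + r + lane];
      }
    }

    // Tail: the remaining rows % kLanes rows, handled one element at a time.
    for (std::size_t r = full; r < rows; ++r) {
      out[base + r] = a[base + r] - b[base + r];
    }
  }
  return out;
}

// Compares the chunked result against a plain element-wise reference.
template <typename T>
bool matches_reference(std::size_t rows, std::size_t cols)
{
  std::vector<T> a(rows * cols), b(rows * cols);
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = static_cast<T>(3 * i + 1);
    b[i] = static_cast<T>(i);
  }
  const std::vector<T> out = subtract_col_major(a, b, rows, cols);
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (out[i] != a[i] - b[i]) return false;
  }
  return true;
}
```

Calling `matches_reference<float>(7, 3)` exercises a tail of three rows per column, while `matches_reference<float>(8, 3)` has no tail at all. Comparing such pairs at -O0 and -O3 is one way to check whether a failure is confined to the remainder elements.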