rhornung67 opened 1 year ago
All tests pass when compiled in Debug with the latest intel/2023.2.1-magic compiler.
In Release, the following tests fail with wrong answers (not only the subtraction tests):
262 - test-launch-basic-param-expt-ReduceSum-OpenMP.exe (Failed)
263 - test-launch-basic-param-expt-ReduceMin-OpenMP.exe (Failed)
408 - test-tensor-matrix-int32_t-ColMajor-ET_Subtract.exe (Failed)
420 - test-tensor-matrix-int64_t-RowMajor-ET_Add.exe (Failed)
421 - test-tensor-matrix-int64_t-RowMajor-ET_Subtract.exe (Failed)
433 - test-tensor-matrix-int64_t-ColMajor-ET_Add.exe (Failed)
434 - test-tensor-matrix-int64_t-ColMajor-ET_Subtract.exe (Failed)
446 - test-tensor-matrix-float-RowMajor-ET_Add.exe (Failed)
447 - test-tensor-matrix-float-RowMajor-ET_Subtract.exe (Failed)
449 - test-tensor-matrix-float-RowMajor-ET_MatrixVector.exe (Failed)
459 - test-tensor-matrix-float-ColMajor-ET_Add.exe (Failed)
460 - test-tensor-matrix-float-ColMajor-ET_Subtract.exe (Failed)
461 - test-tensor-matrix-float-ColMajor-ET_Divide.exe (Failed)
472 - test-tensor-matrix-double-RowMajor-ET_Add.exe (Failed)
473 - test-tensor-matrix-double-RowMajor-ET_Subtract.exe (Failed)
485 - test-tensor-matrix-double-ColMajor-ET_Add.exe (Failed)
486 - test-tensor-matrix-double-ColMajor-ET_Subtract.exe (Failed)
487 - test-tensor-matrix-double-ColMajor-ET_Divide.exe (Failed)
It might help to try an intel/2024 compiler, which may have the fixes we need.
Yikes! We could request that a newer Intel compiler be installed on LC machines. It would help if some apps also wanted it.
Yeah, I'm trying some other compilers to see if it's only an Intel problem. There was some talk at Sandia that the intel/2024 compiler had fixed many problems, but I don't see that compiler set anywhere on our machines.
Apart from the Intel compiler inaccuracies, there is definitely something wrong with test-tensor-matrix-*-ColMajor-ET_Subtract. When built with clang/14.0.6-magic, it looks like the tail of the vectorized computation is not computed correctly and may be optimized out by the compiler at -O3 (it works correctly at -O0, as with Intel). I'm looking further into the cause.
Four tests fail with all or most CPU compilers. This should be investigated to determine whether it is a bug in the code or a compiler bug.