Open anadon opened 8 years ago
Hi @anadon
We are also tracking errors on the NVIDIA platform; can you compare your failures against what we see on this arrayfire dashboard?
Let me know if you see significantly different errors.
One important 'gotcha' when running the clBLAS unit tests: the only reference implementations confirmed to work for correctness checking are MKL and Netlib BLAS. Make sure to link against one of those when building test-correctness/test-short.
There are different errors if I'm reading things correctly. What should the next step be?
Check whether the failing values are off by only a very small amount; our unit tests expect the results to be bit-exact, which is why MKL or Netlib BLAS is the best choice of reference implementation. On your system, look at the test failures you see (in addition to the ones on the arrayfire dashboard) and check whether the results differ only on the order of 10e-6. Those are usually floating-point rounding errors that we are marking as failures.
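To separate rounding-level differences from real mismatches, a comparison along these lines may help. This is only a minimal sketch, not the logic the clBLAS tests use: the function name `classify_mismatches` and the tolerances `rel_tol=1e-5` and `abs_tol=1e-12` are assumptions chosen for illustration, and the inputs are assumed to be flat lists of values from the reference BLAS and from clBLAS.

```python
import math


def classify_mismatches(reference, computed, rel_tol=1e-5, abs_tol=1e-12):
    """Count bit-exact matches, rounding-level differences, and real mismatches.

    The tolerances are illustrative assumptions, not values taken from clBLAS.
    """
    if len(reference) != len(computed):
        raise ValueError("result vectors must have the same length")

    counts = {"exact": 0, "rounding": 0, "mismatch": 0}
    for ref, got in zip(reference, computed):
        if ref == got:
            # Identical values: what a bit-exact test expects.
            counts["exact"] += 1
        elif math.isclose(ref, got, rel_tol=rel_tol, abs_tol=abs_tol):
            # Tiny relative difference: most likely floating-point rounding.
            counts["rounding"] += 1
        else:
            # Anything larger (including NaN) deserves a closer look.
            counts["mismatch"] += 1
    return counts


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0]
    computed = [1.0, 2.000001, 3.0, 4.5]
    print(classify_mismatches(reference, computed))
```

If nearly everything lands in the "rounding" bucket, the failures are probably an artifact of the bit-exact comparison rather than a real problem.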
/home/campus14/jrmarsha/clBLASorigin/clBLAS/src/tests/correctness/corr-gemm.cpp:183: Failure
Value of: err
Actual: -36
Expected: 0
waitForSuccessfulFinish()
[ FAILED ] SelectedBig_2/GEMM.dgemm/1, where GetParam() = (0, 1, 0, 5777, 5333, 3000, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (75583 ms)
But I'm more concerned about the 200,000+ occurrences of this: Failed to create/enqueue buffer for a matrix.
Almost no tests were actually run.
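For reference, the err value of -36 in the log above is an OpenCL status code. The sketch below maps a few status codes to their names as defined in the Khronos `CL/cl.h` header; the table is deliberately partial, and the helper name `opencl_error_name` is hypothetical. It only names the code and does not explain why the driver returned it.

```python
# Partial table of OpenCL status codes, with values as defined in CL/cl.h.
OPENCL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
    -61: "CL_INVALID_BUFFER_SIZE",
}


def opencl_error_name(code):
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return OPENCL_ERROR_NAMES.get(code, f"unknown OpenCL error ({code})")


if __name__ == "__main__":
    # The status reported by waitForSuccessfulFinish() in the failing dgemm test.
    print(opencl_error_name(-36))
```

So the failing dgemm case reports CL_INVALID_COMMAND_QUEUE, which points at the command queue rather than at a numerical difference in the result.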
@anadon what gpu are you running on?
Quadro K620
Hi, Anadon. Do you run on Linux? If so, check out the develop branch. Since my last pull request, #274, you can verify the gemm result correctness against Netlib CBLAS by running the "client executable", which should be in the /staging/ dir.
Did that -- I'm trying to get a test system that works with any case before I start throwing my experimental code at it. And I did pull from the most recent development branch.
With the client, you can specify any case (matrix size, transpose, ...) on the command line. You do not necessarily need to write code from scratch to see whether a result is correct.
...I just ran the full test correctness program? Is there something else I should have done to test it?
The "client" is a complementary tool for checking correctness. It lets users quickly check a specific case they are interested in on Linux (see pull request #274).
The "full test correctness program" has many different cases predefined and hard-coded. It is expected to run for a long time.
I recommend you run the client if you are testing gemm/trmm on Linux.
I'm trying to make sure I have a fully working environment before I start messing around with more code. I need to find out what is wrong with my setup or clBLAS, not limit my testing.
So I took the time to run the complete set of tests with the current devel branch on a Linux system running Nvidia's 352.79 driver. One immediate issue is that I don't know the appropriate way to make the 415 MB output file available here. Next is dissecting what the series of errors means and what configuration error, testing error, or divine intervention exists.
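Rather than uploading the whole file, it may be easier to post a summary of it. The sketch below streams a log line by line and counts failures per test, plus occurrences of the buffer message quoted earlier. It assumes the file is plain-text Google Test output; the function name `summarize_log` and the file name in the comment are hypothetical. Google Test also repeats failed test names in its final summary, so the counts can come out roughly doubled.

```python
import re
from collections import Counter

# Matches "[ FAILED ] Suite.test" lines; requiring a "." in the name skips
# summary lines such as "[ FAILED ] 5 tests, listed below:".
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+([^,\s]*\.[^,\s]+)")
BUFFER_MSG = "Failed to create/enqueue buffer for a matrix"


def summarize_log(lines):
    """Return (failures per test, number of buffer-creation failures)."""
    failed = Counter()
    buffer_failures = 0
    for line in lines:
        if BUFFER_MSG in line:
            buffer_failures += 1
        match = FAILED_RE.search(line)
        if match:
            # Drop the trailing parameter index, e.g. ".../GEMM.dgemm/1".
            failed[match.group(1).rsplit("/", 1)[0]] += 1
    return failed, buffer_failures


if __name__ == "__main__":
    # For a real run, pass an open file instead of this sample, e.g.
    #   with open("test-correctness.log", errors="replace") as log:
    #       failed, buffer_failures = summarize_log(log)
    sample = [
        "Failed to create/enqueue buffer for a matrix.",
        "[ FAILED ] SelectedBig_2/GEMM.dgemm/1, where GetParam() = (0, 1, 0)",
    ]
    failed, buffer_failures = summarize_log(sample)
    for name, count in failed.most_common(20):
        print(count, name)
    print("buffer failures:", buffer_failures)
```

Because the file is read one line at a time, memory use stays small even for a log of several hundred megabytes.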