zhisong closed 5 years ago
Sure, I can help. Do you want me to try compiling and running some of the test cases? Also, do you know why it was crashing in your case?
Yes, please try it on test cases.
The DGETRF subroutine computes the LU decomposition of the matrix dMA. dMA is dimensioned (0:NN,0:NN), i.e. it is an (NN+1)*(NN+1) matrix. The parameter LDA is the size of the first (leading) dimension of the array as declared. Here, only dMA(0:NN-1, 1:NN) is actually used and the rest is just junk. When passing dMA to DGETRF, one can slice the matrix by passing dMA(0:NN-1, 1:NN) with LDA=NN. However, this does not seem to work on my machine. The alternative is to pass dMA(0:NN, 1:NN) with LDA=NN+1, so that the last row dMA(NN, 1:NN) is ignored inside DGETRF.
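To make the role of LDA concrete, here is a minimal sketch in pure Python. It is a toy stand-in for DGETRF, not the MKL implementation: `lu_in_place`, the 3x3 `block` and the `JUNK` marker are all made up for illustration. It stores a matrix column-major, as Fortran does, and factorizes the same NN-by-NN block twice: once from a compact copy (LDA=NN) and once from padded storage with one extra junk row per column (LDA=NN+1).

```python
# Column-major storage, as in Fortran: element (i, j) of an array whose
# declared first dimension is lda lives at flat offset i + j*lda.

def lu_in_place(a, m, n, lda):
    """Unblocked LU with partial pivoting on the leading m-by-n block of the
    column-major buffer `a` (toy stand-in for DGETRF; 0-based pivots)."""
    piv = []
    for k in range(min(m, n)):
        # Pick the largest entry in column k, at or below the diagonal.
        p = max(range(k, m), key=lambda i: abs(a[i + k * lda]))
        piv.append(p)
        if p != k:
            for j in range(n):
                a[k + j * lda], a[p + j * lda] = a[p + j * lda], a[k + j * lda]
        for i in range(k + 1, m):
            a[i + k * lda] /= a[k + k * lda]          # multiplier, stored in L
            for j in range(k + 1, n):
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda]
    return piv

NN = 3
block = [[4.0, 3.0, 2.0],
         [2.0, 1.0, 3.0],
         [3.0, 4.0, 1.0]]   # block[i][j]: the NN-by-NN part that is used
JUNK = -999.0               # marker for the unused extra row

# Compact copy: NN rows per column, so LDA = NN.
compact = [block[i][j] for j in range(NN) for i in range(NN)]
# Padded storage: NN+1 rows per column, last row is junk, so LDA = NN+1.
padded = [block[i][j] if i < NN else JUNK
          for j in range(NN) for i in range(NN + 1)]

piv_compact = lu_in_place(compact, NN, NN, NN)
piv_padded = lu_in_place(padded, NN, NN, NN + 1)

same_factors = all(compact[i + j * NN] == padded[i + j * (NN + 1)]
                   for j in range(NN) for i in range(NN))
junk_untouched = all(padded[NN + j * (NN + 1)] == JUNK for j in range(NN))
print(same_factors, junk_untouched, piv_compact == piv_padded)
```

With M=N=NN, the routine only ever touches rows 0..NN-1 of each column, so the extra row is skipped purely through the stride LDA and no copy of the block is needed.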
OK I understand. I tried compilation and execution and the test cases run without problems.
However, if I use the routine compare_spec_outputs.m and compare the output of the G3V02L1Fi.001.sp test case (comparison between your branch and the master), I get
Dabs: 2.0961e-13
Drel: 6.6933e-09
Estimate for df: 6.5642e-18
The two outputs can be considered the same
which is quite good, but I would have expected the results to be exactly the same (Dabs=0).
@jloizu I have a funny finding.
It seems that if I don't run make clean and then make after switching branches, even running the same file on the same branch will generate different results (although within tolerance).
If I switch branches, then run make clean and make, the lda_fix branch gives the same result as the master branch.
Dabs: 0
Drel: 0
Estimate for df: NaN
The two outputs are exaclty the same
Can you please check this?
That is a funny finding indeed. However I tested in my case and it does not make any difference. Even if I do make clean before and after switching branches, I do not get exact agreement...
@jloizu I think this could have something to do with the version of the Intel Fortran Compiler and the MKL library. With the newest version on my machine (2019.3.199), the two branches produce exactly the same answer. With an older version (12.1.9.293), they differ. What versions of the compiler and MKL are you using?
Also, note that the LU algorithm implemented by the Intel MKL library, DGETRF, is not exact, but has an error bound |E| ≤ c(min(m,n)) ε P|L||U|, where ε is the machine precision, according to the following website:
https://software.intel.com/en-us/mkl-developer-reference-fortran-getrf
When solving a linear system of equations, the error can be estimated by applying another routine, DGERFS.
For the test case we've been running, the error is estimated to be 3e-13 (machine precision is 1e-16), which is reasonable.
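As a rough illustration of what such a bound means, the sketch below factorizes a small matrix with a toy pure-Python LU (not MKL's DGETRF) and compares the observed elementwise residual |PA - LU| with n ε |L||U|. Taking c(n) = n is an assumption made only for this example, and the 3x3 matrix and the helper `lu_partial_pivot` are made up for illustration.

```python
import sys

def lu_partial_pivot(a):
    """Return (perm, L, U) such that row perm[i] of A equals row i of L*U,
    up to rounding. Toy implementation with partial pivoting."""
    n = len(a)
    u = [row[:] for row in a]
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Largest entry in column k, at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        u[k], u[p] = u[p], u[k]
        l[k], l[p] = l[p], l[k]
        perm[k], perm[p] = perm[p], perm[k]
        l[k][k] = 1.0
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
            u[i][k] = 0.0
            for j in range(k + 1, n):
                u[i][j] -= l[i][k] * u[k][j]
    return perm, l, u

# Arbitrary illustrative matrix (an assumption, not taken from SPEC).
a = [[1e-3, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
n = len(a)
perm, l, u = lu_partial_pivot(a)
eps = sys.float_info.epsilon   # about 2.2e-16 in double precision

max_err = 0.0     # largest entry of |PA - LU|
max_bound = 0.0   # largest entry of n * eps * |L||U|  (c(n) = n assumed)
for i in range(n):
    for j in range(n):
        lu_ij = sum(l[i][k] * u[k][j] for k in range(n))
        abs_lu_ij = sum(abs(l[i][k] * u[k][j]) for k in range(n))
        max_err = max(max_err, abs(a[perm[i]][j] - lu_ij))
        max_bound = max(max_bound, n * eps * abs_lu_ij)

print("observed residual:", max_err)
print("bound estimate:   ", max_bound)
```

The observed residual is itself computed in floating point, so it is only an estimate, but it should sit at or below the order of magnitude of the bound rather than be exactly zero.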
@zhisong I am using Intel MKL 8.1
But since there is no randomness in the algorithm, the output should be the same (between two branches that are doing the same thing) regardless of the error in the LU algorithm, given the same input. Right? Or is the lda_fix branch using the LU algorithm differently?
Perhaps what @jonathanschilling mentioned is causing the differences? Namely, the compilation option -O3, which may reorder floating-point operations. The ordering can differ between code versions even when they are mathematically equivalent, thus producing small differences in the output.
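The underlying effect is that floating-point addition is not associative, so regrouping the same operations can change the last bits of a result. A minimal sketch in Python, which uses the same IEEE double precision (the values are arbitrary and not taken from SPEC):

```python
# The same three numbers, summed in two different groupings.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one evaluation order
right = a + (b + c)   # a mathematically equivalent reordering

print(left == right)        # the two results are not bitwise identical
print(abs(left - right))    # the difference is at the level of machine precision
```

Differences of this size can accumulate through a long calculation, which is consistent with outputs that agree within tolerance without being bitwise identical.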
I tried with the -O0 compilation option and it still gives me the same (small) difference.
Sorry for the mess. I did not intend to push these commits. I will remove this pull request and clean this up a bit.
I fixed a problem that crashes SPEC on raijin (an Australian supercomputer). I am not sure whether the fix creates problems on other machines. @jloizu @zhucaoxiang could you help me test it?