Lda fix - Githubissues

PrincetonUniversity / SPEC

The Stepped-Pressure Equilibrium Code, an advanced MRxMHD equilibrium solver.

https://princetonuniversity.github.io/SPEC/

GNU General Public License v3.0

25 stars 6 forks source link

Lda fix #71

Closed zhisong closed 5 years ago

zhisong commented 5 years ago

I fixed a problem that crashes SPEC on raijin (Australian supercomputer). I am not sure if it creates problems on other machines. @jloizu @zhucaoxiang could you help me test out?

jloizu commented 5 years ago

Sure I can help. Do you want me to try compiling and running some of the test cases? Also, do you know why it was crashing in your case?

zhisong commented 5 years ago

Yes, please try it on test cases.

The DGETRF subroutine is used to compute the LU decomposition of the matrix dMA. The dimension of dMA is (0:NN,0:NN), an (NN+1)*(NN+1) matrix. The parameter LDA is the size of the first dimension of the matrix. Actually here, only (0:NN-1, 1:NN) of dMA has been used and the rest is just junk. When passing dMA to DGETRF, one can slice the matrix by putting dMA(0:NN-1, 1:NN) there and LDA=NN. However, this does not seem to work on my machine. The alternative is to put dMA(0:NN, 1:NN) and LDA=NN+1, so the last column dMA(NN, 1:NN) will be discarded inside DGETRF.

jloizu commented 5 years ago

OK I understand. I tried compilation and execution and the test cases run without problems. However, if I use the routine compare_spec_outputs.m and compare the output of the G3V02L1Fi.001.sp test case (comparison between your branch and the master), I get

Dabs:            2.0961e-13
Drel:            6.6933e-09
Estimate for df: 6.5642e-18
The two outputs can be considered the same

which is quite good but I would have thought that the results are exactly the same (Dabs=0).

zhisong commented 5 years ago

@jloizu I have a funny finding. It seems that if I don't make clean then make after switching branches, even running the same file using the same branch will generate different results (although within tolerance). If I switch branch, make clean then make, the lda_fix branch is giving the same result as master branch.

Dabs: 0 Drel: 0 Estimate for df: NaN The two outputs are exaclty the same

Can you please check this?

jloizu commented 5 years ago

That is a funny finding indeed. However I tested in my case and it does not make any difference. Even if I do make clean before and after switching branches, I do not get exact agreement...

zhisong commented 5 years ago

@jloizu I think this could be something to do with the version of Intel Fortran Compiler and MKL library. I tried the newest version on my machine (2019.3.199), the two branches produce exactly the same answer. If I use an older version (12.1.9.293), they will be different. What version of compiler and MKL are you using?

Also, note that the LU algorithm implemented by the Intel MKL library, DGETRF, is not exact, but has an error bound |E| ≤c(min(m,n))ε P|L||U|, where ε is the machine precision according to the following website: https://software.intel.com/en-us/mkl-developer-reference-fortran-getrf

For solving a linear system of equation, the error could be found by applying another routine DGERFS. For the test case we've been running, the error is estimate to be 3e-13 (machine precision is 1e-16), which is reasonable.

jloizu commented 5 years ago

@zhisong I am using Intel MKL 8.1

But since there is no randomness in the algorithm, the output should be the same (between two branches that are doing the same) regardless of the error in the LU algorithm, given the same input. Right? Or is the Lda_fix branch using the LU algorithm differently?

Perhaps what @jonathanschilling mentioned is causing the differences? Namely, the compilation option -O3, which optimizes the order of operations and that can change for different code versions even if numerically equivalent, thus producing small differences in the output.

jloizu commented 5 years ago

I tried with the -O0 compilation option and it still gives me the same (small) difference.

zhisong commented 5 years ago

Sorry for messing up. I was not intended to push up these commits. I will remove this pull request and clean this up a bit.