comparing two output files

jloizu commented 7 years ago

When we make changes to the code, sometimes we expect that the code should reproduce the same result for a given input file, e.g. if we are simply working on code-cleaning, or speeding up. However, bugs can non-intentionally be inserted while making these changes.

For this reason, it is good to have some automatic checking-procedure against a "control case". I wrote a simple matlab routine (which should be uploaded some time soon) that compares 2 output files from SPEC, "fname1.sp.h5" and "fname2.sp.h5", and calculates the maximum difference in the geometrical coefficients of all interfaces. One can calculate the absolute maximum difference, Dabs, and the relative maximum difference, Drel.

If the geometry was exaclty the same we would have Dabs=Drel=0. Also, if Dabs~1e-16, then we should still consider them "the same" because of machine precision. However, when I tried to apply this to the l2stell testcase, run with (1) the master branch, and (2) the ma00aa branch, I get

Dabs = 1e-14 Drel = 1e-9

which could be due to the fact that depending on the compiler, the algorithm loop order, etc., the accumulated errors can be different for two versions of the code that "presumably do the same".

So the question is "how to decide whether two outputs are to be considered the same"?

jloizu commented 7 years ago

Here is a possible strategy. Please let me know what do you think.

Assume that I take the output geometry of one run and use it as initial guess for the other run, and the code considers that the system is in equilibrium and force imbalance still f~1e-15.

The idea would be to consider that two outputs are "the same" if force-balance is preserved.

A possible way of quantifying this without additional runs is to use the following fact: the force, f, is essentially proportional to [[B^2]], and B^2 is roughly proportional to the metric g, which is roughly proportional to R^2. Thus

df = (df/dR)dR ~ R dR ~ Dabs^2 / Drel

where Dabs=dR and Drel~dR/R. This means that if we find that Dabs^2 / Drel < 1e-15 then the difference in geometry, dR, is irrelevant since it preserves force-balance.

For the example I examined, that gives

df ~ (1e-14)^2 / 1e-9 = 1e-19 < 1e-16

which anticipates that both equilibria are "the same equilibrium".

Let me know if that makes sense to you.

SRHudson commented 7 years ago

Whatever makes sense to you is fine by me. Perhaps we should reduce all the numerical tolerances, such as mupftol, Ndiscrete, Nquad, iMpol, iNtor, forcetol, c05xtol. The most important of these are probably forcetol and c05xtol, as these control the accuracy requirements of force balance. Getting agreement to machine precision requires eliminating every source of significant error. It might also be interesting to see what the difference is between running the exactly same executable (i.e., the same source) on exactly the same input file but on two different machines with different compilers. Also, what is the effect of changing some of the compilation flags, e.g. from -O2 to -O3?

jloizu commented 7 years ago

I agree that this is an interesting test. However, it should not matter how high is the numerical accuracy in the input: if an input (say, with low accuracy) is run with two executables that presumably perform the same calculation, then no matter the accuracy, if the input is the same, the output should be the same, up to machine precision. Of course, if one changes the input (say, higher accuracy) then one expects a different output, even for the same exact executable. However, it could be that there are differences arising because of how the compilation has been done. Below the result of some tests that confirm these thoughts.

I have been briefly investigating differences in the output when running: (1) same input file, same code source, but compiled in the local station (intel) or in the cluster (gfortran) (2) same input file, two different executables (master and ma00aa) compiled on the same environment. (3) same executable run with different numerics parameters (forcetol,c05xtol,iMpol,etc.) in the input.

Conclusions a) in all cases, (1)-(3), the outputs are not the exactly same when compared. b) when the input is the same, namely (1) and (2), differences are always very small, with Dabs ~ 1e-14, Drel ~ 1e-9, and the expected change in force-balance df ~ 1e-19 <1e-16, so according to my criterion, these can be considered "the same" (and, in fact, when I run one with the final geometry of the other one, the system is already considered in force-balance). c) when the input is changed in terms of numerical accuracy (especially when increasing iMpol,iNtor,Nquad,Ndiscrete) the outputs differ "substantially", with Dabs ~ 1e-8, Drel ~ 1e-5, and expected change in force-balance df ~ 1e-11>1e-16, so according to my criterion these cannot be considered "the same" (and, in fact, when I run one with the final geometry of the other one, the system is not considered in force-balance and starts iterating on f).

jloizu commented 7 years ago

This short study shows me that the differences between running the two branches master and ma00aa are as small as when comparing runs from the same branch compiled in different environments. Thus, we can consider that the branch ma00aa produces "the same" results as master (although faster).

I will merge the ma00aa branch into the master branch.

Also, I will upload the matlab routine for comparing output files, so that we can use it each time before merging branches. This will give us confidence and avoid (some) hidden bugs.

SRHudson commented 7 years ago

Thanks for this study. It is enlightening, and I can see why many still consider numerics to be the "dark magic" of science!

PrincetonUniversity / SPEC

comparing two output files #20