Closed by halldm2000 9 years ago.
The code printing these differences is testing mass conservation, so yes, this issue is identical to issue #11.
I'm not sure who added this check and these print statements to the code; we should be sure to remove them once this issue is resolved.
I don't think they are identical. They both deal with mass conservation, but unlike the general conservation problems, these errors arise only with -cc numa_node. The errors in #11 arise even without numa_node.
Sorry about my confusion! I haven't been able to keep up. Would you mind documenting exactly what you mean by these two errors?
To clarify, issue #12 was created to address the problem that the mass changes across lim8: diff = mass2 - mass1, where mass1 and mass2 are the masses before and after lim8. The check produces endless output like this:
ie,k= 5 1 diff= 1.144409179687500E-005
sums= 19838327949.5933 19838327949.5932
ie,k= 5 3 diff= 1.144409179687500E-005
These difference messages appear only in the case described above, when -cc numa_node is used.
In this context, a diff of order 1e-5 means a change in roughly the 15th significant digit. Look how well the two sums agree. So I suspect this is actually OK. If the errors are all at this level, we should remove this check, as well as the associated tmp1 and tmp2 arrays.
(Conservation errors, if still present, will be detected by our other diagnostics.)
By the way, this check and the tmp1 and tmp2 arrays are not supposed to be part of the model. At some point there must have been a bug, which someone tracked down to the limiter, and they added this check during debugging. Perhaps the mini-app inherited this from standalone HOMME?
After fixing issue #11, I still see non-conservation when using numa_node + COLUMN_OMP. By commenting out COLUMN_OMP loops in batches, I traced the problem to edge_mod.F90; by commenting out the loops there in batches, I traced it to the first OMP loop in edgeVunpack. Fix: replaced the parallel loop over i with one over k (to avoid threads working on the same data).
Tested the fix with run_ne_tests: mass conservation is restored, the error norms look good, and the plots look good.
Lots of difference values are printed in prim_advec after lim8 when COLUMN_OMP=true, the number of threads is > 1, and the -cc numa_node flag is used on Edison. I have examined the DCMIP solutions and found them to be damaged in this case, so the errors are real, not spurious. These errors might be related to issue #11.