Closed ndkeen closed 2 years ago
@ndkeen I was just running on mappy and seeing probably the same thing. I was going to open an issue but will just add to this one if that's OK.
@bartgol this may be relevant to your PR #1656. I merged that PR into today's master, and I still get a /0.
Here are details on how to reproduce:
Relevant part of stack trace:
/home/ambradl/SCREAM/components/scream/src/share/util/scream_common_physics_functions_impl.hpp:88
/home/ambradl/SCREAM/components/scream/src/diagnostics/potential_temperature.cpp:62
Print statement showing some data:
if (p_mid(icol,jpack)[0] == 0)
fprintf(stderr,"amb> potential_temperature.cpp run_impl p_mid(icol,jpack) %d %d %e\n",icol,jpack,p_mid(icol,jpack)[0]);
yielding
amb> potential_temperature.cpp run_impl p_mid(icol,jpack) 0 0 0.000000e+00
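For context on why a zero p_mid turns into a /0 at this point: assuming the diagnostic follows the usual definition theta = T / exner(p), with exner(p) = (p/p0)^(Rd/cp), a pressure of 0 makes the Exner function 0 and the division by it is what trips the FPE. The sketch below uses plain doubles rather than the packed types, and the constants and function names are illustrative, not copied from scream_common_physics_functions_impl.hpp.

```cpp
#include <cmath>

// Illustrative constants (assumed values, not read from the SCREAM source).
constexpr double P0 = 100000.0;  // reference pressure [Pa]
constexpr double RD = 287.042;   // gas constant for dry air [J/(kg K)]
constexpr double CP = 1004.64;   // specific heat of dry air [J/(kg K)]

// Exner function (p/p0)^(Rd/cp). Evaluates to exactly 0 when p == 0.
double exner(double p) {
  return std::pow(p / P0, RD / CP);
}

// Potential temperature T / exner(p). With p == 0 this is T / 0:
// +inf when FPE trapping is off, a divide-by-zero trap when it is on.
double potential_temperature(double T, double p) {
  return T / exner(p);
}
```

With FPEs off, potential_temperature(300.0, 0.0) quietly returns inf, which is consistent with the error only showing up in builds that trap floating-point exceptions.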
Edit: Further printing, etc, with FPEs off shows that all of p_mid is 0 at initialization but not during time stepping. Is it possible that the ne4 IC file is bad?
Edit: No, ncdump -v p_mid /sems-data-store/ACME/inputdata/atm/scream/init/init_ne4np4.nc
looks good.
What I am doing to set pack size=1 is edit components/scream/cmake/machine-files/cori-knl.cmake
and add:
set(SCREAM_PACK_SIZE 1 CACHE STRING "")
Thanks. But that's what I mean by "hard coding"; it modifies the repo state. The question is whether there is an xml/atmchange way of doing this so it's in one's run script rather than in a mod'ed repo.
Using a repo from May 19th, where the last change was 257a9d5dfb, I also see this same div-by-zero.
Ah, p_mid is not read from the input file, since Homme is supposed to compute it at run time, and since Homme runs before anything else, the AD does not see p_mid as a requirement. We can put in an easy quick fix, namely change the field in Homme from 'Computed' to 'Updated'. That will require having p_mid in all input files, although from what I see this is probably already the case.
I'll take a look, and if I don't see a better solution, I will just change 'Computed' to 'Updated'.
I don't think we ever tried to change pack size for v1 cases. But ./xmlchange --append SCREAM_CMAKE_OPTIONS="SCREAM_PACK_SIZE 1"
seems to work:
$ ./xmlquery SCREAM_CMAKE_OPTIONS
SCREAM_CMAKE_OPTIONS: SCREAM_NP 4 SCREAM_NUM_VERTICAL_LEV 72 SCREAM_NUM_TRACERS 10
$ ./xmlchange --append SCREAM_CMAKE_OPTIONS="SCREAM_PACK_SIZE 1"
$ ./xmlquery SCREAM_CMAKE_OPTIONS
SCREAM_CMAKE_OPTIONS: SCREAM_NP 4 SCREAM_NUM_VERTICAL_LEV 72 SCREAM_NUM_TRACERS 10 SCREAM_PACK_SIZE 1
Actually, since this appears to happen only with packsize 1, it's probably not an IC issue, but, as pointed out before, a problem with the common phys functions implementation. I will focus on that first.
I didn't check for pack size > 1. It might happen then, too.
Well, we have our nightlies running packsize>1 on mappy, and they don't seem to pick up this error. That's why I speculated it was a packsize=1 issue.
But with size > 1, FPE is off, right?
Ah, right, this is an FPE thing. So yeah, I take it back, could be any pack size.
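To illustrate the "FPE thing": with trapping off, a floating-point divide by zero does not stop the program, it only sets a status flag and returns inf, so a nightly without FPEs can pass over the same operation silently. A minimal standalone sketch using the standard <cfenv> flags (the helper name is made up for this example, and this is not how SCREAM itself turns FPEs on):

```cpp
#include <cfenv>

// Returns true if computing num/den raised the IEEE divide-by-zero flag.
// Hypothetical helper for illustration only.
bool raises_divbyzero(double num, double den) {
  // volatile keeps the compiler from folding the division at compile time.
  volatile double n = num;
  volatile double d = den;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double r = n / d;  // completes normally when trapping is off
  (void)r;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Here raises_divbyzero(1.0, 0.0) reports the flag without aborting. When trapping is enabled for that exception (on glibc, for example, through the feenableexcept extension), the same division would deliver SIGFPE instead, which is the behavior the DEBUG runs are hitting.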
Oh good to know about that xmlchange option. I can look (or test) with recent repo versions if it helps to know when this might have started happening. I did just try with a repo from May 6th and I see same div-by-zero.
I don't see this error now (after #1669 was merged). Will close.
I'm assuming you meant 1669?
Trying with pack size 1 and DEBUG on cori yielded quick divide by zeros.
Not much in stack.