E3SM-Project / scream

Fork of E3SM used to develop exascale global atmosphere model written in C++
https://e3sm-project.github.io/scream/

ne30 packsize=1 crash with NH on #1726

Open ambrad opened 2 years ago

ambrad commented 2 years ago

After non-hydrostatic (NH) mode was turned on, I started seeing

113: [DIRK] WARNING! Newton reached max iteration count, with deltaerr = nan

immediately in ne30 runs on the CPU with pack size set to 1.
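Why does a NaN produce this particular warning? The following is a minimal, hypothetical Python sketch. It is not the HOMME/SCREAM DIRK solver, and the names `newton`, `tol`, and `max_iter` are assumptions made for illustration. It shows the general mechanism: every comparison with NaN is false, so once a NaN enters the Newton update the convergence test can never pass, and the loop runs until it hits its iteration cap. The printed message only imitates the log line above.

```python
import math

def newton(f, dfdx, x0, tol=1e-12, max_iter=20):
    """Hypothetical scalar Newton iteration with an iteration cap.

    Returns (x, converged, deltaerr, iterations).
    """
    x = x0
    deltaerr = math.inf
    for it in range(1, max_iter + 1):
        dx = -f(x) / dfdx(x)
        x += dx
        deltaerr = abs(dx)
        # Any comparison with NaN is False, so a NaN deltaerr can never
        # satisfy this test and the loop runs all the way to max_iter.
        if deltaerr < tol:
            return x, True, deltaerr, it
    # Imitates the style of the DIRK warning; not the real message source.
    print(f"WARNING! Newton reached max iteration count, with deltaerr = {deltaerr}")
    return x, False, deltaerr, max_iter

# A well-posed problem converges to sqrt(2).
x, ok, d, it = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)

# A NaN input state exhausts the iteration cap and reports deltaerr = nan.
x_bad, ok_bad, d_bad, it_bad = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, float("nan"))
```

In other words, a warning reporting `deltaerr = nan` suggests that the solver's input or residual was already NaN. It does not mean that Newton simply needed more iterations.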

ambrad commented 2 years ago

This is on Chrysalis. With Intel, I see this error for ne30 with default settings, except for pack size set to 1. With GNU, I haven't yet been able to reproduce this error; every configuration I've tried runs without a problem.

ndkeen commented 2 years ago

I was also going to report this, but I wanted to first try reproducing it on cori-knl with Intel. I can run with Intel using the default pack size.

Yep, same error on cori-knl with Intel. /global/cscratch1/sd/ndk/e3sm_scratch/cori-knl/se08-jun6/f30cpu.F2010-SCREAMv1.ne30_ne30.se08-jun6.intel.24s.n011b64x2.pack1.dd

6402: forrtl: error (76): Abort trap signal
6402: Image              PC                Routine            Line        Source
6402: e3sm.exe           0000000003E1F2F4  Unknown               Unknown  Unknown


oksanaguba commented 2 years ago

Just to add: I saw this error when I was running an unstable namelist in HOMME. Yesterday, I opened the logs for some of my Summit runs and did not see messages like this. Are they in e3sm.log?

ambrad commented 2 years ago

Yes, they are in e3sm.log. On Summit, you're using GNU for all runs, right? GNU is fine so far on Chrysalis.

ambrad commented 2 years ago

An Intel debug build, with all other settings the same, runs without a problem.

ndkeen commented 2 years ago

How can I turn off NH to get back to what we had before?

ambrad commented 2 years ago

./atmchange theta_hydrostatic_mode=False