Open briandobbins opened 4 years ago
@jtruesdal @Katetc This is an error in MG3 using SCAM. I've assigned both of you since I'm not sure which code base is the one responsible for the error.
That's a great stack trace. It points to this line in MG3:
if (lamr(i,k) > qsmall .and. 1._r8/lamr(i,k) < Dcs) then
This is probably the same issue that Steve filed in the PUMAS repo: https://github.com/ESCOMP/PUMAS/issues/8 ("Invalid code logic tripping up some compilers").
So, we are aware of the general issue in PUMAS, and glad to have a simple case that reproduces the problem here! Also tagging @andrewgettelman .
Also, you can leave me as the main assignee. I'll fix this and add a test for it going forward when we tackle the PUMAS issue.
Thanks Brian! I mentioned this to Hugh as well.
I'm happy to try to help fix this if needed. So is it every .and. and .or. conditional? Or just those that might trigger a divide-by-zero error?
FYI, the fix Kate mentions works for this case.
Do we want to make a PR specifically for this, or allow the larger PUMAS issue to tackle it?
The way to figure out whether the .and. or .or. needs to be split is to look at each section and see if it can always be evaluated independently without any other section. If not, then it needs to be contained in its own if statement with an outer if statement to eliminate the invalid condition(s).
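As a minimal sketch of that rule applied to the MG3 line quoted above: the Fortran standard does not require short-circuit evaluation of `.and.`, so a compiler may evaluate `1._r8/lamr(i,k)` even when `lamr(i,k)` is zero, and a build that traps floating-point exceptions will then abort. The program below is illustrative only. It uses a 1-D `lamr` instead of `lamr(i,k)`, and the values of `qsmall`, `Dcs`, and `lamr` are assumptions chosen for the example, not the constants used in MG3 or PUMAS.

```fortran
program split_conditional_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none

  integer, parameter :: r8 = real64

  ! Illustrative values only (assumptions), not the MG3/PUMAS constants.
  real(r8), parameter :: qsmall = 1.e-18_r8
  real(r8), parameter :: Dcs    = 500.e-6_r8

  real(r8) :: lamr(3)
  integer  :: i

  ! The first element is zero, which is the case that breaks the combined test.
  lamr = [0._r8, 1.e3_r8, 1.e5_r8]

  do i = 1, size(lamr)
     ! Outer if: eliminates the invalid condition (lamr too small to divide by).
     if (lamr(i) > qsmall) then
        ! Inner if: the division is only reached when lamr(i) > qsmall.
        if (1._r8/lamr(i) < Dcs) then
           print '(a,i0,a)', 'element ', i, ': both conditions hold'
        else
           print '(a,i0,a)', 'element ', i, ': 1/lamr is not below Dcs'
        end if
     else
        print '(a,i0,a)', 'element ', i, ': skipped, lamr <= qsmall'
     end if
  end do
end program split_conditional_sketch
```

One thing to watch when splitting a real conditional this way: if the original combined `if` has an `else` branch, that branch has to be reachable from both the outer and the inner test, otherwise the behavior changes.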
Tagging @hmorrison100 on this as well so he sees it.
I believe that the PUMAS issue for this is ESCOMP/PUMAS#8. Keeping this issue open so that when the fix is tagged in PUMAS, we can update the Externals_CAM.cfg file.
@katec - Has this issue been addressed and should it be closed?
Yes, it was fixed in pumas tag pumas_cam-release_v1.13 and cam tag 6_3_017.
@Katetc - We are revisiting this, and I see that the original report says the error was in the cesm2_2 branch. That branch is using pumas_cam-release_v1.3, so it probably isn't fixed there. Should it be? If so, can we just jump to v1.13, or will a leap that large in PUMAS require some work from someone?
Running SCAM with the GNU compilers with DEBUG=TRUE results in an error in CESM 2.2, but works in CESM 2.1.3.
Tested on Cheyenne:
CESM 2.1.3, Intel compiler, DEBUG=FALSE - works fine
CESM 2.1.3, Intel compiler, DEBUG=TRUE - works fine
CESM 2.1.3, GNU compiler, DEBUG=FALSE - works fine
CESM 2.1.3, GNU compiler, DEBUG=TRUE - works fine

CESM 2.2.0, Intel compiler, DEBUG=FALSE - works fine
CESM 2.2.0, Intel compiler, DEBUG=TRUE - works fine
CESM 2.2.0, GNU compiler, DEBUG=FALSE - works fine
CESM 2.2.0, GNU compiler, DEBUG=TRUE - fail, with the message below
To reproduce the failure in the CESM 2.2 release, do:
    export CESM22ROOT=<path to CESM 2.2 checkout>
    ${CESM22ROOT}/cime/scripts/create_newcase --compset FSCAM --res T42_T42 --compiler gnu --case foo --user-mods-dir ${CESM22ROOT}/components/cam/cime_config/usermods_dirs/scam_arm97 --run-unsupported
    cd foo
    ./xmlchange DEBUG=TRUE,PIO_TYPENAME=netcdf,STOP_N=1,STOP_OPTION=ndays
    ./case.setup
    ./case.build
    ./case.submit
I've not tested other IOPs, just arm97. I'm going to dig into this at some point, but I'm not familiar with the SCAM code base, so I thought others might have a quick solution or at least ideas.