yantosca closed this issue 5 years ago
Same issue when running it natively on the AMI?
Stop the instance, change its type to r5.24xlarge, restart, and run again. If that still dies, then it is definitely not an (inadequate) memory problem...
So I ran again in the container on an r5.24xlarge instance, and now I get this error:
AGCM Date: 2016/07/01 Time: 00:10:00
At line 2731 of file /tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Superstructure/State/src/ESMF_StateAPI.F90
Fortran runtime error: End of record
Error termination. Backtrace:
#0 0x7f657849c2da in ???
#1 0x7f657849cec5 in ???
#2 0x7f657849d68d in ???
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[28387,1],0]
Exit code: 2
--------------------------------------------------------------------------
So it would appear to be an issue internal to MAPL. Or I might have run out of disk space. I requested 500 GB though.
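One quick way to rule the disk-space theory in or out is to check free space on the volume that holds the run directory before and after a run. A minimal sketch using only the Python standard library; the path `"."` and the `needed_gb` threshold are placeholders for illustration, not values from this run:

```python
import shutil


def free_space_gb(path="."):
    """Return free space, in GB (10^9 bytes), on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1e9


def has_room(path=".", needed_gb=10.0):
    """True if at least `needed_gb` GB are free (threshold is a placeholder)."""
    return free_space_gb(path) >= needed_gb


if __name__ == "__main__":
    # Point this at the run directory (or the container's mounted volume).
    print(f"{free_space_gb('.'):.1f} GB free")
```

If the free space is still large after the crash, the disk is unlikely to be the culprit.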
Also I had run in the AMI itself earlier at c48 and had similar crashes to the
That's a new message though:
> At line 2731 of file /tutorial/gchp_standard/CodeDir/GCHP/ESMF/src/Superstructure/State/src/ESMF_StateAPI.F90
> Fortran runtime error: End of record
Haven't seen this ever before...
There are some references to this issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=20257
https://stackoverflow.com/questions/29489388/end-of-record-error-when-saving-a-variable
https://stackoverflow.com/questions/32684816/end-of-record-error-in-file-opening
It was a bug in gfortran that was supposed to be fixed in 4.1, but who knows.
This issue seems to have been caused by an out-of-bounds error in the Olson landmap module, as described in https://github.com/geoschem/gchp/issues/13#issuecomment-449134471
Interesting! Why is it not happening at C24? 🤔 Can c48 run on AWS now?
So what appears to be happening is that the Olson landmap is not getting read in properly. This happens in the code where State_Met%LandTypeFrac is populated from the OLSON pointers from ExtData. I'm not sure why this is happening, but it may be a MAPL issue. The OLSON data is read in by custom code in MAPL that reads in the fraction of each grid box (the "F:int" feature).
So while you can run on the cloud with the quick fix, I would avoid doing that until we understand the root cause of why State_Met%LandTypeFrac is all zero.
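To illustrate the kind of sanity check that would catch this early, here is a rough sketch in Python (not the GCHP Fortran code). It flags grid boxes whose land-type fractions are all zero, and boxes whose fractions do not sum to 1. The function name, the nested-list layout, and the sum-to-1 expectation are assumptions made for illustration:

```python
def check_land_fractions(land_type_frac, tol=1e-6):
    """Scan per-box land-type fractions for suspicious values.

    land_type_frac[box][land_type] is a hypothetical layout, not the
    actual State_Met%LandTypeFrac array ordering.
    Returns (all_zero_boxes, bad_sum_boxes) as lists of box indices.
    """
    all_zero, bad_sum = [], []
    for box, fracs in enumerate(land_type_frac):
        if all(abs(f) <= tol for f in fracs):
            all_zero.append(box)   # nothing was read in for this box
        elif abs(sum(fracs) - 1.0) > tol:
            bad_sum.append(box)    # assumed: fractions should sum to 1
    return all_zero, bad_sum


# Made-up example data: box 1 is all zero, box 2 sums to 0.7.
example = [
    [0.25, 0.75, 0.0],
    [0.0, 0.0, 0.0],
    [0.5, 0.2, 0.0],
]

if __name__ == "__main__":
    zero_boxes, bad_boxes = check_land_fractions(example)
    print("all-zero boxes:", zero_boxes)
    print("boxes not summing to 1:", bad_boxes)
```

A check like this, run right after the field is populated, would stop the run with a clear message instead of letting an all-zero field reach the code that indexes by land type.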
I am closing this thread because the root cause is #15. Fixing #15 will fix this issue.
I ran a GCHP c48 simulation on the AWS cloud using
and it died after an hour.
In runConfig.sh:
The Docker commands were:
Tail end of log file:
I then commented out SpeciesConc_avg from the HISTORY.rc file and re-ran. Now, the only diagnostic active was SpeciesConc_inst. This also died at 1 hour:
This message:
might be indicative of an out-of-bounds error, perhaps where we deallocate arrays (or fields of State_* objects).