GEOS-ESM / MAPL

MAPL is a foundation layer of the GEOS architecture, whose original purpose is to supplement the Earth System Modeling Framework (ESMF)
https://geos-esm.github.io/MAPL/
Apache License 2.0

NAG runtime error: wrong values for Lons coordinates within subroutine stage2DLatLon (not a problem for gcc or intel) #2380

Open metdyn opened 1 year ago

metdyn commented 1 year ago

I have been stuck on this problem for a few days. I am keeping a record of what I find here, and then I plan to move on to the main project. The problem occurs with the NAG compiler only (no issues with gcc or intel, thanks to @bena-nasa).

I used historyGC to output 2D and 3D fields on a cubed-sphere (CS) grid. All the output variables in the netCDF file show a correct value range, except the lat-lon grid coordinates. The minval for lons becomes -8.4280581075531867E+305. (I changed real, allocatable :: lons to real(REAL64), allocatable :: lons; switching this variable between single and double precision makes no difference.) I have narrowed the problem down to subroutine stage2DLatLon inside GriddedIO.F90.

I am running a test using griddedIO with a few print statements added.

nag:
 GriddedIO.F90 863  stage2dlatlon: lons :  min, max =    0.3271623275453883   3.5622145049587107E+02
 GriddedIO.F90 863  stage2dlatlon: lons :  min, max =    2.1657371700710070E+02   3.0342628299289930E+02

 ServerThread.F90 870 message%var_name: lons minval(values_real64_1d) -8.4280581075531867E+305   3.5622145049587107E+02
AGCM1.rc
NX: 1
NY: 6

Root.GRID_TYPE: Cubed-Sphere
Root.GRIDNAME: PE24x144-CF
Root.LM: 3
Root.NF 6
Root.IM_WORLD: 24
GRID_LABELS:
::

COLLECTIONS: case1
::

  case1.template:  '%y4.nc4',
  case1.format:    'CFIO',
  case1.frequency:  010000,
  case1.duration: 000000
  case1.fields: 'VAR2D', 'Root',
                'VAR3D', 'Root',
                          ::
mathomp4 commented 1 year ago

The minval for lons becomes -8.4280581075531867E+305.

That is one massive negative number... though not the most negative representable one, which would be around E+308. How odd.
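For reference, a minimal sketch of that comparison, written in Python purely for illustration (it is not part of MAPL). It takes the value from the ServerThread.F90 print above and compares it with the most negative finite IEEE 754 double. The names `bad`, `lowest`, and `bits` are hypothetical, and the [-360, 360] degree window is only an assumed generous bound on a plausible longitude.

```python
import math
import struct
import sys

# Value reported by the ServerThread.F90 print statement in this issue.
bad = -8.4280581075531867e+305

# Most negative finite IEEE 754 double (on the order of -1.8E+308).
lowest = -sys.float_info.max

# Raw 64-bit pattern of the bad value, to inspect sign and exponent fields.
bits = struct.unpack("<Q", struct.pack("<d", bad))[0]
exponent_field = (bits >> 52) & 0x7FF  # 0x7FF would mean Inf or NaN

print(f"lowest finite double : {lowest:.17e}")
print(f"bad value            : {bad:.17e}")
print(f"bit pattern          : 0x{bits:016x}")
print(f"is finite            : {math.isfinite(bad)}")
# Assumed plausibility window for a longitude in degrees.
print(f"plausible longitude  : {-360.0 <= bad <= 360.0}")
```

The value is a finite double rather than Inf or NaN, so a plain finiteness check would not flag it; a range check would.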

I'm going to mention @tclune as he knows more about NAG than any of us.

tclune commented 1 year ago

Still looks a lot like something uninitialized ...

tclune commented 1 year ago

If this really is NAG-specific, the likely cause is either an argument that NAG passes by copy-in/copy-out, so that the intended value is not "kept", or a stale pointer, or both. Not that this helps a lot. The trick is to track down where that value first appears; then the cause will probably be obvious. I doubt that this is a compiler defect.
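One way to track down where the value first appears is to run the same range check on the lons array at each stage between stage2DLatLon and the server side, and report the offending indices rather than only the minval. The sketch below shows the logic in Python for illustration only; an equivalent check would have to be written in Fortran to sit next to the existing print statements. The helper `find_bad_lons`, the [0, 360] degree range (inferred from the printed min/max values), and the `sample` array are all assumptions, not part of MAPL.

```python
import math


def find_bad_lons(lons, lo=0.0, hi=360.0):
    """Return (index, value) pairs that are non-finite or outside [lo, hi].

    The default range is an assumption based on the values printed in
    this issue, which look like longitudes in degrees.
    """
    return [
        (i, v)
        for i, v in enumerate(lons)
        if not (math.isfinite(v) and lo <= v <= hi)
    ]


# Hypothetical array mixing values from the prints above with the bad value.
sample = [
    0.3271623275453883,
    216.57371700710070,
    -8.4280581075531867e+305,
    356.22145049587107,
]

for index, value in find_bad_lons(sample):
    print(f"lons[{index}] = {value:.17e} is out of range")
```

Knowing which indices are bad, for example whether they are contiguous or belong to a single face or process, may help distinguish an uninitialized segment from a stale pointer.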