geoschem / gchp_legacy

Repository for GEOS-Chem High Performance: software that enables running GEOS-Chem on a cubed-sphere grid with MPI parallelization.
http://wiki.geos-chem.org/GEOS-Chem_HP
Other
7 stars 13 forks source link

[BUG/ISSUE] Olson landmap data is being read in as all zeroes for GCHP simulations with gfortran (but not ifort) #15

Closed yantosca closed 5 years ago

yantosca commented 5 years ago

Long story short, the Olson land map data seems to be coming in as all zeroes from the Import State for GCHP simulations that use gfortran. This issue is most certainly the root cause of previously-mentioned issues #13 and #14.

My GC-classic and GCHP code were at the following commits:

GEOS-Chem repository
   Path        : /local/ryantosca/GCHP/gf82/Code.12.1.1
   Branch      : bugfix/GCHP_issues
   Last commit : Add a better error trap in Compute_Olson_Landmap_GCHP
   Date        : Fri Dec 21 12:03:57 2018 -0500
   User        : Bob Yantosca
   Hash        : 7db80f35
   Git status  : 
GCHP repository
   Path        : /local/ryantosca/GCHP/gf82/Code.12.1.1/GCHP
   Branch      : bugfix/GCHP_issues
   Last commit : Now exit if Compute_Olson_Landmap_GCHP returns with failure
   Date        : Fri Dec 21 12:07:09 2018 -0500
   User        : Bob Yantosca
   Hash        : 7bd7110
   Git status  : M Chem_GridCompMod.F90

Note that I modified the code to put in an error trap that will exit if all elements of State_Met%LandTypeFrac are zero (or more precisely, when the variable maxFracInd is zero).

I have narrowed down the issue to this code section of GCHP/Chem_GridCompMod.F90, where the Olson land map data is obtained from the Import State and copied into State_Met%LandFracType.

       IF ( FIRST ) THEN

          ! Set Olson fractional land type from import (ewl)
          If (am_I_Root) Write(6,'(a)') 'Initializing land type ' // &
                           'fractions from Olson imports'
          Ptr2d => NULL()
          DO T = 1, NSURFTYPE

             ! Create two-char string for land type
             landTypeInt = T-1
             IF ( landTypeInt < 10 ) THEN
                WRITE ( landTypeStr, "(A1,I1)" ) '0', landTypeInt
             ELSE
                WRITE ( landTypeStr, "(I2)" ) landTypeInt  
             ENDIF
             importName = 'OLSON' // TRIM(landTypeStr)

             ! Get pointer and set populate State_Met variable
             CALL MAPL_GetPointer ( IMPORT, Ptr2D, TRIM(importName),  &
                                    notFoundOK=.TRUE., __RC__ )
             If ( Associated(Ptr2D) ) Then
                If (am_I_Root) Write(6,*)                                &
                     ' ### Reading ' // TRIM(importName) // ' from imports'
                State_Met%LandTypeFrac(:,:,T) = Ptr2D(:,:)
             ELSE
                WRITE(6,*) TRIM(importName) // ' pointer is not associated'
             ENDIF

             ! Nullify pointer
             Ptr2D => NULL()
          ENDDO

          ! Compute State_Met variables IREG, ILAND, IUSE, and FRCLND
          CALL Compute_Olson_Landmap_GCHP( am_I_Root, State_Met, RC=STATUS )
          VERIFY_(STATUS)    ! <--- I added this error trap
       ENDIF

Also not shown above are some debug print statements.

I compiled and ran a C24 GCHP Rn-Pb-Be simulation with gfortran 8.2, using 6 cores of Odyssey. The modules were:

Currently Loaded Modules:
  1) git/2.17.0-fasrc01   5) gcc/8.2.0-fasrc01       9) hdf5/1.8.12-fasrc12
  2  gmp/6.1.2-fasrc01    6) openmpi/3.1.1-fasrc01  10) netcdf/4.1.3-fasrc02
  3) mpfr/3.1.5-fasrc01   7) zlib/1.2.8-fasrc07
  4) mpc/1.0.3-fasrc06    8) szip/2.1-fasrc02

And I got the following output in the gchp.log file:

  ### Reading OLSON01 from imports
 %%%LTF:            0           5          25   0.0000000000000000        0.0000000000000000     
  ### Reading OLSON02 from imports
  ### Reading OLSON03 from imports
 %%%LTF:            0           5          25   0.0000000000000000        0.0000000000000000     
 ... etc...
  ### Reading OLSON72 from imports
 %%%LTF:            0           5          25   0.0000000000000000        0.0000000000000000     
===============================================================================
GEOS-Chem ERROR: Invalid value of maxFracInd: 0!  This can indicate a problem 
reading Olson data from the Import State, and that State_Met%LandTypeFrac 
array has all zeroes.
 -> at Compute_Olson_Landmap_GCHP (in module GeosCore/olson_landmap_mod.F90)
===============================================================================

GIGCchem::Run_                                1863
GIGCchem::Run2                                1281
GCHP::Run                                      420
MAPL_Cap                                       792
===> Run ended at Fri Dec 21 12:25:57 EST 2018

The error message is from the new error trap that I committed to the bugfix/GCHP_issues branches of the gchp and geos-chem repos on Github. The numbers in each line beginning with "%%%LTF" indicate the core number, I & J value, and the sum of State_Met%LandTypeFrac for that I,J and Olson type. As you can see all of the Olson values are coming into State_Met%LandTypeFrac as zeroes.

It seems that the issue is happening somewhere in MAPL, as reading in the Olson data uses a new feature of MAPL to return the fraction of the grid box that is covered by land type N, where N is an integer.

In MAPL_ExtDataGridCompMod.F90, then there is this snippet, where there are calls down to MAPL_CFIORead

     if (present(field)) then
        if (trans == MAPL_HorzTransOrderBilinear) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                __RC__) 
        else if (trans == MAPL_HorzTransOrderBinning) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., __RC__) 
        else if (trans == MAPL_HorzTransOrderSample) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., Voting = .true., __RC__) 
        else if (trans == MAPL_HorzTransOrderFraction) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., getFrac = val, __RC__) 
        end if

     else if (present(bundle)) then

        if (trans == MAPL_HorzTransOrderBilinear) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                __RC__)
        else if (trans == MAPL_HorzTransOrderBinning) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., __RC__)
        else if (trans == MAPL_HorzTransOrderSample) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., Voting = .true., __RC__)
        else if (trans == MAPL_HorzTransOrderFraction) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., getFrac = val, __RC__)
        end if

     end if

The relevant calls are the ones where trans == MAPL_HorzTransOrderFraction. In MAPL_CFIO, there are further calls to MAPL_HorzTransformRun, which is where I suspect the error may be happening. MAPL_HorzTransformRun is an overloaded interface for several other module procedures

Unfortunately, at this point my knowledge of the innards of MAPL is not very comprehensive. If anyone has any other suggestions to try, then please let me know. My guess is that deep into MAPL there is some code that gfortran isn't parsing properly, or for which an unexpected side-effect is occurring.

NOTE: This could potentially be caused by the ESMF version which is 5.2. ESMF 5.2 pre-dates the newest versions of gfortran, so there could conceivably be some incompatibility. But who knows.

THE UPSHOT: Until we find & fix this issue, we should not use gfortran for GCHP simulations. While GCHP can run on the AWS cloud in tutorial mode, the error is still present and you will get erroneous output.

I verifiied that compiling and running GCHP using ifort 17 correctly read the Olson land map values from the Import State into State_Met%LandFracType. So this issue only happens with GNU Fortran.

Also, I will mark #13 and #14 as closed, as this issue is the root cause.

JiaweiZhuang commented 5 years ago

Thanks for this deep diagnosis.

If the core issue is

It seems that the issue is happening somewhere in MAPL, as reading in the Olson data uses a new feature of MAPL to return the fraction of the grid box that is covered by land type N, where N is an integer.

Some dirty fixes to correct this:

This would be like the last resort. It is indeed best to make gfortran working properly.

lizziel commented 5 years ago

I am going to do a benchmark run with fortran8.2 on Odyssey in January. There may be other problems beyond this one that we don't know about since all GCHP benchmarks have been with ifort.

yantosca commented 5 years ago

I poked around in MAPL_RegridConservative Run a bit. There is this code:

   FF = 0.0
    do N = 1, size(TT%OUT%II)
       II = TT%IN%II(N)
       JI = TT%IN%JJ(N)
!@       if (II<1 .or. II>size(INPUT,1) .or.JI<1 .or. JI>size(INPUT,2)) then
!@          print *,'we have a problem'
!@       endif
       VALI = INPUT(II,JI)
       if(VALI/=MAPL_UNDEF) then
          IO = TT%OUT%II(N)
          JO = TT%OUT%JJ(N)
          W  = TT%OUT%W(N)
          if (uSAMPLE) then
             if (W > FF(IO,JO)) then
                OUTPUT(IO,JO) = VALI
                FF    (IO,JO) = W
             end if
          %%% THIS IS WHERE THE FRACTION OF LANDTYPE PER GRIDBOX IS COMPUTED
          %%% GETFRAC= the desired land type value (0..72)
          %%% VALI = the land type read in from the file
          %%% OUTPUT = the fraction of grid box with land type GETFRAC
          %%% FF = contains the sum of the mapping weights on the ouptut grid 
          else if (doFrac) then
             if (VALI .gt. GetFrac_-eps .and. VALI .lt. GetFrac_+eps) then
                OUTPUT(IO,JO) = OUTPUT(IO,JO)+W
             end if
             FF(IO,JO)=FF(IO,JO)+W
          else
             OUTPUT(IO,JO) = OUTPUT(IO,JO) + W * VALI
             FF    (IO,JO) = FF    (IO,JO) + W
          end if
       endif
    end do

    if(.not. uSAMPLE) then
       where(FF /= 0.0)
          OUTPUT = OUTPUT / FF
       elsewhere 
          OUTPUT = MAPL_Undef
       end where
    end if

But this code seems to be working OK. I put in some debug print and as far as I can tell the output array is as expected.

The only thing I would do differently would be to replace the

WHERE( FF /= 0.0 ) 

with

WHERE( FF > 0.0 .or. FF < 0.0 )

since sometimes an equality test for zero will fail due to roundoff. But the values of FF always seem to be close to 1, so I don't think it's an issue. The error may be upstream from there.

Maybe in January we can forward this to the MAPL dev team.

JiaweiZhuang commented 5 years ago

I am going to do a benchmark run with fortran8.2 on Odyssey in January.

@lizziel Consider checking the 4 cases: 1) ifort standard 2) gfortran standard 3) ifort with DryDep turned off (to remove landmap effect) 4) gfortran with DryDep turned off

Looks at surface ozone. (3) and (4) should be very close if there are no extra bugs besides this landmap one.

lizziel commented 5 years ago

Reading land type binary masks instead of the land map should bypass this issue. I generated a new file and confirmed that the two methods give identical results when using ifort. Tests with gfortran are in progress.

lizziel commented 5 years ago

Using an alternative regridding method does not correct this issue. Further testing is in progress.

yantosca commented 5 years ago

I have created a better error trap (one that only requires modifications to GCHP/Chem_GridCompMod.F90). We now get this error message if the Olson values are all zeroes:

  ### Reading OLSON01 from imports
  ... etc ...
  ### Reading OLSON72 from imports 
State_Met%LandTypeFrac contains all zeroes!  This error is a known issue in MAPL 
when using gfortran. This should not happen if you compiled with ifort.
GIGCchem::Run_                                1857
GIGCchem::Run2                                1277
GCHP::Run                                      420
MAPL_Cap                                       792
lizziel commented 5 years ago

Investigation of MAPL CFIO shows that the reading and regridding of the Olson land map file is correct for all regridding methods. The issue thus occurs somewhere between reading/regridding and retrieving the pointer to the import.

Until this issue is fixed we recommend not using gfortran with GCHP. The documentation is updated for this but there is also now a forced stop if the error occurs (see above comment from Bob).

lizziel commented 5 years ago

This issue will not be fixed in GCHP 12.2.0. I will post here again when a fix is found and slated for an upcoming version.

lizziel commented 5 years ago

GMAO tested our Olson land map file with their current integration branch of GEOS using gfortran. They did not encounter the problem we have using the old MAPL in GCHP. Updating GCHP to use a new version of MAPL, even newer than GMAO tested with, is in progress. I am currently working under the assumption that it will resolve this bug but it is too soon to verify this.

JiaweiZhuang commented 5 years ago

@lizziel Thanks for the update! That's great to know and seems like new MAPL will solve a lot of crucial problems (gfortran bug, inefficient I/O).

lizziel commented 5 years ago

I have confirmed that implementation of the new MAPL version in GCHP resolves this issue. The land mask imports are no longer all zero. It is still uncertain which version of GCHP the new MAPL will go into but I will announce it here once it is decided.

JiaweiZhuang commented 5 years ago

I have confirmed that implementation of the new MAPL version in GCHP resolves this issue.

This is wonderful news! We should finally switch to the GNU stack for benchmarks to see if there are still undiscovered issues like this.

lizziel commented 5 years ago

I am closing this issue since it is fixed with MAPL update going into 12.5.