geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org
Other
171 stars 166 forks source link

GCHP crashing when turning on ProdLoss diagnostic collections for new tracers #2192

Open amy1916 opened 8 months ago

amy1916 commented 8 months ago

Name and Institution (Required)

Name: Amy Lees Institution: University of York

Confirm you have reviewed the following documentation

New GEOS-Chem feature or discussion

Hi, I am experiencing the same error as #2135 (https://github.com/geoschem/geos-chem/issues/2135) when trying to output ProdLoss rates of tracers I have added with KPP. I am using GCHP v14.2.3 (with the addition of the fix for KPP integration failures https://github.com/geoschem/geos-chem/pull/2006) for a C48 simulation.

I have tried both adding the diagnostic to the DefaultCollection and the ProdLoss collection in HISTORY.rc.

When adding to the DefaultCollection it returns the same error as above (GCHP_budget_4004937_defaultcollection.log):


GEOS-Chem ERROR [0023]: PROD_T001 is not found in the registry for the DIAG object!
 --> LOCATION:  -> at Registry_Lookup (in Headers/registry_mod.F90)

GEOS-Chem ERROR [0023]: Registry pointer not found for Prod_T001. Check that the tag (e.g. species) is valid for this diagnostic.
 --> LOCATION: HistoryExports_SetDataPointers

However when I tried adding it to the ProdLoss collection in HISTORY.rc I get this error (GCHP_budget_3900364_prodloss.log) :

pe=00001 FAIL at line=03324    MAPL_HistoryGridComp.F90                 <status=41>
pe=00001 FAIL at line=00958    MAPL_HistoryGridComp.F90                 <status=41>
pe=00001 FAIL at line=01807    MAPL_Generic.F90                         <status=41>
pe=00001 FAIL at line=00713    MAPL_CapGridComp.F90                     <status=41>
pe=00001 FAIL at line=00658    MAPL_CapGridComp.F90                     <status=41>
pe=00001 FAIL at line=00959    MAPL_CapGridComp.F90                     <status=41>
pe=00001 FAIL at line=00311    MAPL_Cap.F90                             <status=41>
pe=00001 FAIL at line=00258    MAPL_Cap.F90                             <status=41>
pe=00001 FAIL at line=00192    MAPL_Cap.F90                             <status=41>
pe=00001 FAIL at line=00169    MAPL_Cap.F90                             <status=41>
pe=00001 FAIL at line=00029    GCHPctm.F90                              <status=41>

Do you have any ideas why this is happening? I believe I have correctly added the species to the species database file and all the KPP files have been generated with these new tracers added.

Attached are the relevant log files, HISTORY.rc and species_database.yml

Thank you for your help. species_database.yml.txt HISTORY.rc.txt GCHP_budget_4004937_defaultcollection.log GCHP_budget_3900364_prodloss.log

amy1916 commented 8 months ago

Also, with further testing it appears that turning the ProdLoss collection on in HISTORY.rc just for the standard diagnostics (not my new tracers) also gives the error as above

yantosca commented 8 months ago

Thanks for writing @amy1916. I suspect the PT001 etc. species haven't been included in the KPP fullchem mechanism, and thus can't be found. Make sure to also add all of the new prod/loss species to your KPP/fullchem/fullchem.kpp file, e.g.:

#FAMILIES                 { Chemical families for prod/loss diagnostic }
POx : O3 + NO2 + 2NO3 + PAN + PPN + MPAN + HNO4 + 3N2O5 + HNO3 + BrO + HOBr + BrNO2 + 2BrNO3 + MPN + ETHLN + MVKN + MCRHN + MCRHNB + PROPNN + R4N2 + PRN1 + PRPN + R4N1 + HONIT + MONITS + MONITU + OLND + OLNN + IHN1 + IHN2 + IHN3 + IHN4 + INPB + INPD + ICN + 2IDN + ITCN + ITHN + ISOPNOO1 + ISOPNOO2 + INO2B + INO2D + INA + IDHNBOO + IDHNDOO1 + IDHNDOO2 + IHPNBOO + IHPNDOO + ICNOO + 2IDNOO + MACRNO2 + ClO + HOCl + ClNO2 + 2ClNO3 + 2Cl2O2 + 2OClO + O + O1D + IO + HOI + IONO + 2IONO2 + 2OIO + 2I2O2 + 3I2O3 + 4I2O4 + NIT + NITs;
LOx : O3 + NO2 + 2NO3 + PAN + PPN + MPAN + HNO4 + 3N2O5 + HNO3 + BrO + HOBr + BrNO2 + 2BrNO3 + MPN + ETHLN + MVKN + MCRHN + MCRHNB + PROPNN + R4N2 + PRN1 + PRPN + R4N1 + HONIT + MONITS + MONITU + OLND + OLNN + IHN1 + IHN2 + IHN3 + IHN4 + INPB + INPD + ICN + 2IDN + ITCN + ITHN + ISOPNOO1 + ISOPNOO2 + INO2B + INO2D + INA + IDHNBOO + IDHNDOO1 + IDHNDOO2 + IHPNBOO + IHPNDOO + ICNOO + 2IDNOO + MACRNO2 + ClO + HOCl + ClNO2 + 2ClNO3 + 2Cl2O2 + 2OClO + O + O1D + IO + HOI + IONO + 2IONO2 + 2OIO + 2I2O2 + 3I2O3 + 4I2O4 + NIT + NITs;
PCO : CO;
LCO : CO;
PSO4 : SO4;
LCH4 : CH4;
PH2O2 : H2O2;
PT001: T001
PT002: T002
... etc for other species ...

Then rebuild the KPP mechanism with:

$ cd GCClassic/src/GEOS-Chem/KPP
$ ./build_mechanism.sh fullchem

and then recompile GEOS-Chem.

lizziel commented 8 months ago

Hi @amy1916, could you try uncommenting the below lines in GCHP, rebuilding, and rerunning? This will print detailed information about the parsing of HISTORY.rc and how the GEOS-Chem diagnostic registry will interpret the names. It might give clues about why it is failing for each case.

https://github.com/geoschem/geos-chem/blob/b288af1563a9a49a154a93bdd0f88be1f58f3f2f/Interfaces/GCHP/gchp_historyexports_mod.F90#L161-L162

https://github.com/geoschem/geos-chem/blob/b288af1563a9a49a154a93bdd0f88be1f58f3f2f/Interfaces/GCHP/gchp_historyexports_mod.F90#L179-L180

lizziel commented 8 months ago

@yantosca, I think the species are included in KPP since they appear in the log with valid IDs.

amy1916 commented 8 months ago

Thank you both for your responses. @yantosca the species have been added to fullchem.kpp and I have re-built the mechanism and recompiled the code. @lizziel I have uncommented the debugging lines and re-built the model, then tested for both adding my tracer to the DefaultCollection which returned the same error as before:

GEOS-Chem ERROR [0023]: PROD_T001 is not found in the registry for the DIAG object!
 --> LOCATION:  -> at Registry_Lookup (in Headers/registry_mod.F90)

GEOS-Chem ERROR [0023]: Registry pointer not found for Prod_T001. Check that the tag (e.g. species) is valid for this diagnostic.
 --> LOCATION: HistoryExports_SetDataPointers
pe=00023 FAIL at line=01095    gchp_historyexports_mod.F90              <informative message here>
pe=00023 FAIL at line=03070    Chem_GridCompMod.F90                     <status=1>
pe=00023 FAIL at line=01963    Chem_GridCompMod.F90                     <status=1>
pe=00023 FAIL at line=01807    MAPL_Generic.F90                         <status=1>
pe=00023 FAIL at line=00559    GCHP_GridCompMod.F90                     <status=1>
pe=00023 FAIL at line=01807    MAPL_Generic.F90                         <status=1>
pe=00023 FAIL at line=01308    MAPL_CapGridComp.F90                     <status=1>
pe=00023 FAIL at line=01260    MAPL_CapGridComp.F90                     <status=1>
pe=00023 FAIL at line=00837    MAPL_CapGridComp.F90                     <status=1>
pe=00023 FAIL at line=00977    MAPL_CapGridComp.F90                     <status=1>
pe=00023 FAIL at line=00313    MAPL_Cap.F90                             <status=1>
pe=00023 FAIL at line=00258    MAPL_Cap.F90                             <status=1>
pe=00023 FAIL at line=00192    MAPL_Cap.F90                             <status=1>
pe=00023 FAIL at line=00169    MAPL_Cap.F90                             <status=1>
pe=00023 FAIL at line=00029    GCHPctm.F90                              <status=1>

and when uncommenting ProdLoss in HISTORY.rc:

pe=00013 FAIL at line=03324    MAPL_HistoryGridComp.F90                 <status=41>
pe=00013 FAIL at line=00958    MAPL_HistoryGridComp.F90                 <status=41>
pe=00013 FAIL at line=01807    MAPL_Generic.F90                         <status=41>
pe=00013 FAIL at line=00713    MAPL_CapGridComp.F90                     <status=41>
pe=00013 FAIL at line=00658    MAPL_CapGridComp.F90                     <status=41>
pe=00013 FAIL at line=00959    MAPL_CapGridComp.F90                     <status=41>
pe=00013 FAIL at line=00311    MAPL_Cap.F90                             <status=41>
pe=00013 FAIL at line=00258    MAPL_Cap.F90                             <status=41>
pe=00013 FAIL at line=00192    MAPL_Cap.F90                             <status=41>
pe=00013 FAIL at line=00169    MAPL_Cap.F90                             <status=41>
pe=00013 FAIL at line=00029    GCHPctm.F90                              <status=41>
pe=00006 FAIL at line=03324    MAPL_HistoryGridComp.F90                 <status=41>
pe=00006 FAIL at line=00958    MAPL_HistoryGridComp.F90                 <status=41>
pe=00006 FAIL at line=01807    MAPL_Generic.F90                         <status=41>
pe=00006 FAIL at line=00713    MAPL_CapGridComp.F90                     <status=41>
pe=00006 FAIL at line=00658    MAPL_CapGridComp.F90                     <status=41>
pe=00006 FAIL at line=00959    MAPL_CapGridComp.F90                     <status=41>
pe=00006 FAIL at line=00311    MAPL_Cap.F90                             <status=41>
pe=00006 FAIL at line=00258    MAPL_Cap.F90                             <status=41>
pe=00006 FAIL at line=00192    MAPL_Cap.F90                             <status=41>
pe=00006 FAIL at line=00169    MAPL_Cap.F90                             <status=41>
pe=00006 FAIL at line=00029    GCHPctm.F90                              <status=41>
pe=00017 FAIL at line=03324    MAPL_HistoryGridComp.F90                 <status=41>
pe=00017 FAIL at line=00958    MAPL_HistoryGridComp.F90                 <status=41>
pe=00017 FAIL at line=01807    MAPL_Generic.F90                         <status=41>

It doesn't appear to have given any extra debugging output? I'm unsure why this is the case.

I have attached the relevant log files. Thanks again for your help with this.

GCHP_budget_4051590_defaultcollectiontest.log

lizziel commented 8 months ago

I think the issue with the ProdLoss collection without your updates has to do with format. I checked the traceback and it is erroring out after looking for a label in MAPL history. I will try uncommenting it in a GCHP run to reproduce the problem. Maybe there is a typo in our HISTORY.rc template for GCHP.

Regarding adding Prod_T001 to the default collection, the log print shows that PROD_T001 is successfully added to the GEOS-Chem diagnostics list and created as a MAPL export:

 ====================
 Contents of DiagList

 Prod_T001
    state:      DIAG
    metadataID: PROD
    registryID: PROD_T001
    isTagged:   T
    tag:    T001
 ===========================
 History Exports List:

 Name:        Prod_T001
  MetadataID: PROD
  RegistryID: PROD_T001
  Long name:  Chemical production of for T001
  Units:      molec cm-3 s-1
  Vert loc:              2
  Rank:                  3
  Type:                  1
  isMet:       F
  isChem:      F
  isDiag:      T

@yantosca, do you have any idea why PROD_T001 would be in DiagList but not be found in the registry. It seems like it should find it.

yantosca commented 8 months ago

@lizziel @amy1916 I wonder if there are TABs in the HISTORY.rc file, and maybe that impacts the parsing?

amy1916 commented 8 months ago

Hi @lizziel @yantosca I don’t think tabs were the issue as I removed all tabs in HISTORY.rc and still received the same error when using ProdLoss collection.

I currently have managed to get my tracers outputting in the ProdLoss collection using a newly downloaded v14.2.3 code directory so I’m assuming it’s a problem present in my previous code directory. Attempting to output my tracers in the DefaultCollection is still giving the same error though.

I will do some more testing and update here when I figure out what’s happening. Thanks for your help, please let me know if you have any other ideas what could be happening!

yantosca commented 8 months ago

Thanks @amy1916. Keep us posted!

lizziel commented 8 months ago

Hi @amy1916, thanks for the update. Have you tried running with GEOS-Chem verbose set to true in geoschem_config.yml? This will print out the list of diagnostics that are registered in GEOS-Chem. If PROD_T001 is not on the list then that is a clue something is going wrong with the diagnostic registration.

amy1916 commented 7 months ago

Hi both, I am again getting the error when trying to output ProdLoss diagnostics, when I have set up a new code directory with a new set of tracers. I am really struggling to figure out why this issues occurs sometimes but not others?

I have compared my files between a run directory (and the code dir) where ProdLoss is working fine and where it isn't and there seems to be no obvious difference. My colleague @hazel008 has also had issues with getting ProdLoss diagnostics to output consistently, and again we haven't been able to figure out what is causing these problems.

Thanks again for any help or ideas on this!

HISTORY.rc from the rundir that works:

HISTORY_working_tracers.txt

HISTORY.rc from the rundir where ProdLoss gives the error:

HISTORY_broken_tracers.txt

Log file from the rundir where ProdLoss gives the error:

broken_tracers.log

yantosca commented 7 months ago

Hi @amy1916! The error is happening at line 3343 of src/MAPL/gridcomps/History/MAPL_HistoryGridComp.F90. I have been running into a similar issue on my end. I have put a debug print statement for the collection name as shown below.

    subroutine parse_fields(cfg, label, field_set, collection_name, items, rc)
       type(ESMF_Config), intent(inout) :: cfg
       character(*), intent(in) :: label
       type (FieldSet), intent(inout) :: field_set
       character(*), intent(in), optional :: collection_name
       type(GriddedIOitemVector), intent(inout), optional :: items
       integer, optional, intent(out) :: rc
       logical :: table_end
       logical :: vectorDone,match_alias
       integer :: m,i,j
       character(ESMF_MAXSTR), pointer:: fields (:,:)

       type(GriddedIOitem) :: item
       integer :: status
       character(len=:), allocatable :: usable_collection_name

       if (present(collection_name)) then
          usable_collection_name = trim(collection_name)
       else
          usable_collection_name = "unknown"
       end if
       print*, '### usable collection name: ', TRIM(usable_collection_name)
       call ESMF_ConfigFindLabel ( cfg, label=label//':', rc=status)
       _VERIFY(status)

       table_end = .false.
       m = 0
       do while (.not.table_end)
          m = m+1

! Get EXPORT Name
! ---------------
          call ESMF_ConfigGetAttribute ( cfg,value=export_name,rc=STATUS)
          if (status /= ESMF_SUCCESS)  then
              if( MAPL_AM_I_ROOT(vm) ) then
                  print *
                  print *, '**************************************************************'
                  print *, 'Attributes NOT set for Collection: ',trim( list(n)%collection )
                  print *, '**************************************************************'
                  print *
              endif
          endif
          _VERIFY(STATUS)   ! <==== this is the error trap
          export_name = extract_unquoted_item(export_name)

You could add a similar print statement then recompile and run.

Also, you've done a good job in narrowing down which collections work and which don't. I think the remaining ones to test are:

Try commenting those out one at a time to see if you can get the simulation to work. You can do short simulations like an hour. Also make sure to set DIag_Monthly=0 for each collection if you are running for less than a month.

@lizziel, any other suggestions?

amy1916 commented 7 months ago

Hi @yantosca! Thanks for this, I have added the print statement and tested the remaining collections.

Aerosols seems to trigger the error and cause the model to fail. LevelEdgeDiags seems to run with no problems.

The ProdLoss collection works ok when I tested 'Prod_Ox' but when I try and output my new tracers I receive the error where the tracer isn't found in the registry for the DIAG object (similar to previously, when I was trying to use the DefaultCollection)

image
lizziel commented 7 months ago

Hi @amy1916, I am able to reproduce your error using your settings for the Aerosols collection. The problem is you commented out the species that is on the same line as the fields declaration:

  Aerosols.fields:        #'AODHygWL1_SO4                 ', 'GCHPchem',
                          #'AODHygWL1_BCPI                ', 'GCHPchem',
                          #'AODHygWL1_OCPI                ', 'GCHPchem',
                          #'AODHygWL1_SALA                ', 'GCHPchem',
                          #'AODHygWL1_SALC                ', 'GCHPchem',
                          'AODDust                       ', 'GCHPchem',

You can fix that by either uncommenting AODHygWL1_SO4 or moving AODDust to the first line. This constraint should be better documented in our docs and I can make a feature request with NASA GMAO to add in the flexibility to comment out the first diagnostic in their next version.

Regarding the new ProdLoss tracers, it looks like we are back to where we were before with an error in registry_mod.F90 in GEOS-Chem. Did you try a run with verbose on in GEOS-Chem (config file geoschem_config.yml)? I am curious to see if the diagnostic it is failing on was registered in GEOS-Chem history. If it was registered then a pointer should be found. If a pointer is not found then something is going on with the GEOS-Chem registration of that species for that diagnostic. This is a good thing since it means you can probably figure it out by adding prints to GEOS-Chem rather than MAPL.

The error message says "Check that the tag (e.g. species) is valid for this diagnostic.". @yantosca, is there a way to print out all of the ProdLoss diagnostic tags? Would that get printed if verbose is on?

amy1916 commented 7 months ago

Hi @lizziel. Thank you! I will do a test run fixing the aerosol collection issue.

verbose_true.log

Attached is the log file with verbose on. It looks to me that all the new tracers are being registered and given a ModelId and KppSpcId? I am also currently running a debug simulation to try and get more information on this.

lizziel commented 7 months ago

Hi @amy1916, from your log file it looks like your diagnostic is not being registered in GEOS-Chem:

Registered variables contained within the State_Diag object:
===============================================================================
 PROD_CO                        | Chemical production  | xyz C | molec cm-3 s-1
 PROD_OX                        | Chemical production  | xyz C | molec cm-3 s-1
 LOSS_OX                        | Chemical loss of Ox  | xyz C | molec cm-3 s-1

Registered variables contained within the State_Met object:
===============================================================================

There is an object in GEOS-Chem called DiagList that parses HISTORY.rc in GEOS-Chem to determine what diagnostics should be registered and with what tag. Earlier you printed out the contents of DiagList by uncommenting a line, rebuilding, and rerunning. Could you do this test run again with that enabled?

I think RXN_S_SO2_SALAAL should appear as a tag of a PROD diagnostic in the DiagList print, and so should be registered. But maybe that is not happening. I would guess maybe there is a parsing problem from all of the underscores, but your earlier example of PROD_TOO1 had the same problem. What I want to verify here is that the diagnostic does appear in DiagList but then does not appear in the Registered variables list in the log. If that is the case there must be a bug somewhere in the code.

amy1916 commented 7 months ago

Hi @lizziel.

Yes, it seems that RXN_S_SO2_SALAAL does appear in DiagList, but not in the Registered variables list.

====================$
 Contents of DiagList$
  $
 Prod_RXN_s_SO2_SALAAL$
    state:      DIAG$
    metadataID: PROD$
    registryID: PROD_RXN_S_SO2_SALAAL$
    isTagged:   T$
    tag:    RXN_S_SO2_SALAAL$
  $
 Loss_Ox$
    state:      DIAG$
    metadataID: LOSS$
    registryID: LOSS_OX$
    isTagged:   T$
    tag:    OX$
  $
 Prod_Ox$
    state:      DIAG$
    metadataID: PROD$
    registryID: PROD_OX$
    isTagged:   T$
    tag:    OX$
  $
Registered variables contained within the State_Diag object:$
===============================================================================$
 PROD_CO                        | Chemical production  | xyz C | molec cm-3 s-1$
 PROD_OX                        | Chemical production  | xyz C | molec cm-3 s-1$
 LOSS_OX                        | Chemical loss of Ox  | xyz C | molec cm-3 s-1$

I'll attach the log file. alli_summer_c48_5214259.log

Thanks for your continued support with this.

yantosca commented 7 months ago

Hi @amy1916 and @lizziel! Am wondering if the multiple underscores are affecting the parsing. Typically we only allow for one underscore in the diagnostic name, as that is the separator between the diagnostic prefix and species name (or in GEOS-Chem Classic, you can also have a wildcard like Prod_?PRD?) I wonder if you changed the species names to e.g.. RxnSSo2Salaal if it would work.

I can try a test in both GCClassic and GCHP to see if this would break the diagnostics.

amy1916 commented 7 months ago

Hi @yantosca and @lizziel, I tested removing the multiple underscores and using the diagnostic name Prod_RXNsSO2SALAAL and it seems to be in the DiagList but not in the registered variables list (same as above).

It seems to be giving a different error now though that I haven't seen before?

alli_summer_c48_5281661.log

amy1916 commented 7 months ago

Hi @yantosca and @lizziel, I think the error in the above log file was an issue with memory (a few of my other simulations are experiencing similar). I am still finding that these new tracers are appearing in the DiagList but not the registered variables list, even when underscores are removed. Thanks, Amy

lizziel commented 7 months ago

Hi @amy1916, it makes sense the new error is memory since it is failing during creation of the restart file. One of us will further investigate the Prod diagnostic issue as we have time. In the meantime, if you want to continue looking into it then I suggest adding prints around where the Prod diagnostic is registered from state_diag_mod.F90.