geoschem / GCHP

The "superproject" wrapper repository for GCHP, the high-performance instance of the GEOS-Chem chemical-transport model.
https://gchp.readthedocs.io

[BUG/ISSUE] GCHP v13.4.1 error: KPP failed to converge after 2 iterations! #253

Closed 1Dandan closed 1 year ago

1Dandan commented 1 year ago

What institution are you from?

Washington University in St. Louis

Description of the problem

I am running a GCHP C360 simulation for 2018, but it fails at the 2018-01-19 01:20:00 time step with the error "KPP failed to converge after 2 iterations!". I have tested it multiple times and on two platforms (WashU compute1 and NASA Pleiades), and the failure is reproducible. The relevant lines of the error message are:

AGCM Date: 2018/01/19  Time: 01:20:00  Throughput(days/day)[Avg Tot Run]:    5.5   19.2   19.3  TimeRemaining(Est) ***:41:30   58.0% :  49.7% Mem Comm:Used
                                                                      Mem/Swap Used (MB) at MAPL_Cap:TimeLoop=  2.271E+05  2.900E+01
        GENERIC: INFO: Started the  'Run' stage of the gridded component 'EXTDATA'
        GENERIC: INFO: Finished the 'Run' stage of the gridded component 'EXTDATA'
        GENERIC: INFO: Started the  'WriteRestart' stage of the gridded component 'GCHP'
        GENERIC: INFO: Started the  'WriteRestart' stage of the gridded component 'GCHPctmEnv'
        GENERIC: INFO: Finished the 'WriteRestart' stage of the gridded component 'GCHPctmEnv'
        GENERIC: INFO: Started the  'WriteRestart' stage of the gridded component 'GCHPchem'
        GENERIC: INFO: Finished the 'WriteRestart' stage of the gridded component 'GCHPchem'
        GENERIC: INFO: Started the  'WriteRestart' stage of the gridded component 'DYNAMICS'
        GENERIC: INFO: Finished the 'WriteRestart' stage of the gridded component 'DYNAMICS'
        GENERIC: INFO: Finished the 'WriteRestart' stage of the gridded component 'GCHP'
        GENERIC: INFO: Started the  'WriteRestart' stage of the gridded component 'HIST'
        GENERIC: INFO: Finished the 'WriteRestart' stage of the gridded component 'HIST'
        GENERIC: INFO: Started the  'Run' stage of the gridded component 'GCHP'
        GENERIC: INFO: Started the  'Run' stage of the gridded component 'GCHPctmEnv'
        GENERIC: INFO: Finished the 'Run' stage of the gridded component 'GCHPctmEnv'
        GENERIC: INFO: Started the  'Run' stage of the gridded component 'DYNAMICS'
        GENERIC: INFO: Finished the 'Run' stage of the gridded component 'DYNAMICS'
        GENERIC: INFO: Started the  'Run' stage of the gridded component 'GCHPchem'
 Forced exit from Rosenbrock due to the following error:
 --> Step size too small: T + 10*H = T or H < Roundoff
 T=   332.927641929585      and H=  1.512347627860436E-013
 ### INTEGRATE RETURNED ERROR AT:           15          19           4
 Forced exit from Rosenbrock due to the following error:
 --> Step size too small: T + 10*H = T or H < Roundoff
 T=   333.326753320219      and H=  1.339337158519358E-013
## INTEGRATE FAILED TWICE !!!
 ###############################################################################
 ###############################################################################
 ###############################################################################
 ### KPP DEBUG OUTPUT!
 ### Species concentrations at problem box           15          19           4

......

GEOS-Chem ERROR [0333]: KPP failed to converge after 2 iterations!
 --> LOCATION:  -> at Do_FullChem (in module GeosCore/FullChem_mod.F90)

GEOS-Chem ERROR [0333]: Incorrect species units after Do_FullChem!
 --> LOCATION:  -> at Do_Chemistry  (in module GeosCore/chemistry_mod.F90)
pe=00333 FAIL at line=01332    gchp_chunk_mod.F90                       <Error calling Do_Chemistr>
pe=00333 FAIL at line=03680    Chem_GridCompMod.F90                     <status=1>
pe=00333 FAIL at line=02734    Chem_GridCompMod.F90                     <status=1>
pe=00333 FAIL at line=01844    MAPL_Generic.F90                         <Error during the 'Run' stage of the gridded component 'GCHPchem'>
pe=00333 FAIL at line=00556    GCHP_GridCompMod.F90                     <status=1>
pe=00333 FAIL at line=01844    MAPL_Generic.F90                         <Error during the 'Run' stage of the gridded component 'GCHP'>
pe=00333 FAIL at line=01257    MAPL_CapGridComp.F90                     <status=1>
pe=00333 FAIL at line=01181    MAPL_CapGridComp.F90                     <status=1>
pe=00333 FAIL at line=00804    MAPL_CapGridComp.F90                     <status=1>
pe=00333 FAIL at line=00934    MAPL_CapGridComp.F90                     <status=1>
pe=00333 FAIL at line=00247    MAPL_Cap.F90                             <status=1>
pe=00333 FAIL at line=00211    MAPL_Cap.F90                             <status=1>
pe=00333 FAIL at line=00154    MAPL_Cap.F90                             <status=1>
pe=00333 FAIL at line=00129    MAPL_Cap.F90                             <status=1>
pe=00333 FAIL at line=00031    GCHPctm.F90                              <status=1>
Abort(0) on node 333 (rank 333 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 0) - process 333
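
For readers unfamiliar with the solver message, here is a minimal sketch (not the KPP source) of what "Step size too small" means in floating-point terms. It takes T and H from the first "Forced exit from Rosenbrock" message above and compares H with the spacing between adjacent double-precision numbers near T. The helper name `spacing` and the set of multipliers are assumptions for illustration; the log only states the condition in words.

```python
import math

def spacing(x):
    """Gap between adjacent doubles near a finite, positive, normal x (one ulp)."""
    _, exponent = math.frexp(x)           # x = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 53)

# Values copied from the first Rosenbrock error message in the log excerpt.
T = 332.927641929585
H = 1.512347627860436e-13

ulp = spacing(T)
steps_in_ulps = H / ulp                   # how many representable gaps H spans

# Illustrative multipliers only: check whether adding k*H changes T at all.
absorbed = {k: (T + k * H == T) for k in (0.1, 1.0, 10.0)}

if __name__ == "__main__":
    print(f"spacing of doubles near T: {ulp:.3e}")
    print(f"H expressed in units of that spacing: {steps_in_ulps:.2f}")
    for k, same in absorbed.items():
        print(f"T + {k}*H == T ? {same}")
```

With H only a few times the spacing of doubles near T, the integrator can no longer advance time by a meaningful amount, which is why it gives up rather than continuing to shrink the step.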

GEOS-Chem version

v13.4.1

Description of code modifications

I use GEOS-FP as the meteorology input and the full-chemistry scheme with LUO_WETDEP turned on. The source code is left at its defaults, but I found two small bugs in the generated run directory and fixed them:
1) The MetDir link created by createRunDir.sh points to /ExtData/GEOS_0.25x0.3125/MERRA2, which is a mistake, so I changed it to the GEOS_FP directory.
2) The Volcano_Table in HEMCO_Config.rc points to $ROOT/VOLCANO/v2021-09/$YYYY/$MM/so2_volcanic_emissions_Carns..rc, which looks like a small mistake, so I changed it to $ROOT/VOLCANO/v2021-09/$YYYY/$MM/so2_volcanic_emissions_Carns.$YYYY$MM$DD.rc

Log files

Software versions

yantosca commented 1 year ago

Thanks for writing @1Dandan. There have been a few similar issues on the geoschem/geos-chem repo: https://github.com/geoschem/geos-chem/issues/1337, https://github.com/geoschem/geos-chem/issues/1315, https://github.com/geoschem/geos-chem/issues/1087, https://github.com/geoschem/geos-chem/issues/66.

This can happen when a concentration gets too low and the solver cannot converge. In the log file that you attached (*.log.txt) there is a printout of the concentrations and reaction rates at the problem box. I think the issue is that these rates are very large:

   11712265.8780294      ClOO --> Cl + O2
    7.85077589665824      Cl2O2 --> 2 ClO

whereas if you look, most of the other rates are of the order 1e-10 to 1e-20. This was the problem in https://github.com/geoschem/geos-chem/issues/1337.
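
A quick way to spot rates like these in the KPP debug printout is to parse the rate lines and list the ones above a cutoff. The sketch below assumes each line has the form shown above (a number followed by a reaction containing `-->`). The function names, the cutoff of 1.0, and the third sample line are hypothetical and are not taken from the log.

```python
import re

# Matches "<rate>   <reactants> --> <products>" as in the excerpt above.
RATE_LINE = re.compile(
    r"^\s*([-+]?\d+\.?\d*(?:[EeDd][-+]?\d+)?)\s+(\S.*-->.*\S)\s*$"
)

def parse_rates(text):
    """Return a list of (rate, reaction) tuples found in the given text."""
    rates = []
    for line in text.splitlines():
        m = RATE_LINE.match(line)
        if m:
            # Fortran may print a D exponent; Python's float() expects E.
            value = float(m.group(1).replace("D", "E").replace("d", "e"))
            rates.append((value, m.group(2)))
    return rates

def flag_large(rates, threshold=1.0):
    """Return the rates above the (assumed) threshold, largest first."""
    return sorted((r for r in rates if r[0] > threshold), reverse=True)

SAMPLE = """
   11712265.8780294      ClOO --> Cl + O2
    7.85077589665824      Cl2O2 --> 2 ClO
    1.0E-12      A --> B
"""  # the last line is a made-up placeholder, not from the log

if __name__ == "__main__":
    for value, reaction in flag_large(parse_rates(SAMPLE)):
        print(f"{value:14.6e}  {reaction}")
```

The fixed cutoff is only a starting point; since most rates in the printout are many orders of magnitude smaller, any reaction that survives the filter is worth a closer look.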

You said you changed the MetDir link from MERRA-2 to GEOS-FP. Have you tried building a fresh run directory? Am wondering if there are some configuration files that weren't updated for GEOS-FP.

1Dandan commented 1 year ago

Hi @yantosca, thanks for following up. I used createRunDir.sh and selected GEOS-FP when it asked me to choose the meteorology, but the initial MetDir in the fresh run directory was linked to /ExtData/GEOS_0.25x0.3125/MERRA2. I changed the MetDir link in my run directory. I think this indicates that the template in the source code is incorrect, but I don't know where to change it. The source code for GCHP v13.4.1 is unmodified. The associated emission inventories look correct, with offline emissions linked to the 0.25x0.3125 directory. I am not sure about the other configuration files. I have attached the configuration files from my run directory: input.geos.txt logging.yml.txt runConfig.sh.txt GCHP.rc.txt HEMCO_Config.rc.txt ExtData.rc.txt CAP.rc.txt

lizziel commented 1 year ago

Hi @1Dandan, regarding the incorrect meteorology, that is a bug in 13.4. See https://github.com/geoschem/geos-chem/pull/1224 for the fix, which changes "MERRA2" to "GEOS-FP" in one of the GEOS-Chem source code files. That pull request page lists the Milestone for the update as 13.4.0, but it looks like the fix never actually made it into 13.4.0 or 13.4.1. It is included in 14.0.0. I'll correct the GitHub page for that and make sure it is listed as a fix in 14.0.

1Dandan commented 1 year ago

Hi @lizziel, thanks. I removed the MetDir link and relinked MetDir to /ExtData/GEOS_0.25x0.3125/GEOS_FP, so the meteorology problem should already be fixed in my run directory.
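
To double-check a relinked MetDir, one option is to resolve the symlink and compare the final directory name. This is a minimal sketch using only the Python standard library; the helper name `check_link` is hypothetical, and it assumes it is run from the run directory where the MetDir link lives.

```python
import os

def check_link(link_path, expected_name):
    """Resolve a symlink and report whether its target directory is named expected_name."""
    target = os.path.realpath(link_path)
    return target, os.path.basename(target) == expected_name

if __name__ == "__main__":
    # "MetDir" is the link name used in this thread; adjust the path as needed.
    target, ok = check_link("MetDir", "GEOS_FP")
    print(f"MetDir -> {target} ({'OK' if ok else 'unexpected target'})")
```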

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

1Dandan commented 1 year ago

Just to post a final update on this ticket: I tried turning off cloud convection as suggested in https://github.com/geoschem/geos-chem/issues/1337, but it did not help. In the end I skipped the simulation for 2018/1/19 and continued from 2018/1/20. The simulation is still running, though it is not clear to me what is wrong at 1/19 01:20 UTC.

lizziel commented 1 year ago

Thanks @1Dandan for following up on the resolution to this. When something like that happens, on a specific day, it usually means there is something wrong with a meteorology file. @Jourdan-He, would someone at WashU be able to look into whether there is a met-field issue in GEOS-FP 0.25x0.3125 files for January 19 2018 at 01:20 UTC?

Jourdan-He commented 1 year ago

Hi @lizziel , I checked the GEOS-FP files on 20180119 with Panoply and they all seem normal.

1Dandan commented 1 year ago

I also encountered a similar KPP error at 2018/07/22 04:20 UTC for GCHP C90 with version v13.4.1. I am using the same settings, with HTAPv3 as the global emission inventory, GFED4 daily, and AEIC daily. Other settings are left at their defaults. This C90 simulation was run on the NASA NAS system with the MPT library and Intel compilers. The error message is in run-C90-HTAPv3-20221024_1232.log, where very large reaction rates are also observed:

 9481216.73379060      ClOO --> Cl + O2
 4.32260904565101      Cl2O2 --> 2 ClO

I did not have a problem at this date and time in simulations at C48 and C360 with the same settings.

This looks very similar and is probably related to the same issue, so I am reporting it here.

All other log files:

CAP.rc.txt ExtData.rc.txt GCHP.rc.txt HEMCO_Config.rc.txt input.geos.txt logging.yml.txt runConfig.sh.txt

Submitting scripts and environments:

Run scripts: run-600.pbs.txt

Loaded environments:

  1) git/latest               8) comp-intel/2020.4.304
  2) cmake/latest             9) mpi-hpe/mpt.2.25
  3) other/manage_externals  10) hdf4/4.2.12
  4) other/mepo              11) szip/2.1.1
  5) other/gh                12) hdf5/1.8.18_mpt
  6) ImageMagick/7.0.8-53    13) netcdf/4.4.1.1_mpt
  7) comp-gcc/11.2.0-TOSS3   14) GCHPenv/2022-02.Intel