geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org
Other
164 stars 156 forks source link

[BUG/ISSUE] TOMAS simulations failing GCHP and GCClassic integration tests #1741

Closed msulprizio closed 3 months ago

msulprizio commented 1 year ago

Name and Institution (Required)

Name: Melissa Sulprizio Institution: Harvard/GCST

Confirm you have reviewed the following documentation

Description of your issue or question

Following the merge of https://github.com/geoschem/geos-chem/pull/1378 and https://github.com/geoschem/geos-chem/pull/1569 in dev/14.2.0, TOMAS simulations are failing GCHP and GCClassic integration tests. TOMAS passes compilation tests for both models but fails during the simulations. It appears to be out-of-bounds errors. Log files are attached below.

We will look into fixing this during the 14.2.0 benchmarking or for a subsequent version release.

Tagging @bettycroft @yantosca

BettyCroft commented 1 year ago

@msulprizio I think that TOMAS40 will not work with GCHP, only TOMAS15 - additional code developments are needed for TOMAS40. It is TOMAS15 that fails with GCHP or TOMAS40?

BettyCroft commented 1 year ago

@msulprizio looking at those files - it does look to be an issue with TOMAS15

yantosca commented 1 year ago

@msulprizio, @BettyCroft: There is an out-of-bounds error in state_diag%tomasmnfixcheck3mass. I think this may be because the diagnostic slot indexing may be off.

BettyCroft commented 1 year ago

@msulprizio @yantosca - should it be ok if I checkout /dev/14.2.0 at this point - and try a few tests with GCHP-TOMAS in this version? I was trying to compile on WashU cluster just now and ran into a compilation error

[ 80%] Building Fortran object src/GCHP_GridComp/HEMCO_GridComp/HEMCO/src/Extensions/CMakeFiles/HCOX.dir/hcox_tomas_dustdead_mod.F.o /storage1/fs1/rvmartin/Active/bcroft/GCHP_apr7_2023/src/MAPL/gridcomps/ExtData2G/ExtDataGridCompNG.F90(1129): error #6632: Keyword arguments are invalid without an explicit interface. [RC] call ESMF_InfoGetFromHost(grid,infoh,rc=status) ---------------------------------------------^

msulprizio commented 1 year ago

@BettyCroft Could you checkout dev/tomas instead? It's at the current state of dev/14.2.0 but I will be reverting dev/14.2.0 in the next few minutes to just before the TOMAS merges. Our benchmark tests showed differences with respect to the reference run before the TOMAS updates, which we were not expecting. (If the updates were for TOMAS simulations they should not impact simulations with TOMAS off.) See https://github.com/geoschem/geos-chem/issues/1742 and feel free to follow up there as well.

After checking out dev/tomas in the GCHP or GCClassic superproject, make sure to do git submodule update --init --recursive to ensure all submodules are at the proper commit for that version.

BettyCroft commented 1 year ago

Thanks @msulprizio I'm trying from the main branch to type git checkout dev/tomas and it seems to be unknown - whereas I was able to chekcout dev/14.2.0 - should I wait before trying?

[bcroft@compute1-client-1 GCHP_apr7_2023]$ git checkout main M src/GCHP_GridComp/GEOSChem_GridComp/geos-chem M src/GCHP_GridComp/HEMCO_GridComp/HEMCO M src/MAPL Switched to branch 'main' Your branch is up to date with 'origin/main'. Post-checkout: Fixing file modes... done. [bcroft@compute1-client-1 GCHP_apr7_2023]$ git checkout dev/tomas error: pathspec 'dev/tomas' did not match any file(s) known to git

msulprizio commented 1 year ago

@BettyCroft Can you try git fetch -p and then git checkout dev/tomas?

BettyCroft commented 1 year ago

@msulprizio thanks! That worked. Should I be able to use that version of the code for a few tests?

I'm also interested in working with v14 because I'm finding some memory leak issues with the v13 code that I have for GCHP-TOMAS.

msulprizio commented 1 year ago

Should I be able to use that version of the code for a few tests?

Please do! If you are able to fix any issues we welcome you to create a new pull request so we can pull them in on our end.

BettyCroft commented 1 year ago

@msulprizio thanks again! I'd be happy to pass along any fixes - the compilation (esm:intel-2021.1.2) in dev/tomas aborts with an error related to RC in some MAPL subroutines - would you understand and be able to advice on a fix for this compilation issue? Those are not subroutines that I usually interact with.

[ 80%] Building Fortran object src/GCHP_GridComp/HEMCO_GridComp/HEMCO/src/Core/CMakeFiles/HCO.dir/hco_driver_mod.F90.o
/storage1/fs1/rvmartin/Active/bcroft/GCHP_apr7_2023/src/MAPL/gridcomps/ExtData2G/ExtDataGridCompNG.F90(1129): error #6632: Keyword arguments are invalid without an explicit interface.   [RC]
        call ESMF_InfoGetFromHost(grid,infoh,rc=status)
---------------------------------------------^
/storage1/fs1/rvmartin/Active/bcroft/GCHP_apr7_2023/src/MAPL/gridcomps/ExtData2G/ExtDataGridCompNG.F90(1131): error #6632: Keyword arguments are invalid without an explicit interface.   [RC]
        isPresent = ESMF_InfoIsPresent(infoh,'STRETCH_FACTOR',rc=status)
--------------------------------------------------------------^
msulprizio commented 1 year ago

We use gcc 10.2.0 here and on AWS and do not get this error.

@lizziel @Jourdan-He Have you seen this error before? Has anyone attempted to build GCHP with intel-2021.1.2?

yantosca commented 1 year ago

@BettyCroft: You may be still using ESMF 8.0.0 or 8.0.1. A recent update now requires at least ESMF 8.1.0. Updating the ESMF library should get past this error.

yantosca commented 1 year ago

@BettyCroft: Also see our uploaded library instructions in the "latest" ReadTheDocs. We now instruct users to use GCC instead of Intel compilers.

BettyCroft commented 1 year ago

@yantosca Thanks! I'll work on making the suggested updates.

BettyCroft commented 1 year ago

@msulprizio - thanks again for your help with dev/tomas. I've been able to compile GCHP with TOMAS after checking out dev/tomas and updating submodules - and now am trying a few test simulations with GCHP-TOMAS - and have a quick question.

Is GET_TAUb in GeosUtil/time_mod.F90 becoming obsolete?

tomas_mod.F90 uses GET_TAUb() and the value is printing out as the year 1960 value from CAP.rc - as opposed to the year 2019 value that I have in cap_restart for the start time. This issue causes the simulation to abort due to unacceptable conditions outside of a spinup period for TOMAS . Would you have suggestions on on to pass the value in cap_restart to tomas_mod.F90. I have sourced setCommonRunSettings.sh. I can make the simulation run by overwriting GET_TAUb() in tomas_mod.F90 - but am looking for a longterm solution.

yantosca commented 1 year ago

tomas_mod.F90 uses GET_TAUb() and the value is printing out as the year 1960 value from CAP.rc - as opposed to the year 2019 value that I have in cap_restart for the start time.

Tagging @lizziel who may have an idea about this.

lizziel commented 1 year ago

Hi @BettyCroft, I am looking into why the time values are not correct. In general we prefer using elapsed time rather than differencing start and current time so I am also looking into whether elapsed time has a problem. I will report back soon.

BettyCroft commented 1 year ago

Hi @lizziel - thanks!

lizziel commented 1 year ago

Both begin date and elapsed time in time_mod.F90 reflect BEG_DATE in CAP.rc, which is counter-intuitively the ESMF clock start in MAPL. I do not think this is an issue in current GCHP simulations because we try to steer clear of date-dependent code, even relative in time. That being said, I am going to put in a fix and see if there is an impact. The easiest quick fix for you is to change BEG_DATE to be the same as what is in cap_restart. You can do this manually or edit setCommonRunSettings.sh to do it automatically along with all of the other config file updates.

Fyi, I will be taking over for Melissa in GCHP TOMAS validation and preparation for inclusion in the model. I will be in touch about this soon.

BettyCroft commented 1 year ago

Thanks @lizziel, I made the suggested change to CAP.rc, as a quick fix. Look forward to hearing from you about the GCHP-TOMAS validation and preparation for inclusion in the model.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

BettyCroft commented 12 months ago

Hi @lizziel, @msulprizio and @yantosca, As an update on the TOMAS developments, I've been able to successfully checkout the dev/tomas branch, compile and run with TOMAS15 for both the GCClassic and GCHP implementations when using dev/tomas. I found that 1) the HISTORY.rc.template needed an update in order to run the GCClassic and that 2) the dust tuning parameter was not properly defined for TOMAS for GCHP. I can send along these updates and would be great if we could move towards getting these TOMAS updates in the standard release.

msulprizio commented 12 months ago

Hi @BettyCroft. Thanks for these updates. Please do send us your updates via a pull request off of the dev/tomas branch. I will incorporate that into dev/14.3.0 as early as next week.

BettyCroft commented 12 months ago

Hi @msulprizio, Thanks for your message! I will plan to send the updates this week, and am available for discussion if any issues arise as you incorporate into dev/14.3.0.

BettyCroft commented 12 months ago

Hi @msulprizio, I created PR #1959 to try to send the updated version of HISTORY.rc.fullchem for GCClassic - changing that one file should be all that is needed for GCClassic to integrate with dev/tomas branch. However as I am relatively new to navigating the process for PRs, I'm not sure the update was sent correctly in PR #1959. Please let me know. Thanks!

BettyCroft commented 12 months ago

Hi @msulprizio, @lizziel and @yantosca I've opened PR #1960 today - with these updates both GCClassic and GCHP are expected to integrate successfully using dev/tomas branch with both TOMAS15 and TOMAS40 with GCClassic - and with TOMAS15 for GCHP (TOMAS40 in GCHP is the subject of future developments). I've been able to successfully integrate all of the above on the WashU cluster. Please let me know if you are encountering any remaining issues with integration after these fixes. Thanks!

msulprizio commented 9 months ago

All TOMAS pull requests to date have now been merged into dev/14.3.0 (#1959, #1960, #2060). GCClassic integration tests are all passing. GCHP compilation tests all pass, but the execution tests are failing for TOMAS15 still.

==============================================================================
GCHP: Compilation Test Results

GCHP      #a0aa482 Merge pull request #359 from branch 'dev/tomas' into dev/14.3.0
GEOS-Chem #5517e59fb Update unit conversion code for TOMAS following merge with 14.2.3
HEMCO     #33583d2 Merge pull request #251 from branch 'dev/tomas' into dev/3.8.0

Number of compilation tests: 4

Submitted as SLURM job: 11943405
==============================================================================

Compiliation tests:
------------------------------------------------------------------------------
GCHP................................................Configure & Build.....PASS
GCHP w/ carbon gases (as a KPP mechanism)...........Configure & Build.....PASS
GCHP with RRTMG.....................................Configure & Build.....PASS
GCHP with TOMAS15...................................Configure & Build.....PASS

Summary of compilation test results:
------------------------------------------------------------------------------
Complilation tests passed:        4
Complilation tests failed:        0
==============================================================================
GCHP: Execution Test Results

GCHP      #a0aa482 Merge pull request #359 from branch 'dev/tomas' into dev/14.3.0
GEOS-Chem #5517e59fb Update unit conversion code for TOMAS following merge with 14.2.3
HEMCO     #33583d2 Merge pull request #251 from branch 'dev/tomas' into dev/3.8.0

Number of execution tests: 6

Submitted as SLURM job: 11943406
==============================================================================

Execution tests:
------------------------------------------------------------------------------
gchp_merra2_fullchem................................Execute Simulation....PASS
gchp_merra2_fullchem_benchmark......................Execute Simulation....PASS
gchp_merra2_fullchem_RRTMG..........................Execute Simulation....PASS
gchp_merra2_fullchem_TOMAS15........................Execute Simulation....FAIL
gchp_merra2_tagO3...................................Execute Simulation....PASS
gchp_merra2_TransportTracers........................Execute Simulation....PASS

Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 5
Execution tests failed: 1
Execution tests not yet completed: 0

The GCHP log file is attached: execute.gchp_merra2_fullchem_TOMAS15.log.

There seems to be an out-of-bounds error occurring in HEMCO extension hcox_tomas_jeagle_mod.F90:

 IN HCOX_TOMAS_Jeagle_Mod.F90
At line 534 of file /n/holylfs05/LABS/jacob_lab/msulprizio/IntTests/GCHP_14.3.0-alpha.5/CodeDir/src/GCHP_GridComp/HEMCO_GridComp/HEMCO/src/Extensions/hcox_tomas_jeagle_mod.F90
Fortran runtime error: Index '50' of dimension 1 of array 'inst%tc1' above upper bound of 12

This bug should be addressed in a future pull request.

Tagging @BettyCroft @yantosca

yantosca commented 9 months ago

@msulprizio @BettyCroft I'll take a look

yantosca commented 9 months ago

@msulprizio @BettyCroft: I've found an out-of-bounds error in hco_interp_mod.F90 as well. This might be the root cause.

Fortran runtime error: Index '48' of dimension 3 of array 'regr_4d' above upper bound of 47

Error termination. Backtrace:
#0  0x14d3b75 in __hco_interp_mod_MOD_regrid_mapa2a
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_interp_mod.F90:398
#1  0x1473c34 in __hcoio_read_mod_MOD_hcoio_read
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hcoio_read_std_mod.F90:1489
#2  0x142bfb6 in __hcoio_dataread_mod_MOD_hcoio_dataread
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hcoio_dataread_mod.F90:206
#3  0x14133a6 in readlist_fill
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_readlist_mod.F90:502
#4  0x1414c98 in __hco_readlist_mod_MOD_readlist_read
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_readlist_mod.F90:371
#5  0x13bc491 in __hco_driver_mod_MOD_hco_run
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_driver_mod.F90:165
#6  0x58739b in __hco_interface_gc_mod_MOD_hcoi_gc_run
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/GeosCore/hco_interface_gc_mod.F90:998
#7  0x4ced0f in __emissions_mod_MOD_emissions_run
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/GeosCore/emissions_mod.F90:185
#8  0x4076d4 in geos_chem
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/Interfaces/GCClassic/main.F90:701
#9  0x40cfa5 in main
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/Interfaces/GCClassic/main.F90:32

This is happening in this loop:

          ! REGR_4D are the remapped fractions.
          DO T  = 1, NTIME
          DO L  = 1, HcoState%NZ
          DO J  = 1, HcoState%NY
          DO I2 = 1, HcoState%NX
             IF ( REGFRACS(I2,J,L,T) > MAXFRACS(I2,J,L,T) ) THEN
                MAXFRACS(I2,J,L,T) = REGR_4D(I2,J,L,T)
                INDECES (I2,J,L,T) = IVAL
             ENDIF
          ENDDO
          ENDDO
          ENDDO
          ENDDO

It is because the REGR_4D array is dimensioned with NLEV (which is 47) but the loop goes up to HcoState%NZ (which is 72):

       ALLOCATE( REGR_4D(HcoState%NX,HcoState%NY,NLEV,NTIME), STAT=AS )

I think resizing REGR_4D would fix this.

BettyCroft commented 9 months ago

Hi @msulprizio and @yantosca - GCHP-TOMAS in dev/14.3.0 appears to be compiling and executing successfully on my local cluster with the resizing of REGR_4D in hco_interp_mod.F90. I'll check the output today and plan to send a pull request by tomorrow for the updates to tomas_mod.F90 to address the TOMAS diagnostic collection issue and also to add TOMAS tracers to SpeciesConc diagnostic collection.

yantosca commented 9 months ago

Thanks @BettyCroft. We'll await your PR.

msulprizio commented 9 months ago

Thanks @BettyCroft. I suspect the GCHP TOMAS runs only fail with the additional debug flags utilized in the integration tests. Have you tried building with -DCMAKE_BUILD_TYPE=Debug?

BettyCroft commented 9 months ago

Hi @msulprizio and @yantosca - thanks for clarifying. As you suspected, I did not have the debug flag turned on. I will try again with this flag on. Please let me know if you have any additional insights on what might be causing the issue.

BettyCroft commented 9 months ago

Hi @msulprizio and @yantosca- I am now reproducing the error that Melissa had coming from hcox_tomas_jeagle_mod.F90. I printed out the values of HcoState%NY and HcoState%NX in this subroutine and they print out as 6 and 8 respectively for my C24 simulation - as a result model is not able to print out those hard-coded values of 50 and 10 as in the debug statement (copied below). My understanding of the GCHP grid setup is not perfect - are those values of HcoState%NY and HcoState%NX correct? Could we address the issue by reassigning the values of 50 and 10 to ones that will match any grid?

!### Debug
print*, 'Aerosol number AT 50, 10,: ', Inst%TC1(ii,jj,1,k)

yantosca commented 9 months ago

Ah OK. I see the issue. I think you can I = 1..24, J=1..24, since C24 is the minimum resolution for GCHP. @lizziel, is that right?

lizziel commented 9 months ago

Every processor will have a subset of the global grid and therefore I=50 and J=10 will only work if you are using few cores. Are 50 and 10 chosen as a random grid box or should they correspond to a specific lat/lon? If you need a value at any grid box then use I,J = 1,1. Note that it will be printed by ALL cores which is something we avoid.

lizziel commented 9 months ago

I checked the code that was merged in and see a few debug prints that are uncommented. These should either be commented out or put into verbose code blocks. https://github.com/geoschem/HEMCO/blob/33583d207f098c5843bc193fb2bafe148bd8a9d6/src/Extensions/hcox_tomas_jeagle_mod.F90#L284 https://github.com/geoschem/HEMCO/blob/33583d207f098c5843bc193fb2bafe148bd8a9d6/src/Extensions/hcox_tomas_jeagle_mod.F90#L534-L535

BettyCroft commented 9 months ago

Thanks @lizziel ! I think that the 50 and 10 were originally chosen as random points over the oceans. The second debug print that you identified as needing modification is the one that was causing the simulation to abort because those indices do not match with GCHP resolutions.

BettyCroft commented 9 months ago

Hi @lizziel - I have also identified an issue at lines 6501-6510 in tomas_mod.F90 where there are hard-coded GC-Classic grid resolutions such as IF ( TRIM(State_Grid%GridRes) == '4.0x5.0' etc. Would you be able to provide me with some guidance on the syntax to use to modify this IF block to include GCHP grid resolutions?

lizziel commented 9 months ago

Hi Betty, all of GEOS-Chem needs to be grid-independent to be compatible with GCHP and external models that use it. If there is any grid-dependence, i.e. our dust mass scaling factors, they should be input via configuration file and not hard-coded in the source code. The exception to this is vertical grids. Would it be possible to rewrite the tomas_mod.F90 code block for 4x5 to be independent of horizontal grid? The code is here: https://github.com/geoschem/geos-chem/blob/e4145feec74e8e8ca3dc63c5c22cfe5e1175963f/GeosCore/tomas_mod.F90#L529-L555. If you are seeing it at lines 6501-6510 we must be using different code bases. Are you using dev/14.3.0?

If you need a debug print for a random point over the ocean I suggest using the ocean mask variable in State_Met to find a grid box. It's complicated, however, since some core regions may be over land only. I'll think more about the best way to do it.

BettyCroft commented 9 months ago

Hi @lizziel - thanks for your helpful message! Sorry- I meant to indicate lines 6358-6367 in tomas_mod.F90 - yes I'm using dev/14.3.0 but forgot I had already added a few lines of code updates - that changed the numbering I was viewing.

Looks like we need to rewrite the lines at 529-540 and also 6358-6367 in tomas_mod.F90 in order to be fully grid independent. We could add something with the same approach as for dust - but we need to decide on the values for SGCTSCALE and also ionrate for the various horizontal resolutions of GCHP. I'll add @theloniuspunk here in case Jeff has suggestions on the settings to use across resolutions.

BettyCroft commented 8 months ago

Hi @msulprizio, @lizziel and @yantosca - as an update - I have implemented the fixes that we discussed above and the model is now running with debug option on in dev/14.3.0 (on both on the WashU and Dal platforms). @theloniuspunk and I will review the model output in the next day or two - and thereafter I plan to make a pull request for these fixes. Thanks again for your advice!

msulprizio commented 8 months ago

Thanks for the updates @BettyCroft!

BettyCroft commented 8 months ago

Hi @msulprizio - is the timeline too tight for getting these updates into v14.3.0?

msulprizio commented 8 months ago

@BettyCroft Yes, probably, but we can release Z versions as needed so the fixes could go into 14.3.1 or later version.

BettyCroft commented 8 months ago

Thanks @msulprizio! I'll prioritize sending the pull request as soon as possible.

BettyCroft commented 8 months ago

Hi @msulprizio @lizziel @yantosca, I have just made pull requests on geos-chem (2116) HEMCO (259) and GCHP (369). These updates are expected to enable TOMAS simulations to integrate successfully in debug mode in the version dev/14.3.0. I think that that have uploaded these code updates correctly - however it looks like I indicated that I wanted to merge them into main and I meant to indicate dev/14.3.0. Please let me know if you enable any issues with implementation. Thanks again for your help!

BettyCroft commented 8 months ago

Hi @msulprizio - as a post script, please let me know if you'd like me to provide a restart file with TOMAS tracers for dev/14.3.0 - I have ones readily available.

yantosca commented 3 months ago

We can close out this issue as TOMAS integration tests for GCClassic and GCHP now run successfully in 14.4.0.