geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org

[QUESTION] How to fix known RRTMG bugs in GC12.9.3? #481

Closed FeiYao-Edinburgh closed 4 years ago

FeiYao-Edinburgh commented 4 years ago

Hi there,

Following @msulprizio's hint, I read your RRTMG fixes for GEOS-Chem 13.0.0. But until GC 13.0.0 is released, I'd like to stick with GC 12.9.3 for my RRTMG simulations. Therefore, I should at least make some code modifications to GC 12.9.3 to work around some of the known issues. These include: 1) Fix the bug in RRTMG with stratospheric aerosol by following the code changes here; 2) Fix the seg fault in RRTMG when using the Intel compiler with nitrate (NI) output enabled by following the code changes here; 3) Beware that RadAllSkyLWTOA_PM currently denotes RadAllSkyLWTOA_NOPM, and the like.

However, I did not find the corresponding code changes for the True “bugs” in RRTMG code described here, which I believe are also necessary for my simulation. Could you please give me some hints? If I could see code changes like those in 1) and 2), that would be excellent!

BTW, does the 5th update described in the second-to-last comment here matter too much?

Personally I feel this question might be appropriate for @lizziel . Thanks in advance for any help!

lizziel commented 4 years ago

Hi @FeiYao-Edinburgh,

The code updates are dispersed throughout our dev/13.0.0 branch but you should be able to adapt them fairly easily by looking at the code changes, especially a commit I made for use with GCHP that consolidated all updates that had come before. Assuming you are all set with 1 and 2 that you list, here is the commit that contains changes for 3, which is the main cause of differences between the binary and netcdf RRTMG diagnostics in GEOS-Chem 12. commit f587bc2d ("Updates for RRTMG negcdf diagnostics in GEOS-Chem")

You can ignore the files in the Interfaces/GCHP and run/GCHPctm folders if you are using GEOS-Chem Classic.

You may also need updates in commit 9eb68c3d which contains modifications for the GC-Classic HISTORY.rc file.

Finally, there was another update to make the binary and netcdf match by shifting the RRTMG call time to the first timestep rather than the middle of the radiation timestep. Changes for that update are in commit https://github.com/geoschem/geos-chem/commit/408d4b37a348fa1c3db304963c910a8a7417257a.
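
To make the call-time change described above more concrete, here is a small Python illustration. It is not the GEOS-Chem Fortran and is not taken from commit 408d4b37. The 3-hour radiation timestep, the 10-minute model timestep, and the two modulo rules are all assumptions chosen only to show the difference between calling RRTMG in the middle of each radiation timestep and calling it on the first timestep.

```python
# Illustration only: the timestep values and the two rules below are
# assumptions used to show the idea, not the actual GEOS-Chem logic.

TS_RAD = 3 * 3600   # assumed radiation timestep [s]
TS_DYN = 10 * 60    # assumed model timestep [s]


def call_times(total_seconds, offset):
    """Return elapsed times [s] at which RRTMG would be called.

    offset = TS_RAD // 2 mimics a call in the middle of each radiation
    timestep; offset = 0 mimics a call on the first timestep of each one.
    """
    return [
        t
        for t in range(0, total_seconds, TS_DYN)
        if t % TS_RAD == offset
    ]


if __name__ == "__main__":
    day = 24 * 3600
    middle = call_times(day, TS_RAD // 2)
    first = call_times(day, 0)
    print("middle rule (hours):", [t / 3600 for t in middle])
    print("first rule  (hours):", [t / 3600 for t in first])
```

Under these assumptions both rules give the same number of calls per day, but the calls are offset by half a radiation timestep, which is why time-averaged output can differ between the two schemes.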

FeiYao-Edinburgh commented 4 years ago

Hi @lizziel

Many thanks for your reply. I understand that commits f587bc2d and 9eb68c3d address 3, and that commit 408d4b37 is mainly for the 5th update you described here. But I also mentioned the True “bugs” in RRTMG code described by @sdeastham on this page (copied below for your convenience). Is this issue also solved by the three commits you linked? Or is there another commit for it?

I will complete this work ASAP and post an update here promptly!

True “bugs” in RRTMG code
I discovered one genuine bug in the RRTMG code, which I believe crept in when we went from 1-D (“CSPEC”-style) indexing to 3-D (“STT”-style) indexing. This was causing any aerosol in the stratosphere to have incorrect single scattering albedos and asymmetry parameters, which would then mess up fluxes beneath those layers as well. The fix to this is simple (extending a couple of arithmetic calculations to all grid boxes, rather than restricting it to lower grid cells). I’ve provided the fix (which comes down to deleting 6 lines) to the GCST as a pull request on GitHub. This bug would affect both BPCH and NetCDF diagnostics.

With that fixed, I believe that RRTMG should now be working perfectly in GEOS-Chem Classic with NetCDF diagnostics. I should note that it currently looks like the NetCDF diagnostics only put out fluxes, so SSA and asymmetry parameters are not output. This should be relatively straightforward to fix, though.
lizziel commented 4 years ago

The fix for the bug in stratosphere is on the GitHub page for PR https://github.com/geoschem/geos-chem/pull/347. See the "Commit" tab on that page for the file diffs. It should be changes to one file only, GeosCore/rrtmg_rad_transfer_mod.F90.

That plus the commits I provided should give complete coverage for the GEOS-Chem Classic updates. But let me know if you run into any problems.

FeiYao-Edinburgh commented 4 years ago

Ah, I just realized that the True “bugs” in RRTMG code described here is exactly the stratospheric aerosol issue that corresponds to 1.

Up to now I am all set with 1 and 2. I will follow your hints to address 3 and 4 (shifting the RRTMG call time) soon. Once all set, I will update here to let you know this issue can be closed. Many thanks for your continued support!

lizziel commented 4 years ago

Okay, great. I will close this issue but feel free to reopen if you run into problems.

FeiYao-Edinburgh commented 4 years ago

commit f587bc2d ("Updates for RRTMG negcdf diagnostics in GEOS-Chem")

This commit targeting 3 indeed consolidates most, but not all, of the updates that had come before. I add two more minor points below: 1) for your confirmation; 2) for the possible interest of other users who want to use GC12.9.3 to run RRTMG and archive correct netCDF diagnostics, as will be done in 13.0.0.

  1. When compiling, GC will complain about State_Chm%RRTMG_iSeed and State_Chm%RRTMG_iCld if you only make the changes in commit f587bc2d. This can be easily fixed by making the additional changes in this hint. As the code compiles successfully with this additional change, I assume no other changes are needed?
  2. I found two lines of State_Diag%Archive_RadOptics = .FALSE. in state_diag_mod.F90. This does not affect anything, but one of them can safely be removed.
lizziel commented 4 years ago
  1. Yes, those fields need to exist in state_chm. They came in with a pull request from S. Eastham for RRTMG in GCHP.
  2. The duplicate initialization was removed in a later commit within dev/13.0.0.
lizziel commented 4 years ago

I am closing this issue again since it sounds like you now have it working. You should be able to reopen if needed, but I'll check that that is the case after this is closed.

lizziel commented 4 years ago

Hi @FeiYao-Edinburgh, it turns out that GitHub does not allow users to reopen issues closed by repository collaborators such as myself. However, if an issue is closed you can still comment in the thread and we on the Support Team will still get notified. Please add a comment here if you have further issues with implementing RRTMG updates into 12.9.3.

FeiYao-Edinburgh commented 4 years ago

Hi @lizziel, I have followed all the commits to address issues 1-4. The model compiles successfully and runs smoothly. However, I am having difficulty with the diagnostics collections. For example, if I run from 20160701 000000 to 20160703 000000 with the following collection settings:

  Restart.filename:           './GEOSChem.Restart.%y4%m2%d2_%h2%n2z.nc4',
  Restart.format:             'CFIO',
  Restart.frequency:          'End',
  Restart.duration:           'End',
  Restart.mode:               'instantaneous'

  Aerosols.template:          '%y4%m2%d2_%h2%n2z.nc4',
  Aerosols.format:            'CFIO',
  Aerosols.frequency:         00000000 030000
  Aerosols.duration:          00000001 000000
  Aerosols.mode:              'time-averaged'  

  RRTMG.template:             '%y4%m2%d2_%h2%n2z.nc4',
  RRTMG.format:               'CFIO',
  RRTMG.frequency:            00000000 030000
  RRTMG.duration:             00000001 000000
  RRTMG.mode:                 'time-averaged'

  StateMet.template:          '%y4%m2%d2_%h2%n2z.nc4',
  StateMet.format:            'CFIO',
  StateMet.frequency:         00000000 030000
  StateMet.duration:          00000001 000000
  StateMet.mode:              'time-averaged'

The resulting GEOSChem.Restart file is not written correctly; as you can see, its file size is clearly too small.

464K Oct 16 15:52 GEOSChem.Restart.20160703_0000z.nc4

I also get the following error if I use this Restart file to continue the model run, say from 20160703 000000 to 20160705 000000.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

In Ncrd_3d_R4 #2:  NetCDF: Index exceeds dimension bound
     65536       286

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Code stopped from DO_ERR_OUT (in module NcdfUtil/m_do_err_out.F90)

This is an error that was encountered in one of the netCDF I/O modules,
which indicates an error in writing to or reading from a netCDF file!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Meanwhile, I found that the other diagnostics are also incorrect, at least for the END simulation date, judging by their file sizes.

image

I feel I must be missing some additional, more general changes, since this issue applies to all diagnostics collections. I'd be grateful for your further help!
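
For readers who want to automate the file-size check described above, the following is a minimal Python sketch. It assumes output files are named like GEOSChem.RRTMG.20160701_0000z.nc4, takes the collection name from the second dot-separated field, and flags any file smaller than half the median size of its collection. The function name, the glob pattern, and the 0.5 threshold are hypothetical choices for this sketch, not GEOS-Chem conventions.

```python
from pathlib import Path
from statistics import median


def find_small_files(run_dir, pattern="GEOSChem.*.nc4", ratio=0.5):
    """Group output files by collection and flag unusually small ones.

    The collection name is assumed to be the second dot-separated field
    of the file name (e.g. 'RRTMG' in 'GEOSChem.RRTMG.20160701_0000z.nc4').
    A file is flagged if it is smaller than `ratio` times the median size
    of its collection.
    """
    groups = {}
    for path in sorted(Path(run_dir).glob(pattern)):
        parts = path.name.split(".")
        if len(parts) < 3:
            continue  # name does not follow the assumed convention
        groups.setdefault(parts[1], []).append(path)

    flagged = []
    for collection, paths in groups.items():
        sizes = [p.stat().st_size for p in paths]
        typical = median(sizes)
        for p, size in zip(paths, sizes):
            if size < ratio * typical:
                flagged.append((collection, p.name, size))
    return flagged


if __name__ == "__main__":
    for collection, name, size in find_small_files("."):
        print(f"{collection}: {name} is only {size} bytes")
```

A collection with a single file (such as Restart with frequency 'End') can never be flagged by this comparison, so its size still has to be checked against a known-good restart file by hand.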

FeiYao-Edinburgh commented 4 years ago

Checking GEOSChem.RRTMG.20160701_0000z.nc4 (GEOSChem.RRTMG.20160702_0000z.nc4 cannot be opened, as expected), I also found that the RadAODWL1_?RRTMG? fields are all zeros (see below for instance). This means they were not correctly assigned during the RRTMG simulation. The radiation flux diagnostics seem correct, though.

Hence there are at least two issues: 1) the diagnostics are incorrect, at least for the END simulation date; 2) the AOD diagnostics from RRTMG were not correctly assigned during the simulation. I guess the same is likely true for SSA and ASM.

I look forward to your further support. You can also check my modified code here, which mainly involves the several commits discussed above.

lizziel commented 4 years ago

Hi @FeiYao-Edinburgh, if your run crashed during file write then that would explain why the day 2 diagnostics files are so small. I suggest checking to see which collection is causing the fail by commenting out collections in HISTORY.rc. If the collection that is causing the problem is Restart or another collection that is not RRTMG then this issue may be unrelated to your RRTMG updates. You can check this by building an out-of-the-box 12.9.3 simulation and using the same run directory settings you are using now.

Regarding zero values in RadAOD, try adding prints within GeosCore/rrtmg_rad_transfer_mod.F90 where the State_Diag array for RadAOD wavelength 1 is set to make sure the logic is executed properly. Also try not using the wildcard in HISTORY.rc to see if that has any impact.
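
The suggestion above to disable collections one at a time can be scripted. The sketch below is one possible way to do it in Python. It assumes the collection list in HISTORY.rc starts on a line beginning with COLLECTIONS:, ends at a line containing only ::, and has one quoted name followed by a comma per line. The function name is hypothetical, and an entry that shares the COLLECTIONS: line itself is deliberately not handled.

```python
def disable_collection(history_text, name):
    """Comment out one entry of the COLLECTIONS list in HISTORY.rc text.

    Assumed layout: the list starts on a line beginning with
    'COLLECTIONS:' and ends at a line containing only '::', with one
    quoted, comma-terminated name per line. Only lines inside that list
    are touched, so 'RRTMG.fields:' and similar settings stay unchanged.
    """
    entry = f"'{name}',"
    out = []
    in_list = False
    changed = False
    for line in history_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("COLLECTIONS:"):
            in_list = True
        elif in_list and stripped == "::":
            in_list = False
        elif in_list and stripped == entry:
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}#{stripped}"
            changed = True
        out.append(line)
    if not changed:
        raise ValueError(f"active collection {name!r} not found in list")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "COLLECTIONS: 'Restart',\n"
        "             'Aerosols',\n"
        "             'RRTMG',\n"
        "::\n"
    )
    print(disable_collection(sample, "RRTMG"))
```

Running the simulation once per disabled collection and noting which run finishes cleanly narrows down the collection that triggers the crash.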

FeiYao-Edinburgh commented 4 years ago

Hi @lizziel

  1. I have run a tropchem simulation from 20160701 to 20160703 using the same settings and the geos executable compiled from the modified code. All collections, including Restart, Aerosols, and StateMet, are correctly archived on all days. Hence I believe the problem still lies in the RRTMG updates. Below I summarize all the updates I have made on top of GC12.9.3.
     1.1 modify GeosCore/rrtmg_rad_transfer_mod.F90
     1.2 modify GeosCore/rrtmg_rad_transfer_mod.F90
     1.3 modify GeosCore/main.F90, GeosCore/rrtmg_rad_transfer_mod.F90, Headers/diaglist_mod.F90, Headers/state_diag_mod.F90
     1.4 modify Headers/state_chm_mod.F90
     1.5 modify GeosCore/main.F90, GeosCore/rrtmg_rad_transfer_mod.F90

I have checked all of these again and believe I have made all the changes correctly. All my modified code can be found here.

  2. My RRTMG collection setting is as below (no ?RRTMG? wildcard and with BASE first)
    RRTMG.template:             '%y4%m2%d2_%h2%n2z.nc4',
    RRTMG.format:               'CFIO',
    RRTMG.frequency:            00000000 030000
    RRTMG.duration:             00000001 000000
    RRTMG.mode:                 'time-averaged'
    RRTMG.fields:               'RadClrSkySWSurf_BASE          ', 'GIGCchem',
                              'RadAllSkySWSurf_BASE          ', 'GIGCchem',
                              'RadAllSkySWSurf_PM            ', 'GIGCchem',
                              'RadAODWL1_SU                  ', 'GIGCchem',
                              'RadAODWL1_NI                  ', 'GIGCchem',
                              'RadAODWL1_AM                  ', 'GIGCchem',
                              'RadAODWL1_BC                  ', 'GIGCchem',
                              'RadAODWL1_OA                  ', 'GIGCchem',
                              'RadAODWL1_SS                  ', 'GIGCchem',
                              'RadAODWL1_DU                  ', 'GIGCchem',
                              'RadAODWL1_PM                  ', 'GIGCchem',

    I have checked that State_Diag%RADAODWL1(I,J,OUTIDX) = AODOUT was executed, so the many IF conditions leading to it are not the problem. But I found that the AODOUT values are generally close to zero, like those below. I feel this might be the problem but do not know how to address it. Personally, I feel that at least the AOD spatial pattern collected from RRTMG should be consistent with that collected from Aerosols.

    1.3063893E-03
    9.6710376E-04
FeiYao-Edinburgh commented 4 years ago

AODOUT

Using grep, I found that AODOUT is a local variable defined and used in GeosCore/rrtmg_rad_transfer_mod.F90. All the code around AODOUT in my rrtmg_rad_transfer_mod.F90 is identical to yours, which makes it even harder for me to identify the problem.

Further prints indicate that AODTEMP and AODOUT were calculated, but I still do not know why all the values are close to zero.

lizziel commented 4 years ago

Another thing you can do is plot your values and compare them to our validation plots for the binary versus netcdf diagnostics. If they are far off from these then there is something wrong. Note that these plots used 24-hour averages.

FeiYao-Edinburgh commented 4 years ago

Another thing you can do is plot your values and compare them to our validation plots for the binary versus netcdf diagnostics

Hi @lizziel, thanks for this! I have compared my results with these and found similar daily spatial patterns for RadFlux (a few positive values in RadAllSkySWSurf_PM, though) but not for RadAOD (mine are all zeros); see below for an example. image image

Meanwhile, I still cannot obtain correct diagnostics for the END simulation date. This happens with steps 1.1-1.5 as well as with 1.1-1.4 only. I dropped 1.5 because I thought the incomplete diagnostics might be associated with shifting the RRTMG call time a bit earlier. I am also confused about how the call time for RRTMG is shifted, as 1.5 seems to mainly remove the First_RT code?

Below I provide the error message that appears near the end of the simulation. You might be able to give me some further hints using this information.

===============================================================================
 Mass-Weighted OH Concentration
 Mean OH =    13.6402941324198       [1e5 molec/cm3]
===============================================================================
*** Error in `./geos': free(): invalid next size (normal): 0x0000000014adcff0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x2b4fccb20299]
/geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifcoremt.so.5(for_deallocate+0x12e)[0x2b4fca72dede]
./geos[0x12513ac]
./geos[0x406b45]
./geos[0x403a8e]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b4fccac1555]
./geos[0x403999]
======= Memory map: ========
00400000-017ed000 r-xp 00000000 00:41 249957753                          /geos/d21/s1855106/GC.12.9.3/merra2_2x25_RRTMG/geos
019ed000-019ee000 r--p 013ed000 00:41 249957753                          /geos/d21/s1855106/GC.12.9.3/merra2_2x25_RRTMG/geos
019ee000-01b0f000 rw-p 013ee000 00:41 249957753                          /geos/d21/s1855106/GC.12.9.3/merra2_2x25_RRTMG/geos
01b0f000-08567000 rw-p 00000000 00:00 0
09ef1000-124f6a000 rw-p 00000000 00:00 0                                 [heap]
2b4fc6be4000-2b4fc6c06000 r-xp 00000000 08:31 1049311                    /usr/lib64/ld-2.17.so
2b4fc6c06000-2b4fc6c50000 rw-p 00000000 00:00 0
2b4fc6c53000-2b4fc6d13000 rw-p 00000000 00:00 0
2b4fc6d13000-2b4fc6d14000 ---p 00000000 00:00 0
2b4fc6d14000-2b4fc6d24000 rw-p 00000000 00:00 0
2b4fc6e05000-2b4fc6e06000 r--p 00021000 08:31 1049311                    /usr/lib64/ld-2.17.so
2b4fc6e06000-2b4fc6e07000 rw-p 00022000 08:31 1049311                    /usr/lib64/ld-2.17.so
2b4fc6e07000-2b4fc6e08000 rw-p 00000000 00:00 0
2b4fc6e08000-2b4fc6e73000 r-xp 00000000 08:31 1082877                    /usr/lib64/libnetcdff.so.5.3.1
2b4fc6e73000-2b4fc7072000 ---p 0006b000 08:31 1082877                    /usr/lib64/libnetcdff.so.5.3.1
2b4fc7072000-2b4fc7073000 r--p 0006a000 08:31 1082877                    /usr/lib64/libnetcdff.so.5.3.1
2b4fc7073000-2b4fc7074000 rw-p 0006b000 08:31 1082877                    /usr/lib64/libnetcdff.so.5.3.1
2b4fc7074000-2b4fc71d9000 r-xp 00000000 08:31 1082760                    /usr/lib64/libnetcdf.so.7.2.0
2b4fc71d9000-2b4fc73d8000 ---p 00165000 08:31 1082760                    /usr/lib64/libnetcdf.so.7.2.0
2b4fc73d8000-2b4fc7427000 r--p 00164000 08:31 1082760                    /usr/lib64/libnetcdf.so.7.2.0
2b4fc7427000-2b4fc742c000 rw-p 001b3000 08:31 1082760                    /usr/lib64/libnetcdf.so.7.2.0
2b4fc742c000-2b4fca465000 rw-p 00000000 00:00 0
2b4fca465000-2b4fca48b000 r-xp 00000000 00:40 262645                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifport.so.5
2b4fca48b000-2b4fca68b000 ---p 00026000 00:40 262645                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifport.so.5
2b4fca68b000-2b4fca68c000 r--p 00026000 00:40 262645                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifport.so.5
2b4fca68c000-2b4fca68e000 rw-p 00027000 00:40 262645                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifport.so.5
2b4fca68e000-2b4fca694000 rw-p 00000000 00:00 0
2b4fca694000-2b4fca7d9000 r-xp 00000000 00:40 262641                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifcoremt.so.5
2b4fca7d9000-2b4fca9d8000 ---p 00145000 00:40 262641                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifcoremt.so.5
2b4fca9d8000-2b4fca9db000 r--p 00144000 00:40 262641                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifcoremt.so.5
2b4fca9db000-2b4fca9de000 rw-p 00147000 00:40 262641                     /geos/usrgeos/intel/Compiler/xe2016/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64_lin/libifcoremt.so.5
2b4fca9de000-2b4fcaa26000 rw-p 00000000 00:00 0
FeiYao-Edinburgh commented 4 years ago

I will check more

Later on, I rolled back the code to an earlier version with only modifications 1.1 and 1.2. With that, the incomplete diagnostics on the END simulation date no longer occur. Thus I can confirm that the problem of incomplete diagnostics on the END simulation date is due to modification 1.3. Including only 1.1 and 1.2 gives me only the RadFlux diagnostics, which should be interpreted with caution, as shown below. image

So up to now, I am still experiencing: 1) incomplete diagnostics on the END simulation date; 2) zero RadAOD diagnostics (because AODOUT is all zeros during the calculations). Both are due to the modifications in 1.3. I have also provided the error information above to help you check where the bug might occur. I suppose I need to modify more files that are missing from the aforementioned commits 1.1-1.5? I would appreciate your further support at your earliest convenience. Again, my modified code is available here.

lizziel commented 4 years ago

Hi @FeiYao-Edinburgh, the release candidate for 13.0.0 will be released within the next few weeks. I suggest that you upgrade to that version to use RRTMG netcdf diagnostics if you have not been able to get them to work in 12.9.3.

While we do not support the recent RRTMG diagnostics updates in GEOS-Chem 12 you can try to figure out the issues with your updates by going through all debugging recommendations listed on our Debugging GEOS-Chem wiki page.

FeiYao-Edinburgh commented 4 years ago

Hi @lizziel, okay. I think I will discard steps 1.3-1.5 and run small, sample RRTMG simulations in GC 12.9.3 for my research design. I will bear in mind that PM denotes NOPM and the like. I will also collect speciated AOD via the Aerosols diagnostics. On that note, may I ask what the difference is between AOD calculated via the Aerosols and RRTMG diagnostics? They are intended to be the same, right? Once you release 13.0.0, I will upgrade to it for my large, real RRTMG run.
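
As a reminder aid for the "PM denotes NOPM" caveat, here is a small Python sketch that relabels flux diagnostic names from 12.9.3 netCDF output. It is built on assumptions that go beyond what is stated in this thread: that only names starting with RadAllSky or RadClrSky are affected, and that the BASE suffix is left as is. The function name is hypothetical.

```python
# Prefixes of the flux diagnostics mentioned in this thread. Treating only
# these as affected is an assumption of this sketch.
FLUX_PREFIXES = ("RadAllSky", "RadClrSky")


def actual_flux_label(name):
    """Return the label that a 12.9.3 flux diagnostic is assumed to hold.

    Example of the assumed rule: 'RadAllSkyLWTOA_PM' is reported in this
    thread to actually denote 'RadAllSkyLWTOA_NOPM'. BASE is assumed to
    be unaffected, and non-flux names are returned unchanged.
    """
    prefix, sep, suffix = name.rpartition("_")
    if not sep or not prefix.startswith(FLUX_PREFIXES):
        return name
    if suffix == "BASE":
        return name
    return f"{prefix}_NO{suffix}"


if __name__ == "__main__":
    for field in ("RadAllSkyLWTOA_PM", "RadClrSkySWSurf_BASE", "RadAODWL1_SU"):
        print(field, "->", actual_flux_label(field))
```

Applying the relabeling when plotting, rather than renaming variables inside the files, keeps the raw output identical to what the model wrote.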

lizziel commented 4 years ago

Hi @FeiYao-Edinburgh, I do not think the Aerosols and RRTMG diagnostics are meant to be the same. However, I am not sure of the expected differences. @sdeastham, who has looked extensively at the RRTMG diagnostics, may be able to comment better on this.

sdeastham commented 4 years ago

I'm afraid it's been too long since I looked at them to remember, but I agree they may well be different - I believe they use different optical properties. I'm not at all sure though and you'd need to investigate the code to be certain. What is certain is that they will differ because the RRTMG diagnostics are only going to be calculated when RRTMG is run.
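
The sampling point made above can be shown with a toy calculation. The Python sketch below uses a made-up AOD-like signal that rises linearly through the day, an assumed 10-minute model timestep, and an assumed 3-hour radiation timestep. It compares a daily mean taken over every model timestep with one taken only at the timesteps where RRTMG would run. None of the numbers come from GEOS-Chem.

```python
from statistics import mean

TS_DYN = 10 * 60    # assumed model timestep [s]
TS_RAD = 3 * 3600   # assumed radiation timestep [s]


def toy_aod(t_seconds):
    """Hypothetical AOD-like signal that rises linearly through the day."""
    return 0.1 + 0.01 * (t_seconds / 3600.0)


def daily_means(signal=toy_aod, day=24 * 3600):
    """Return (mean over all timesteps, mean over radiation calls only)."""
    times = range(0, day, TS_DYN)
    every_step = mean(signal(t) for t in times)
    rad_only = mean(signal(t) for t in times if t % TS_RAD == 0)
    return every_step, rad_only


if __name__ == "__main__":
    every_step, rad_only = daily_means()
    print(f"mean over every model timestep: {every_step:.4f}")
    print(f"mean over radiation calls only: {rad_only:.4f}")
```

Even with identical optical properties, the two averages differ because they sample the same signal at different times, which is consistent with the comment that the two diagnostics should not be expected to match exactly.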

FeiYao-Edinburgh commented 4 years ago

Thank you! In this sense, using AOD from RRTMG diagnostics to interpret Radiation Flux from RRTMG diagnostics sounds more reasonable. I will keep an eye on your release of 13.0.0. Please feel free to close this issue now.