geoschem / geos-chem-cloud

Run GEOS-Chem easily on AWS cloud
http://cloud.geos-chem.org
MIT License

[QUESTION] What are the steps to set up an Aerosol Only Nested Grid Run on GeosChem 12.7.0 (No idea how to do it!) #40

Closed neyranaz closed 4 years ago

neyranaz commented 4 years ago

Model version: 12.7.0
AMI name: GEOSChem_12.7.0_tutorial_20200205
Instance type: c5.18xlarge (to prevent the EC2 instance from being killed or running out of memory)

Hello, we've been trying to set up an aerosol-only nested run for North America, with a special focus on the western U.S., but we haven't been able to identify the complete set of steps. We decided to use GC 12.7 because of this issue. We have also read the following links in depth: Nested Grid Wiki, Nested Model Clinic, Flex Grid Simulation Error, 3 hour boundary conditions for nested grid, and @FeiYao-Edinburgh's steps to run a nested simulation for Complex SOA. We have also done some troubleshooting with the preconfigured nested grid (tropchem) to identify errors, but we haven't succeeded in running a nested grid for the simulation type we are interested in (aerosol). We would therefore like your input on the steps needed.

Here is our understanding of the procedure, along with the questions we have at each step:

I don't want to confuse anyone so please take these steps as our notion of what might have to be done rather than what should actually be done.

Creating the Boundary Conditions in geosfp_4x5_aerosol

1.- Creating the boundary conditions: First we create a run directory from /UT/perl/CopyRunDirs.Input for the offline aerosol simulation (or should we create the BCs by activating geosfp standard instead?) by activating the GEOSFP aerosol-only 4x5 entry (or is 2x2.5 better?), for the date on which we plan to generate our boundary conditions. This date has to be one day before the start date of our nested run.

2.-Once the directory is generated, we go there, and modify HISTORY.rc, there we turn on 'Restart', 'SpeciesConc', and 'BoundaryConditions' (or should we turn on/off something else?), then we modify BoundaryConditions.frequency: 00000000 030000 to 1 hour, so that we can address the hourly issue specified by @msulprizio.

3.- In the HEMCO_Config.rc file, we modify the paths to the data as [explained](https://github.com/geoschem/geos-chem-cloud/issues/39#issuecomment-665073840) by Bob, so that we can download them when we do the dry run, as explained by @yantosca.

4.- Then in input.geos we modify the latitude and longitude to the area of interest, for example the lat and lon for California (or should it be North America, or global, leaving it as it was?). Here, should we turn "Nested grid simulation" on, or leave it off? Also, if we want to run a nested grid for California, should we generate the BCs for the global domain (-180 180; -90 90), for NA (-140 -40; 10 70), or can we do it for a custom area (let's say, the western US)? Do the nested and global dimensions have to agree, or can we do global for North America and nested for California?

`%%% GRID MENU %%%       :
Grid resolution         : 4.0x5.0
Longitude min/max       : 114.0 119.0  (Questions regarding these specs)
Latitude  min/max       : 38.0 42.0     (Questions regarding these specs)
 Half-sized polar boxes?: F
Number of levels        : 47
Nested grid simulation? : F              (Questions regarding these specs)
 Buffer zone (N S E W ) :  3  3  3  3`

5.- Then compile, using make realclean ; make -j4 build NC_DIAG=y BPCH_DIAG=n TIMERS=1, (Or is it something else, as @FeiYao was doing for version 12.2?).
5.1.- Download data, using the dry run, and execute ./geos to get the boundary conditions in OutputDir

Creating the nested grid directory in geosfp_4x5_aerosol_na

6.- Once we have created the BC, we move again to /UT/perl/CopyRunDirs.Input and create a new directory by activating GEOSFP Aerosol only 4x5 aerosol and modifying it to nested as @FeiYao-Edinburgh did (or would it be advisable to modify it from the tropchem nested grid?)

`
#--------|-----------|------|------------|------------|------------|---------|
# MET    | GRID      | NEST | SIMULATION | START DATE | END DATE   | EXTRA?  |
#--------|-----------|------|------------|------------|------------|---------|
  geosfp   4x5         -      aerosol      2013070100   2013070101   -  (Questions regarding these specs)
## ======= Nested model runs ==================================================
# merra2   05x0625     as     tropchem         2016070100   2016080100     -
# merra2   05x0625     na     tropchem         2016070100   2016080100     -
# geosfp   025x03125   ch     tropchem         2016070100   2016080100     -
    geosfp   025x03125   na     tropchem         2016070100   2016080100     - (Questions regarding these specs)
## ======= HEMCO standalone ===================================================` 

We want:

`## ======= Nested model runs ==================================================
# merra2   05x0625     as     tropchem         2016070100   2016080100     -
# merra2   05x0625     na     tropchem         2016070100   2016080100     -
# geosfp   025x03125   ch     tropchem         2016070100   2016080100     -
    geosfp   025x03125   na     **aerosol**         2016070100   2016080100     -`

7.- We modify the generated directory (either geosfp_4x5_aerosol or geosfp_025x03125_tropchem_na). How do we do this? What are we supposed to change to make this happen? This is one of the most important questions, but it was not entirely clear to us how @FeiYao-Edinburgh approached it here.

8.- Once the geosfp_4x5_aerosol_na directory is set up and we have modified everything needed to make it run, how should input.geos look? Can we select the latitude and longitude of our interest (e.g. Los Angeles)?

`%%% GRID MENU %%%       :
Grid resolution         : 025x03125     (Questions regarding these specs)
Longitude min/max       : 114.0 119.0 (Questions regarding these specs)
Latitude  min/max       : 38.0 42.0   (Questions regarding these specs)
 Half-sized polar boxes?: F
Number of levels        : 47
Nested grid simulation? : T               (Questions regarding these specs)
 Buffer zone (N S E W ) :  3  3  3  3`

9.- We modify HEMCO_config.rc, just as we did in step 3, to be able to download the data, and we also modify the Metdir, to be able to read _NA data.

METDIR: /project/data/ExtData/GEOS_0.5x0.625_NA/GEOSFP

In the same file, we also modify the path to our boundary conditions, just as @msulprizio explained (needed for 12.7; solved in later versions).

`#==============================================================================
# --- GEOS-Chem boundary condition file ---
#==============================================================================
(((GC_BCs
* BC_                 $**MyPathToBoundaryConditions/OutputDir**/GEOSChem.BoundaryConditions.$YYYY$MM$DD_$HH$MNz.nc4 SpeciesBC_?ADV?  1980-2019/1-12/1-31/* RFY xyz 1 * - 1 1   (Note that we substituted 1-23 for *)
)))GC_BCs

(((CHEMISTRY_INPUT`

Also from the wiki

`# ExtNr ExtName                on/off  Species 
0       Base                   : on    *
# ----- RESTART FIELDS ----------------------
    --> GC_RESTART             :       true     
    --> GC_BCs                 :       true
    --> HEMCO_RESTART          :       true
`

10.- Do we modify HISTORY.rc to ask for boundary conditions, or do we not really need to do this? If we need to, should we also change BoundaryConditions.frequency: 00000000 030000 to 1 hour, just as we did in step 2, or is that not really necessary? Also, in HISTORY.rc we will request 'Restart' as output. (This might cause an error later on.)

#==============================================================================
# %%%%% THE BoundaryConditions COLLECTION %%%%%
#
# GEOS-Chem boundary conditions for use in nested grid simulations
#
# Available for all simulations
#==============================================================================
  BoundaryConditions.template:   '%y4%m2%d2_%h2%n2z.nc4',
  BoundaryConditions.format:     'CFIO',
  BoundaryConditions.frequency:  00000000 030000    **Should we change this to 1 hour?**
  BoundaryConditions.duration:   00000001 000000
  BoundaryConditions.mode:       'instantaneous'
  BoundaryConditions.LON_RANGE:  -130.0 -60.0,    **This does not show up in HISTORY.rc; should we add it?**
  BoundaryConditions.LAT_RANGE:  10.0 60.0,           **This does not show up in HISTORY.rc; should we add it?**
  BoundaryConditions.fields:     'SpeciesBC_?ADV?             ', 'GIGCchem',`

11.- We compile using this (Or is it something else? @FeiYao-Edinburgh)

make realclean ; make -j4 build NC_DIAG=y BPCH_DIAG=n TIMERS=1

12.- We download files from dry run, and modify some file extensions manually like ".NA.nc" to ".nc" in ExtData to be able to pull them

13.- We then run ./geos, and then get our .nc file in Output Dir?

I am surely missing many steps, or this might be entirely wrong, so your input would be very valuable.

Once we have some input on how to approach this, we will share a public AMI with you to troubleshoot some of the errors that we expect to see. We will also share the following files (once we have them): GC_log.txt, HEMCO.log.txt, input.geos.txt, HISTORY.rc.txt, HEMCO_Config.rc.txt

yantosca commented 4 years ago

Thanks for writing. I agree, the nested-grid documentation is somewhat out of date. We are going to be rewriting our documentation for the GEOS-Chem 13.0.0 release, which will happen later this year. Also in 13.0.0, the run directory generation will happen from the source code directory instead of from the unit tester (which will be retired).

In the meantime, I will try to answer your questions the best I can:

Creating the Boundary Conditions in geosfp_4x5_aerosol

1.- Creating the boundary conditions: First we create a Directory from /UT/perl/CopyRunDirs.Input for Offline Aerosol (or should we create the BCs by activating geosfp standard?)

The global simulation from which you will save out the boundary conditions for the nested run has to be the same type of simulation as the nested run. In this case, you'll want to set up a geosfp_4x5_aerosol run directory and use that to save out the boundary conditions. You should start the global run at least 1 day before the start of the nested run.

... by activating GEOSFP Aerosol only 4x5 aerosol (or is 2x2.5 better?), for the date on which we plan to generate our boundary conditions. This date has to be one day before the date of our nested run.

You could also set up a geosfp_2x25_aerosol simulation for the boundary conditions but that will of course take longer to run and the boundary condition files will be larger. So I think the 4x5 is fine for the purpose of boundary conditions.

As noted, you should start the global run to save out BC's at least one day before the start of the nested run.
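As a rough illustration of the "at least one day before" rule, here is a minimal Python sketch that derives the start date of the global run from the start date of the nested run. The function name `global_start_for` and the example dates are hypothetical, not part of GEOS-Chem; you would still enter the resulting date by hand in input.geos.

```python
from datetime import datetime, timedelta


def global_start_for(nested_start: str, lead_days: int = 1) -> str:
    """Return a YYYYMMDD start date for the global (boundary-condition) run.

    nested_start is the YYYYMMDD start date of the nested run; lead_days is
    how many days earlier the global run should begin (at least 1).
    """
    if lead_days < 1:
        raise ValueError("the global run should start at least 1 day earlier")
    start = datetime.strptime(nested_start, "%Y%m%d")
    return (start - timedelta(days=lead_days)).strftime("%Y%m%d")


# Hypothetical example: a nested run starting on 2014-01-01.
print(global_start_for("20140101"))  # 20131231
```

A longer lead (for example several days) just means passing a larger `lead_days`.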

2.-Once the directory is generated, we go there, and modify HISTORY.rc, there we turn on 'Restart', 'SpeciesConc', and 'BoundaryConditions' (or should we turn on/off something else?),

If you are only saving out boundary conditions then you can omit saving out the SpeciesConc diagnostic. That will save disk space and time.

then we modify BoundaryConditions.frequency: 00000000 030000 to 1 hour, so that we can address the hourly issue specified by @msulprizio.

That is correct.
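For reference, the frequency and duration settings in HISTORY.rc are written as a `YYYYMMDD hhmmss` pair (for example `00000000 030000` for 3 hours and `00000001 000000` for 1 day). The small Python helper below is only a sketch for building such strings; `history_interval` is a hypothetical name and not part of GEOS-Chem.

```python
def history_interval(months: int = 0, days: int = 0,
                     hours: int = 0, minutes: int = 0) -> str:
    """Format an interval as the 'YYYYMMDD hhmmss' pair used in HISTORY.rc."""
    if not (0 <= months < 100 and 0 <= days < 100
            and 0 <= hours < 24 and 0 <= minutes < 60):
        raise ValueError("interval component out of range")
    # Years and seconds are left at zero in this sketch.
    return f"0000{months:02d}{days:02d} {hours:02d}{minutes:02d}00"


print(history_interval(hours=1))  # 00000000 010000  (hourly boundary conditions)
print(history_interval(hours=3))  # 00000000 030000  (the default shown in this thread)
print(history_interval(days=1))   # 00000001 000000  (the duration shown in this thread)
```

The string printed for `hours=1` is the value you would put after `BoundaryConditions.frequency:`.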

4.- Then in input.geos we modify the latitude and longitude to the area of interest, for example the lat and lon for California (or should it be North America, or global, leaving it as it was?). Here, should we turn "Nested grid simulation" on, or leave it off? Also, if we want to run a nested grid for California, should we generate the BCs for the global domain (-180 180; -90 90), for NA (-140 -40; 10 70), or can we do it for a custom area (let's say, the western US)? Do the nested and global dimensions have to agree, or can we do global for North America and nested for California?

For the global simulation you will need these specifications:

%%% GRID MENU %%%       :
Grid resolution         : 4.0x5.0
Longitude min/max       : -180.0 180.0
Latitude  min/max       :  -90.0  90.0
 Half-sized polar boxes?: T
Number of levels        : 72
Nested grid simulation? : F
 Buffer zone (N S E W ) :  0  0  0  0

5.- Then compile, using make realclean ; make -j4 build NC_DIAG=y BPCH_DIAG=n TIMERS=1, (Or is it something else, as @FeiYao was doing for version 12.2?).

This should work. But you can also compile using [CMake](http://wiki.seas.harvard.edu/geos-chem/index.php/Compiling_with_CMake), as described on that wiki page.

5.1.- Download data, using the dry run, and execute ./geos to get the boundary conditions in OutputDir

Correct.

Creating the nested grid directory in geosfp_4x5_aerosol_na

6.- Once we have created the BC, we move again to /UT/perl/CopyRunDirs.Input and create a new directory by activating GEOSFP Aerosol only 4x5 aerosol and modifying it to nested as @FeiYao-Edinburgh did (or would it be advisable to modify it from the tropchem nested grid?)

7.- We modify the generated directory (either geosfp_4x5_aerosol or geosfp_025x03125_tropchem_na). How do we do this? What are we supposed to change to make this happen? This is one of the most important questions, but it was not entirely clear how Fei-Yao approached it here

8.- Once the geosfp_4x5_aerosol_na directory is set up and we have modified everything needed to make it run, how should input.geos look? Can we select the latitude and longitude of our interest (e.g. Los Angeles)?

Right now, the GEOS-Chem Unit Tester does not have a nested-aerosol option. I think the quick thing to do would be to create a new geosfp_4x5_aerosol run directory (the same way as before) and rename it. Then copy the boundary condition files from the 4x5 run directory to the 025x03125 run directory:

cp -R geosfp_4x5_aerosol geosfp_025x03125_aerosol_na
cd geosfp_025x03125_aerosol_na
mkdir BC_4x5
cp ../geosfp_4x5_aerosol/OutputDir/GEOSChem.BoundaryConditions* ./BC_4x5/

ALSO NOTE: You might need to concatenate boundary condition files together (you can do this with the NCO tool ncrcat) so that each file has 24 time points.
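If concatenation turns out to be necessary, one way to organize it is to group the files by day and hand each group to ncrcat. The Python sketch below only builds the command lines (it does not run them). It assumes the file names follow the `GEOSChem.BoundaryConditions.YYYYMMDD_hhmmz.nc4` pattern shown in this thread and that there is one file per hour; the output directory `BC_4x5_daily` and the function name are hypothetical.

```python
import re
from collections import defaultdict

# File-name pattern taken from the BoundaryConditions template in this thread.
BC_PATTERN = re.compile(r"GEOSChem\.BoundaryConditions\.(\d{8})_(\d{4})z\.nc4$")


def daily_ncrcat_commands(filenames, out_dir="BC_4x5_daily"):
    """Group boundary-condition files by day and build one ncrcat call per day.

    Returns a list of argument lists of the form
    ['ncrcat', in1, in2, ..., out], which is the usual ncrcat calling order.
    """
    by_day = defaultdict(list)
    for name in filenames:
        match = BC_PATTERN.search(name)
        if match:
            by_day[match.group(1)].append(name)

    commands = []
    for day in sorted(by_day):
        inputs = sorted(by_day[day])  # zero-padded times sort chronologically
        output = f"{out_dir}/GEOSChem.BoundaryConditions.{day}_0000z.nc4"
        commands.append(["ncrcat", *inputs, output])
    return commands


# Hypothetical example: 24 hourly files for one day.
files = [f"BC_4x5/GEOSChem.BoundaryConditions.20140101_{h:02d}00z.nc4"
         for h in range(24)]
cmds = daily_ncrcat_commands(files)
print(len(cmds), "command(s);", len(cmds[0]) - 2, "input files in the first one")
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)` once the output directory exists. Writing to a separate directory avoids overwriting the 00:00 input file, which would otherwise share its name with the daily output.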

Settings for geosfp_025x03125_aerosol_na/input.geos

`%%% GRID MENU %%%       :
Grid resolution         : 025x03125    
Longitude min/max       : 114.0 119.0 
Latitude  min/max       : 38.0 42.0   
 Half-sized polar boxes?: F
Number of levels        : 72
Nested grid simulation? : T             
 Buffer zone (N S E W ) :  3  3  3  3

You may want to go a little bit further in E/W longitude and N/S latitude to make sure that the area of interest does not fall within the buffer zone.
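To get a feel for how much extra margin that means, here is a small Python sketch that pads a region of interest by the buffer width. It assumes a buffer of 3 grid boxes on every side and the 0.25 x 0.3125 (lat x lon) spacing; the function name `pad_domain` is hypothetical, and the result may still need to be adjusted so that it lines up with the model's grid edges.

```python
def pad_domain(lon_min, lon_max, lat_min, lat_max,
               dlon=0.3125, dlat=0.25, buffer_cells=3):
    """Widen a region of interest by the width of the buffer zone.

    Returns (lon_min, lon_max, lat_min, lat_max) for a nested domain whose
    buffer zone lies outside the original region.
    """
    pad_lon = buffer_cells * dlon
    pad_lat = buffer_cells * dlat
    return (lon_min - pad_lon, lon_max + pad_lon,
            lat_min - pad_lat, lat_max + pad_lat)


# Using the longitude/latitude limits from the grid menu above.
print(pad_domain(114.0, 119.0, 38.0, 42.0))
# (113.0625, 119.9375, 37.25, 42.75)
```

So with 3 buffer boxes the domain grows by roughly 0.94 degrees in longitude and 0.75 degrees in latitude on each side.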

Settings for geosfp_025x03125_aerosol_na/HISTORY.rc

ALSO VERY IMPORTANT!!! In the geosfp_025x03125_aerosol_na/HISTORY.rc file, make sure you turn off the BoundaryConditions collection.

At a minimum you'll want Restart and SpeciesConc. Any other diagnostics are up to you.

COLLECTIONS: 'Restart',
             'SpeciesConc',
             #'Budget',
             #'AerosolMass',
             #'Aerosols',
             #'CloudConvFlux',
             #'DryDep',
             #'LevelEdgeDiags',
             #'ProdLoss',
             #'StateChm',     
             #'StateMet',      
             #'WetLossConv',
             #'WetLossLS',
             #'Transport',
             #'BoundaryConditions',

Settings for geosfp_025x03125_aerosol_na/HEMCO_Config.rc

9.- We modify HEMCO_config.rc, just as we did in step 3, to be able to download the data, and we also modify the Metdir, to be able to read _NA data.

METDIR: /project/data/ExtData/GEOS_0.5x0.625_NA/GEOSFP

In the same file, we also modify the path to our boundary conditions, just as @msulprizio explained (needed for 12.7; solved in later versions).

That's correct. Since my instructions above have the boundary condition files in BC_4x5, then this is the path to use:

`#==============================================================================
# --- GEOS-Chem boundary condition file ---
#==============================================================================
(((GC_BCs
* BC_                 ./BC_4x5/GEOSChem.BoundaryConditions.$YYYY$MM$DD_$HH$MNz.nc4 SpeciesBC_?ADV?  1980-2019/1-12/1-31/* RFY xyz 1 * - 1 1   (Note that we substituted 1-23 for *)
)))GC_BCs

That is all correct. The HEMCO_Config.rc file should now point to the BC_4x5 subdirectory (where we copied the boundary conditions above).

Also from the wiki

`# ExtNr ExtName                on/off  Species 
0       Base                   : on    *
# ----- RESTART FIELDS ----------------------
--> GC_RESTART             :       true     
--> GC_BCs                 :       true
--> HEMCO_RESTART          :       true
`

These are the correct settings.

10.- Do we modify HISTORY.rc to ask for boundary conditions?

No, this is only necessary for the global simulation.

11.- We compile using this (Or is it something else? @FeiYao-Edinburgh)

make realclean
make -j4 build NC_DIAG=y BPCH_DIAG=n TIMERS=1`

12.- We download files from dry run, and modify some file extensions manually like ".NA.nc" to ".nc" in ExtData to be able to pull them

I'm not sure you need to rename files. Try it first.

13.- We then run ./geos, and then get our .nc file in Output Dir?

Yes, that's it!

Also tagging @msulprizio who has done a lot of work with FlexGrid and the nested simulations.

Hope this helps!!!

neyranaz commented 4 years ago

Thanks a lot @Yantosca, your comments are exceptionally helpful.
I am in the process of generating the Boundary Conditions (End of Step 5), working on geosfp_4x5_aerosol, but I keep getting a specific error:

===============================================================================
GEOS-Chem ERROR: Error encountered in "HCO_Run"!
 -> at HCOI_GC_Run (in module GeosCore/hcoi_gc_main_mod.F90)

THIS ERROR ORIGINATED IN HEMCO!  Please check the HEMCO log file for 
additional error messages!
===============================================================================

===============================================================================
GEOS-Chem ERROR: Error encountered in "HCOI_GC_Run"!
 -> at Emissions_Run (in module GeosCore/emissions_mod.F90)
===============================================================================

===============================================================================
GEOS-CHEM ERROR: Error encountered in "Emissions_Run", Phase 0

This error can indicate a missing file. Please check the HEMCO log file for 
additional error messages!

This error seems to be associated with the use of BIOVOC, because when I run without BIOVOC the code does not produce any errors. It seems the issue results from this modification, which was made about 9 months ago.

To address it, I experimented with HEMCO_Config.rc for a while and managed to get rid of some of the errors by including the paths needed to read variables such as BIOGENIC_LIMO_SOAP. However, the code would then raise other errors once those paths were added. This tells me that by adding data paths to fix the file-reading errors, we might end up undoing the fixes that @Msulprizio made by ignoring variables such as BIOGENIC_SOAP and BIOGENIC_SOAS. What would you recommend I do about this? (Files in this link)

yantosca commented 4 years ago

Thanks for writing. I think the problem is that the OFFLINE_BIOVOC data at 0.25 x 0.3125 only extends from 2015 - 2017. Because you are running for 2014, you are outside of that range.

I think it is OK to turn off the OFFLINE_BIOVOC emissions for your work.
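If you would rather check the data coverage before running, a quick script can expand the HEMCO path template for every simulation day and list the files that are missing. This is only a sketch: the `$ROOT`, `$YYYY`, `$MM` and `$DD` tokens are the ones used in HEMCO_Config.rc, the template is the 0.5x0.625 OFFLINE_BIOVOC path that appears in this thread, and the function names and the root directory in the example are hypothetical.

```python
import os
from datetime import date, timedelta

# OFFLINE_BIOVOC path template as it appears in HEMCO_Config.rc in this thread.
TEMPLATE = ("$ROOT/OFFLINE_BIOVOC/v2019-01/0.5x0.625/"
            "$YYYY/$MM/biovoc_05.$YYYY$MM$DD.nc")


def expand_template(template: str, root: str, day: date) -> str:
    """Replace the $ROOT, $YYYY, $MM and $DD tokens for one day."""
    return (template.replace("$ROOT", root)
                    .replace("$YYYY", f"{day.year:04d}")
                    .replace("$MM", f"{day.month:02d}")
                    .replace("$DD", f"{day.day:02d}"))


def missing_files(template: str, root: str, start: date, end: date):
    """Return the expanded paths that do not exist for each day in [start, end)."""
    missing = []
    day = start
    while day < end:
        path = expand_template(template, root, day)
        if not os.path.exists(path):
            missing.append(path)
        day += timedelta(days=1)
    return missing


# Hypothetical root directory; replace it with your own ExtData/HEMCO path.
print(expand_template(TEMPLATE, "/path/to/ExtData/HEMCO", date(2014, 1, 1)))
```

Calling `missing_files(TEMPLATE, root, date(2014, 1, 1), date(2015, 1, 1))` would then list every day of 2014 for which no file is present under `root`.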

neyranaz commented 4 years ago

Thank you for your reply. I did notice that those data only extend from 2015 to 2017; however, I corrected this before running by changing the paths to point to coarser data (2014 is available at 0.5x0.625). This approach has worked for me before: when performing a 4x5 standard run for the same dates, I was able to run with BIOVOC included, so I am not sure why the same approach does not work for a 4x5 aerosol-only run.

#==============================================================================
# --- Offline biogenic VOC emissions ---
#==============================================================================
(((OFFLINE_BIOGENICVOC
0 BIOGENIC_ACET      $ROOT/OFFLINE_BIOVOC/v2019-01/0.5x0.625/$YYYY/$MM/biovoc_05.$YYYY$MM$DD.nc ACET_MEGAN    1980-2017/1-12/1-31/* C xy kgC/m2/s ACET -   4 2
0 BIOGENIC_ALD2      $ROOT/OFFLINE_BIOVOC/v2019-01/0.5x0.625/$YYYY/$MM/biovoc_05.$YYYY$MM$DD.nc ALD2_MEGAN    1980-2017/1-12/1-31/* C xy kgC/m2/s ALD2 -   4 2

neyranaz commented 4 years ago

We have made great progress thanks to @Yantosca's insights. However, as we have advanced with generating the boundary conditions, we have encountered three issues (A, B, C) that we are not sure how to tackle:

A) We first ran the boundary conditions for aerosol only, from 20140101 until 20141231 without any problem (Using some debugging that we will specify later in this thread). However, when trying to transition from 2014 to 2015, we obtained the following error:

HISTORY (INIT): Opening ./HISTORY.rc
===============================================================================
GEOS-Chem ERROR: No diagnostic output will be created for collection: 
"Restart"!  Make sure that the length of the simulation as specified in 
"input.geos" (check the start and end times) is not shorter than the frequency 
setting in HISTORY.rc!  For example, if the frequency is 010000 (1 hour) but 
the simulation is set up to run for only 20 minutes, then this error will occur.
 -> at History_ReadCollectionData (in module History/history_mod.F90)

 -> ERROR occurred at (or near) line     65 of the HISTORY.rc file
===============================================================================

===============================================================================
GEOS-Chem ERROR: Error encountered in "History_ReadCollectionData"!
 -> at History_Init (in module History/history_mod.F90)

This occurs when we try to get the BCs from 20141231 to 20150105. To troubleshoot, we did a dry-run download to rule out missing files and file-path inconsistencies for that period. We think this error could be related to something similar to what occurred here. Do you have any recommendation for getting across the transition from 2014 to 2015?

-

B) We intend to do a five-year run, so we wanted to know whether it would be advisable to run the boundary conditions in segments (2014, 2015, 2016...) or to run the five years all together. As @FeiYao-Edinburgh mentioned in a still-open issue, GC generates the 'Restart' file at the end of each run, because that is how it is preset in HISTORY.rc. Being able to obtain restart files on a daily basis would justify running the years all together, since restart files would then be available if GC crashes. (Also related to this issue is a potential bug described here.)

-

C) When generating the BCs, we were considering saving them for a smaller spatial subset instead of the whole globe. Would this be possible? If so, should we just add the two lines below to the HISTORY.rc file, as described in the wiki? We looked for them in our HISTORY.rc file, but they are not there in GC 12.7.

BoundaryConditions.LON_RANGE: -130.0 -60.0,
BoundaryConditions.LAT_RANGE:  10.0 60.0,
#==============================================================================
# %%%%% THE BoundaryConditions COLLECTION %%%%%
#
# GEOS-Chem boundary conditions for use in nested grid simulations
#
# Available for all simulations
#==============================================================================
  BoundaryConditions.template:   '%y4%m2%d2_%h2%n2z.nc4',
  BoundaryConditions.format:     'CFIO',
  BoundaryConditions.frequency:  00000000 030000
  BoundaryConditions.duration:   00000001 000000
  BoundaryConditions.mode:       'instantaneous'
  BoundaryConditions.LON_RANGE:  -130.0 -60.0,
  BoundaryConditions.LAT_RANGE:  10.0 60.0,
  BoundaryConditions.fields:     'SpeciesBC_?ADV?             ', 'GIGCchem',
::

yantosca commented 4 years ago

Thanks for replying @neyranaz. Glad to know most of your issues are solved. Here are my recommendations for this last set of issues you brought up:

  1. Your issue (A) above was due to a bug that wasn't fixed until 12.8.1. See: https://github.com/geoschem/geos-chem/issues/305. You can apply that same fix to your version of 12.7.0 and then you should be able to go past the year boundary without any problems.

  2. The global run to generate the boundary conditions should probably be sequential (i.e. 2014 first, then 2015, etc.) This will allow you to capture the changing conditions as your run evolves. You can set this up as a 5-year GEOS-Chem simulation and then save out restart files each month (so that if the run dies you can start again from the last restart file). As @msulprizio pointed out in https://github.com/geoschem/geos-chem/issues/398, you can use 00000100 000000 in the Restart collection to request monthly output.

  3. It is possible to subset any HISTORY diagnostic (including boundary conditions) using LON_RANGE and LAT_RANGE to specify the bounding box that you want to save out. So you have correctly noted:

    #==============================================================================
    # %%%%% THE BoundaryConditions COLLECTION %%%%%
    #
    # GEOS-Chem boundary conditions for use in nested grid simulations
    #
    # Available for all simulations
    #==============================================================================
    BoundaryConditions.template:   '%y4%m2%d2_%h2%n2z.nc4',
    BoundaryConditions.format:     'CFIO',
    BoundaryConditions.frequency:  00000000 030000
    BoundaryConditions.duration:   00000001 000000
    BoundaryConditions.mode:       'instantaneous'
    BoundaryConditions.LON_RANGE:  -130.0 -60.0,
    BoundaryConditions.LAT_RANGE:  10.0 60.0,
    BoundaryConditions.fields:     'SpeciesBC_?ADV?             ', 'GIGCchem',
    ::

    then that will save out only the rectangular region with diagonally opposite corners (lon, lat) = (-130, 10) and (-60, 60). HEMCO should be able to read and regrid that into your nested simulation. So even though your simulation is running globally, the boundary conditions are limited to the region around your nested grid (and that saves disk space too).
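Before launching the global run, it may be worth confirming that the nested domain (including any buffer margin) lies fully inside the box given by LON_RANGE and LAT_RANGE. The Python sketch below shows one way to do that check. The boundary-condition box uses the values from the HISTORY.rc excerpt above; the nested window and the function name `inside` are hypothetical.

```python
def inside(bc_lon, bc_lat, nest_lon, nest_lat) -> bool:
    """Return True if the nested window lies within the boundary-condition box.

    Each argument is a (min, max) pair in degrees.
    """
    return (bc_lon[0] <= nest_lon[0] and nest_lon[1] <= bc_lon[1]
            and bc_lat[0] <= nest_lat[0] and nest_lat[1] <= bc_lat[1])


# Box from BoundaryConditions.LON_RANGE / LAT_RANGE in this thread.
BC_LON = (-130.0, -60.0)
BC_LAT = (10.0, 60.0)

# Hypothetical western-U.S. nested window (assumption, not from the thread).
NEST_LON = (-125.0, -110.0)
NEST_LAT = (30.0, 45.0)

print(inside(BC_LON, BC_LAT, NEST_LON, NEST_LAT))  # True
```

If the check returns False, either widen LON_RANGE/LAT_RANGE or shrink the nested window before generating the boundary conditions.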

neyranaz commented 4 years ago

This information is exceptionally valuable, thank you @Yantosca. We will be adding our troubleshooting results and other bug fixes soon!

neyranaz commented 3 years ago

Heads up when using v12.7.0 to perform a nested-grid simulation: there have been multiple issues with the dry run, as explained here, here, and here. These issues led to dust accumulation, because GC was repeatedly reusing certain files.