geoschem / gchp_legacy

Repository for GEOS-Chem High Performance: software that enables running GEOS-Chem on a cubed-sphere grid with MPI parallelization.
http://wiki.geos-chem.org/GEOS-Chem_HP

[BUG/ISSUE] GCHP terminating when reading NEI99 seasonal scaling #26

Closed: kilicomu closed this issue 5 years ago

kilicomu commented 5 years ago

Hi GCST.

I have attached my runConfig.sh, HEMCO_Config.rc, and model run logs for both a C48 and a C360 run.

At both C48 and C360, when running from 2015-12-01 00:00:00 to 2015-12-07 00:00:00, GCHP terminates, seemingly because it is unable to read the NEI99 seasonal scaling information.

The relevant log entries are:

>> Reading  CO from ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
 DEBUG: Scanning fixed file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc for side L
 DEBUG: Opening file: ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
WARNING: Requested sample not found in file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc

and:

  >> >> Reading times from ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
  >> >> File timing info: 0019990101 0000000000 0012
 DEBUG: GetBracketTimeOnSingleFile called for ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
  >> >> Reading times from fixed (F) file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
 >> >> >> File start    : 1999-01-01 00:00:00
 >> >> >> File end      : 1999-12-01 00:00:00
 >> >> >> Time requested: 2015-12-02 00:00:00
 DEBUG: Extrapolation flags (0) are F F F T for file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
 DEBUG: Requested time is after or on last available sample in file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
 DEBUG: Extrapolation flags (2) are F F F F for file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
WARNING: Requested sample not found in file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
 ERROR: Bracket timing request failed on fixed file ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc for side L

At both resolutions, after these messages appear in the log, an MPI_ABORT is issued from one process and this cascades to the others. At a glance, it looks as though the extrapolation flags aren't being handled in the same way as for other datasets.

Apologies in advance if I have not applied some config that I obviously need to apply!

NEI99_ISSUE_REPORT.tar.gz

lizziel commented 5 years ago

Hi Killian,

I was able to reproduce the issue at C24 with a 2016 run starting 12/3. It is a mysterious issue, since a user here at Harvard has been doing multi-year full-chem simulations with GCHP and has never encountered it. I am looking into exactly what triggers it and how it can be fixed.

In the meantime, I think you can work around this by commenting out the NEI99.season lines in HEMCO_Config.rc. I checked whether these scale factors are used by searching the file for the integer ID at the start of each of their lines, and it appears that none of the NEI99.season data is actually used. Disabling them should have no impact on your run.
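The usage check described above can be sketched in Python. This is a simplified, hypothetical model, not a HEMCO tool: it assumes that each scale factor line begins with its integer ID and that each base-emission entry lists the scale factors it applies as a slash-separated string of IDs, with "-" meaning none. The function name and the IDs in the example are made up for illustration.

```python
def unused_scale_factor_ids(scale_factor_ids, scal_id_fields):
    """Return scale factor IDs that no base-emission entry references.

    scale_factor_ids: integer IDs read from the start of each scale factor line.
    scal_id_fields: one slash-separated ID string per base-emission entry,
        e.g. "1/27/254" (assumed format; "-" is treated as "no scale factors").
    """
    referenced = set()
    for field in scal_id_fields:
        field = field.strip()
        if field in ("", "-"):
            continue  # this entry applies no scale factors
        referenced.update(int(tok) for tok in field.split("/"))
    # IDs that are defined but never referenced are candidates for disabling
    return sorted(set(scale_factor_ids) - referenced)


# Hypothetical IDs, not taken from the real HEMCO_Config.rc
print(unused_scale_factor_ids([26, 254, 255], ["1/27/254", "26", "-"]))  # [255]
```

Any ID returned by such a check is defined but never applied, so commenting out its line only saves the cost of reading the file.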

Lizzie

lizziel commented 5 years ago

This appears to be a bug in GCHP's ExtData.rc that only triggers an error when a run starts on a day other than the first of the month.

NEI99.season.geos.1x1.nc is a climatology file with 12 values per variable, one per month. The climatology column of ExtData.rc should therefore be Y, and a monthly temporal frequency should also be set. In GCHP 12.3 and earlier, however, climatology is set to N and no frequency is given. This makes MAPL look for the current date in the file and fail when that date is not found. The problem only appears when a run does not start on the first of a month, because all times in the file fall on day 1 of each month.
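The effect of a monthly climatology refresh template can be illustrated with a simplified Python model. It is not MAPL's implementation: the hypothetical function below only substitutes the %y4, %m2, and %d2 tokens and ignores the leading F flag. The file times (day 1 of each month of 1999, 12 samples) match the file timing reported in the log excerpt earlier in this thread.

```python
from datetime import datetime


def expand_refresh_template(template: str, when: datetime) -> datetime:
    """Substitute %y4/%m2/%d2 tokens in a simplified ExtData-style template."""
    body = template[1:] if template.startswith("F") else template  # drop F flag
    body = (body.replace("%y4", f"{when.year:04d}")
                .replace("%m2", f"{when.month:02d}")
                .replace("%d2", f"{when.day:02d}"))
    return datetime.strptime(body, "%Y-%m-%dT%H:%M:%S")


# Times stored in NEI99.season.geos.1x1.nc: day 1 of each month of 1999
FILE_TIMES = [datetime(1999, month, 1) for month in range(1, 13)]

# A run starting mid-month, as in the failing case
requested = datetime(2015, 12, 2)
sample = expand_refresh_template("F1999-%m2-01T00:00:00", requested)
print(sample, sample in FILE_TIMES)  # 1999-12-01 00:00:00 True
```

Because the year and day are pinned in the template and only the month follows the model clock, every model time in a given month maps onto a sample that exists in the file, whatever day the run starts on.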

The new entries in ExtData.rc should look like this:

NEI99_SEASON_CO 1 xy C Y N F1999-%m2-01T00:00:00 none none CO ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_ALK4 1 xy C Y N F1999-%m2-01T00:00:00 none none ALK4 ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_ACET 1 xy C Y N F1999-%m2-01T00:00:00 none none ACET ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_MEK 1 xy C Y N F1999-%m2-01T00:00:00 none none MEK ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_PRPE 1 xy C Y N F1999-%m2-01T00:00:00 none none PRPE ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_C3H8 1 xy C Y N F1999-%m2-01T00:00:00 none none C3H8 ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_C2H6 1 xy C Y N F1999-%m2-01T00:00:00 none none C2H6 ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_SO2 1 xy C Y N F1999-%m2-01T00:00:00 none none SO2 ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_SO4 1 xy C Y N F1999-%m2-01T00:00:00 none none SO4 ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_MSA 1 xy C Y N F1999-%m2-01T00:00:00 none none MSA ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_BCPI 1 xy C Y N F1999-%m2-01T00:00:00 none none BCPI ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
NEI99_SEASON_OCPI 1 xy C Y N F1999-%m2-01T00:00:00 none none OCPI ./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc
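Since the twelve entries differ only in the species name, they can also be generated rather than typed by hand. The Python sketch below does that. The column labels in its comments follow our reading of the GCHP ExtData.rc header (export name, units, dimensions, vertical location, climatology, conservative regridding, refresh template, offset, scale factor, file variable, file path) and should be treated as an assumption. The function name is hypothetical.

```python
# Variables in NEI99.season.geos.1x1.nc, in the order listed above
SPECIES = ["CO", "ALK4", "ACET", "MEK", "PRPE", "C3H8",
           "C2H6", "SO2", "SO4", "MSA", "BCPI", "OCPI"]

FILE = "./MainDataDir/NEI2005/v2014-09/scaling/NEI99.season.geos.1x1.nc"


def extdata_line(species: str) -> str:
    """Build one ExtData.rc entry for an NEI99 seasonal scale factor."""
    fields = [
        f"NEI99_SEASON_{species}",  # export name (assumed column meaning)
        "1",                        # units
        "xy",                       # dimensions
        "C",                        # vertical location
        "Y",                        # climatology: the file holds 12 monthly values
        "N",                        # conservative regridding
        "F1999-%m2-01T00:00:00",    # refresh: year 1999, current month, day 1
        "none", "none",             # offset and scale factor
        species,                    # variable name inside the file
        FILE,                       # file path
    ]
    return " ".join(fields)


for sp in SPECIES:
    print(extdata_line(sp))
```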

Killian, your log files indicate that you started your runs on Dec 2, even though the start date in runConfig.sh is Dec 1. Was this intentional? In case it wasn't, here is why. You must have done a previous run that finished on Dec 2. That run wrote a text file called cap_restart containing its end date. Unless cap_restart is removed before running, the next run will start on that date if it falls within the start/end range specified in runConfig.sh. This is a feature of MAPL, used in GEOS, and therefore carries over into GCHP. The sample run scripts automatically delete cap_restart to avoid this, and running make cleanup_output also deletes it as part of cleaning up the run directory for a new run. For more information, see the section "Rerunning without cleaning" in the "Running GCHP: Basics" chapter of the GCHP user manual.
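The start-date rule described above can be sketched as a simplified Python model. It is not MAPL code: the function names are hypothetical, and both the cap_restart format assumed here (a single line "YYYYMMDD HHMMSS") and the treatment of the end date as exclusive are assumptions.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def parse_cap_restart(text: str) -> datetime:
    """Parse cap_restart contents; assumed format is "YYYYMMDD HHMMSS"."""
    date_str, time_str = text.split()[:2]
    return datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")


def read_cap_restart(path: Path) -> Optional[datetime]:
    """Return the saved date, or None if cap_restart does not exist."""
    return parse_cap_restart(path.read_text()) if path.exists() else None


def effective_start(start: datetime, end: datetime,
                    restart: Optional[datetime]) -> datetime:
    """Pick the actual start date (simplified model of the behavior described)."""
    if restart is not None and start <= restart < end:  # end handling assumed
        return restart
    return start


start, end = datetime(2015, 12, 1), datetime(2015, 12, 7)
with tempfile.TemporaryDirectory() as run_dir:
    cap = Path(run_dir) / "cap_restart"
    print(effective_start(start, end, read_cap_restart(cap)))  # 2015-12-01 00:00:00
    cap.write_text("20151202 000000\n")  # left behind by a previous run
    print(effective_start(start, end, read_cap_restart(cap)))  # 2015-12-02 00:00:00
    cap.unlink()  # what the sample run scripts do before each run
    print(effective_start(start, end, read_cap_restart(cap)))  # 2015-12-01 00:00:00
```

Deleting the file before each run, as the sample scripts do, is the simplest way to guarantee that the start date in runConfig.sh is honored.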

All that being said, if the factors are not being used, they should be turned off in HEMCO_Config.rc so that reading them does not slow down the run. This update and the fix above will go into GCHP 12.4.0.

kilicomu commented 5 years ago

Thanks for looking into this Lizzie - I have rerun with the NEI99 seasonal scaling commented out in HEMCO_Config.rc and this problem no longer manifests itself.

Thanks also for reminding me about cap_restart! I forgot to include the removal of it in my batch script - I have corrected this.

I'm happy for this issue to be closed, unless you wish to keep it open until the fix is implemented in 12.4.0.