geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org

[BUG/ISSUE] GEOS-Chem dry run doesn't pull files if earlier ones are available #312

Closed sdeastham closed 4 years ago

sdeastham commented 4 years ago

Describe the bug

When running the GEOS-Chem dry run, it will usually correctly identify if (for example) OFFLINE_BIOVOC files are missing. However, if files from before the target date are present (e.g. if I have downloaded files for Y2007 but not Y2008 or Y2009), the dry run reuses the earlier files rather than flagging that the target files are missing. Changing the cycling flag from "C" to "EF" or "E" does not resolve this either: no new BIOVOC files are flagged.

To Reproduce


Preparation

  1. In HEMCO_Config.rc, point your OFFLINE_BIOVOC files to use a new folder, e.g. LOCAL_BIOVOC
  2. Copy (or softlink) only the year 2007 BIOVOC files to LOCAL_BIOVOC

Run commands

  1. Set your input.geos to run from 2008-01-01 to 2008-02-01
  2. Run ./geos --dryrun > log.dryrun
  3. Inspect log.dryrun

You should see that the old OFFLINE_BIOVOC files are being used, rather than the missing target files being flagged.

Expected behavior

The missing files should be flagged in the log.
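As a rough illustration of the expected behavior, a dry run could expand the file-name template for every day of the simulation and report each expanded path that does not exist, without ever falling back to an earlier date. This is a minimal Python sketch, assuming a simplified template that only uses the `$YYYY`, `$MM` and `$DD` tokens; `expand_template` and `missing_files` are hypothetical helpers, not HEMCO's parser.

```python
import datetime
import os


def expand_template(template, date):
    """Replace the $YYYY, $MM and $DD tokens in a file-name template."""
    return (template.replace("$YYYY", f"{date.year:04d}")
                    .replace("$MM", f"{date.month:02d}")
                    .replace("$DD", f"{date.day:02d}"))


def missing_files(template, start, end):
    """Return expanded paths for every day in [start, end) that do not exist.

    Each day is checked only against its own file, so an older file on disk
    can never mask a missing one.
    """
    missing = []
    day = start
    while day < end:
        path = expand_template(template, day)
        # Skip duplicates (e.g. monthly templates expand to the same path)
        if not os.path.exists(path) and path not in missing:
            missing.append(path)
        day += datetime.timedelta(days=1)
    return missing
```

With only the 2007 files in LOCAL_BIOVOC, calling this for 2008-01-01 to 2008-02-01 on a daily BIOVOC template would return all 31 January 2008 paths.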

Required information


Input and log files to attach

log.dryrun.2008.txt: the attached log file shows the run output. It correctly opens the 2008-07-01 file for BIOVOC (which is present on the system), but then starts looping over earlier years. For a clear example, see lines 898-901: HEMCO is reading in data for 2008-07-27, but the lines for the four offline emissions read:

HEMCO: Opening /n/holyscratch01/jacob_lab/seastham/ExtData/HEMCO/OFFLINE_DUST/v2019-01/0.5x0.625/2008/07/dust_emissions_05.20080727.nc
HEMCO: Opening /n/holyscratch01/jacob_lab/seastham/ExtData/HEMCO/OFFLINE_BIOVOC/v2019-10/0.5x0.625/2008/07/biovoc_05.20080701.nc
HEMCO: Opening /n/holyscratch01/jacob_lab/seastham/ExtData/HEMCO/OFFLINE_SEASALT/v2019-01/0.5x0.625/2008/07/seasalt_05.20080727.nc
HEMCO: Opening /n/holyscratch01/jacob_lab/seastham/ExtData/HEMCO/OFFLINE_SOILNOX/v2019-01/0.5x0.625/2008/07/soilnox_05.20080727.nc

In this case, I'm using directories which are softlinked to all the correct ones on the Harvard Cannon cluster, except for OFFLINE_BIOVOC, where I use my own directory (as the MERRA-2 biogenic VOC emissions aren't all available at Harvard). The key point is that the BIOVOC file is still 20080701, whereas the others are all (correctly) 2008-07-27.
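One way to spot this kind of silent fallback in an existing log is to extract the eight-digit date stamp from each `HEMCO: Opening` line and compare it with the date the model is currently reading. A minimal Python sketch, assuming file names end in `.YYYYMMDD.nc` or `.YYYYMMDD.nc4` (true for the four files above, but not for every HEMCO input); `stale_files` is a hypothetical helper, not part of GEOS-Chem.

```python
import re

# Matches "HEMCO: Opening <path>" where the path ends in .YYYYMMDD.nc(4);
# group 1 is the full path, group 2 the date stamp
OPENING = re.compile(r"HEMCO: Opening (\S+\.(\d{8})\.nc4?)")


def stale_files(log_lines, expected_date):
    """Return (path, stamp) for opened files whose stamp != expected_date."""
    stale = []
    for line in log_lines:
        match = OPENING.search(line)
        if match and match.group(2) != expected_date:
            stale.append((match.group(1), match.group(2)))
    return stale
```

Applied to lines 898-901 of the log with `expected_date="20080727"`, only the BIOVOC entry (stamp `20080701`) would be returned. Files with no date stamp are ignored rather than flagged, which keeps the check quiet for static inputs.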

Additional context

This is a slightly difficult issue to resolve because sometimes we specifically want to loop over earlier data. Perhaps, for the dry-run option, we should require that fields with the R (range) flag actually have their requirements satisfied?

Edit: tagging @msulprizio and @jimmielin as I think you'll both have insights on this particular issue!

yantosca commented 4 years ago

Thanks for writing. I think this is because the dryrun code in hcoio_read_std_mod.F90 is a little simple-minded in that it lists the file for the given date as missing.

For example, on the AWS cloud, we have these years of offline biogenics available:

$ s3ls s3://gcgrid/HEMCO/OFFLINE_BIOVOC/v2019-10/0.5x0.625/
                           PRE 2014/
                           PRE 2015/
                           PRE 2016/
                           PRE 2017/

So there are offline BIOVOC files from 2014-2017 available. But the HEMCO code where we look for missing data files is in this permalink:

https://github.com/geoschem/geos-chem/blob/e3b3b6570af07b1cde3a6d25b143451da25dbb9f/HEMCO/Core/hcoio_read_std_mod.F90#L264-L408

As you can see, at line 408, we exit the routine without going deeper into the HEMCO code to figure out what the closest date would be.

Perhaps this could be a feature request for the new HEMCO that will go into dev/13.0.0. Right now for some of these edge cases the dry-run might need to be augmented by manual download.
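For a field with only a year token, "figuring out the closest date" could be as simple as picking the nearest available year directory from a listing like the one above. A minimal Python sketch, assuming a "nearest year, earlier year wins ties" rule; `years_from_listing` and `closest_year` are hypothetical helpers, and HEMCO's own search order (stepping backward through hours, days, months and years) is different.

```python
def years_from_listing(lines):
    """Extract year directories from listing lines such as 'PRE 2014/'."""
    years = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] == "PRE" and parts[1].rstrip("/").isdigit():
            years.append(int(parts[1].rstrip("/")))
    return years


def closest_year(target, available):
    """Return the available year nearest to target; ties go to the earlier year."""
    if not available:
        raise ValueError("no years available")
    return min(available, key=lambda year: (abs(year - target), year))
```

With the 2014-2017 listing above, a 2019 run would map to 2017 and a 2008 run to 2014. A dry run could then report the file it would fall back to, alongside the fact that the target year is missing.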

yantosca commented 4 years ago

Or perhaps this is an issue that can be solved in the download_data.py script.

jennyfisher commented 4 years ago

Hi @yantosca - I think I am having the same issue. I was wondering why my run kept dying with an Invalid time index error for the WMO_2018 surface VMR files. Eventually I realised that my 2015 run was trying to use the 2008 file we already had instead of telling me that the 2015 one was missing.

It would be good to fix this. With the dry run (which is great!), someone in the group might do a 2015 run, so the download script downloads just the 2015 files they need. But if someone then tries to run a different year, the dry run won't report any of the files they need to download.

Obviously some of them (like WMO_2018) cause the run to die (I'm not quite sure why, as the "C" flag is set). But I am now nervous that there might be others where the run doesn't die, and the model is ticking along using the wrong year's data without us being aware. Is there a way we could test for that?

Thanks, Jenny

pkasibhatla commented 4 years ago

I am having the same issue with met files. I set up a dry run to get MERRA2 files from 20190701 to 20200501. Running download_data.py downloaded met data through 20190828 and stopped (I presume because wget failed for some reason). All my subsequent attempts to generate a dryrun log file with the correct remaining files failed.

yantosca commented 4 years ago

Hi all. I replicated Prasad's dry run on the AWS cloud. I am finding file paths such as:

/home/ubuntu/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A1.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3cld.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3dyn.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3mstC.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3mstE.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010102.I3.4x5.nc4

which are obviously not correct. I am looking into whether any modifications since 12.7.0 (when the dry-run was introduced) could be causing this. Stay tuned.

pkasibhatla commented 4 years ago

This may be obvious, but I thought I would post it. I modified the input.geos and HEMCO_Config.rc files to point to a non-existent data directory and reran the dry run; in this case the correct dryrun log file was generated. So it is the existence of files (even if they are not the correct ones) that seems to be causing problems in the dry run.

pkasibhatla commented 4 years ago

So based on my previous comment, I came up with a crude fix that seems to work. I create my dry run logfile by modifying my input.geos and HEMCO_Config.rc files to point to a non-existent root data directory (instead of /work/psk9/Data/ExtData, I specify it as /work/psk9x/Data/ExtData) and generate the log file with the correct listing of files. Then I edit input.geos and HEMCO_Config.rc to specify the correct root data directory and run download_data.py - this only downloads missing files, though it is a bit slow because it checks for all the files and prints the message saying it is not retrieving files that already exist.
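The second half of this workaround (keeping only the files that are really missing) can be done without editing the config files back and forth. A minimal Python sketch, assuming the dry-run file list was generated with the fake root `/work/psk9x/Data/ExtData` described above; `files_to_download` is a hypothetical helper, not part of download_data.py.

```python
import os


def files_to_download(required_paths, fake_root, real_root):
    """Map paths from the fake root onto the real root; return those not on disk."""
    needed = []
    for path in required_paths:
        # Only rewrite paths that were generated under the fake root
        if path.startswith(fake_root):
            path = real_root + path[len(fake_root):]
        if not os.path.exists(path):
            needed.append(path)
    return needed
```

Filtering up front also avoids the slow pass where every already-present file is checked and reported as "not retrieving".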

yantosca commented 4 years ago

Hi all, I think I have found the problem. I am not sure if this is a side effect of recent modifications to HEMCO, or if it has always been this way. I added some debug print statements to subroutine SrcFile_Parse in hcoio_mod.F90:

    !=================================================================
    ! SrcFile_Parse
    !=================================================================

    ! Initialize to input string
    srcFile = Lct%Dct%Dta%ncFile
    IF ( INDEX(  Lct%Dct%Dta%ncFile, '$METDIR' ) > 0 ) THEN
       print*, '@@@ in sfp 0: ', TRIM(srcFile)
    ENDIF

    ! verbose mode
    IF ( HCO_IsVerb(HcoState%Config%Err,3) ) THEN
       WRITE(MSG,*) 'Parsing source file and replacing tokens'
       CALL HCO_MSG(HcoState%Config%Err,MSG)
    ENDIF

    ! Get preferred dates (to be passed to parser)
    CALL HCO_GetPrefTimeAttr ( HcoState, Lct, &
                               prefYr, prefMt, prefDy, prefHr, prefMn, RC )
    IF ( RC /= HCO_SUCCESS ) RETURN

    ! Make sure dates are not negative
    IF ( prefYr <= 0 ) THEN
       CALL HcoClock_Get( HcoState%Clock, cYYYY = prefYr, RC = RC )
       IF ( RC /= HCO_SUCCESS ) RETURN
    ENDIF
    IF ( prefMt <= 0 ) THEN
       CALL HcoClock_Get( HcoState%Clock, cMM   = prefMt, RC = RC )
       IF ( RC /= HCO_SUCCESS ) RETURN
    ENDIF
    IF ( prefDy <= 0 ) THEN
       CALL HcoClock_Get( HcoState%Clock, cDD   = prefDy, RC = RC )
       IF ( RC /= HCO_SUCCESS ) RETURN
    ENDIF
    IF ( prefHr <  0 ) THEN
       CALL HcoClock_Get( HcoState%Clock, cH    = prefHr, RC = RC )
       IF ( RC /= HCO_SUCCESS ) RETURN
    ENDIF

    ! Eventually replace default preferred year with specified one
    IF ( PRESENT(Year) ) prefYr = Year

    ! Call the parser
    CALL HCO_CharParse ( HcoState%Config, srcFile, prefYr, prefMt, prefDy, prefHr, prefMn, RC )
    IF ( RC /= HCO_SUCCESS ) RETURN
    srcFileOrig = TRIM(srcFile)
    IF ( INDEX(  Lct%Dct%Dta%ncFile, '$METDIR' ) > 0 ) THEN
       print*, '@@@ in sfp 1: ', TRIM(srcFile)
    ENDIF

    ! Check if file exists
    INQUIRE( FILE=TRIM(srcFile), EXIST=HasFile )
    IF ( INDEX(  Lct%Dct%Dta%ncFile, '$METDIR' ) > 0 ) THEN
       print*, '@@@ in sfp 2: ', Hasfile
    ENDIF

    ! If the direction flag is on, force HasFile to be false.
    IF ( PRESENT(Direction) ) THEN
       IF ( Direction /= 0 ) HasFile = .FALSE.
    ENDIF

    ! If file does not exist, check if we can adjust prefYr, prefMt, etc.
    IF ( .NOT. HasFile .AND. Lct%Dct%DctType /= HCO_CFLAG_EXACT ) THEN

       ! Check if any token exist
       HasYr = ( INDEX(TRIM(Lct%Dct%Dta%ncFile),'YYYY') > 0 )
       HasMt = ( INDEX(TRIM(Lct%Dct%Dta%ncFile),'MM'  ) > 0 )
       HasDy = ( INDEX(TRIM(Lct%Dct%Dta%ncFile),'DD'  ) > 0 )
       HasHr = ( INDEX(TRIM(Lct%Dct%Dta%ncFile),'HH'  ) > 0 )

       ! Search for file
       IF ( HasYr .OR. HasMt .OR. HasDy .OR. HasHr ) THEN

          ! Date increments
          INC = -1
          IF ( PRESENT(Direction) ) THEN
             INC = Direction
          ENDIF

          ! Initialize counters
          CNT = 0

          ! Type is the update type (see below)
          TYP = 0

          ! Mirror preferred variables
          origYr = prefYr
          origMt = prefMt
          origDy = prefDy
          origHr = prefHr

          ! Do until file is found or counter exceeds threshold
          DO WHILE ( .NOT. HasFile )

             ! Increase counter
             CNT = CNT + 1
             IF ( CNT > MAXIT ) EXIT

             ! Increase update type if needed:
             nextTyp = .FALSE.

             ! Type 0: Initialization
             IF ( TYP == 0 ) THEN
                nextTyp = .TRUE.
             ! Type 1: update hour only
             ELSEIF ( TYP == 1 .AND. TYPCNT > 24 ) THEN
                nextTyp = .TRUE.
             ! Type 2: update day only
             ELSEIF ( TYP == 2 .AND. TYPCNT > 31 ) THEN
                nextTyp = .TRUE.
             ! Type 3: update month only
             ELSEIF ( TYP == 3 .AND. TYPCNT > 12 ) THEN
                nextTyp = .TRUE.
             ! Type 4: update year only
             ELSEIF ( TYP == 4 .AND. TYPCNT > 300 ) THEN
                nextTyp = .TRUE.
             ! Type 5: update hour and day
             ELSEIF ( TYP == 5 .AND. TYPCNT > 744 ) THEN
                nextTyp = .TRUE.
             ! Type 6: update day and month
             ELSEIF ( TYP == 6 .AND. TYPCNT > 372 ) THEN
                nextTyp = .TRUE.
             ! Type 7: update month and year
             ELSEIF ( TYP == 7 .AND. TYPCNT > 3600 ) THEN
                EXIT
             ENDIF

             ! Get next type
             IF ( nextTyp ) THEN
                NEWTYP = -1
                IF     ( hasHr .AND. TYP < 1 ) THEN
                   NEWTYP = 1
                ELSEIF ( hasDy .AND. TYP < 2 ) THEN
                   NEWTYP = 2
                ELSEIF ( hasMt .AND. TYP < 3 ) THEN
                   NEWTYP = 3
                ELSEIF ( hasYr .AND. TYP < 4 ) THEN
                   NEWTYP = 4
                ELSEIF ( hasHr .AND. hasDy .AND. TYP < 5 ) THEN
                   NEWTYP = 5
                ELSEIF ( hasDy .AND. hasMt .AND. TYP < 6 ) THEN
                   NEWTYP = 6
                ELSEIF ( hasMt .AND. hasYr .AND. TYP < 7 ) THEN
                   NEWTYP = 7
                ENDIF

                ! Exit if no other type found
                IF ( NEWTYP < 0 ) EXIT

                ! This is the new type, reset type counter
                TYP    = NEWTYP
                TYPCNT = 0

                ! Make sure we reset all values
                prefYr = origYr
                prefMt = origMt
                prefDy = origDy
                prefHr = origHr

             ENDIF

             ! Update preferred datetimes
             SELECT CASE ( TYP )
                ! Adjust hour only
                CASE ( 1 )
                   prefHr = prefHr + INC
                ! Adjust day only
                CASE ( 2 )
                   prefDy = prefDy + INC
                ! Adjust month only
                CASE ( 3 )
                   prefMt = prefMt + INC
                ! Adjust year only
                CASE ( 4 )
                   prefYr = prefYr + INC
                ! Adjust hour and day
                CASE ( 5 )
                   prefHr = prefHr + INC
                   IF ( MOD(TYPCNT,24) == 0 ) prefDy = prefDy + INC
                ! Adjust day and month
                CASE ( 6 )
                   prefDy = prefDy + INC
                   IF ( MOD(TYPCNT,31) == 0 ) prefMt = prefMt + INC
                ! Adjust month and year
                CASE ( 7 )
                   prefMt = prefMt + INC
                   IF ( MOD(TYPCNT,12) == 0 ) prefYr = prefYr + INC
                CASE DEFAULT
                   EXIT
             END SELECT

             ! Check if we need to adjust a year/month/day/hour
             IF ( prefHr < 0 ) THEN
                prefHr = 23
                prefDy = prefDy - 1
             ENDIF
             IF ( prefHr > 23 ) THEN
                prefHr = 0
                prefDy = prefDy + 1
             ENDIF
             IF ( prefDy < 1  ) THEN
                prefDy = 31
                prefMt = prefMt - 1
             ENDIF
             IF ( prefDy > 31 ) THEN
                prefDy = 1
                prefMt = prefMt + 1
             ENDIF
             IF ( prefMt < 1  ) THEN
                prefMt = 12
                prefYr = prefYr - 1
             ENDIF
             IF ( prefMt > 12 ) THEN
                prefMt = 1
                prefYr = prefYr + 1
             ENDIF

             ! Make sure day does not exceed max. number of days in this month
             prefDy = MIN( prefDy, Get_LastDayOfMonth( prefMt, prefYr ) )

             ! Mirror original file
             srcFile = Lct%Dct%Dta%ncFile

             ! Call the parser with adjusted values
             CALL HCO_CharParse ( HcoState%Config, srcFile, prefYr, prefMt, prefDy, prefHr, prefMn, RC )
             IF ( RC /= HCO_SUCCESS ) RETURN

             ! Check if this file exists
             INQUIRE( FILE=TRIM(srcFile), EXIST=HasFile )

             IF ( INDEX(  Lct%Dct%Dta%ncFile, '$METDIR' ) > 0 ) THEN
                print*, '@@@ in sfp 2a: ', TRIM(srcFile)
                print*, '@@@ in sfp 2b: ', Hasfile
             ENDIF

             ! Update counter
             TYPCNT = TYPCNT + 1
          ENDDO
       ENDIF
    ENDIF

    ! Additional check for data with a given range: make sure that the selected
    ! field is not outside of the given range
    IF ( HasFile .AND. ( Lct%Dct%Dta%CycleFlag == HCO_CFLAG_RANGE ) ) THEN
       HasFile = TIDX_IsInRange ( Lct, prefYr, prefMt, prefDy, prefHr )
    ENDIF

    ! Restore original source file name and date to avoid confusion in log file
    IF ( .not. HasFile ) THEN
       srcFile = Trim(srcFileOrig)
    ENDIF

    ! Return variable
    FOUND = HasFile

    ! Return w/ success
    RC = HCO_SUCCESS
    IF ( INDEX(  Lct%Dct%Dta%ncFile, '$METDIR' ) > 0 ) THEN
       print*, '@@@ in sfp 3: ', Hasfile
    ENDIF

and then I did a dry run for 2019/07/01 to 2019/08/01. All good, as Prasad says. Then I did another dry-run for 2019/08/01 to 2019/09/01 with the debug printout enabled. Here is a snippet of what I found:

 @@@ in sfp 0: $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC
 @@@ in sfp 1: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190802.A1.4x5.nc4
 @@@ in sfp 2:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190801.A1.4x5.nc4
 @@@ in sfp 2b:  T
 @@@ in sfp 3:  T
...
 @@@ in sfp 0: $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC
 @@@ in sfp 1: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190803.A1.4x5.nc4
 @@@ in sfp 2:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190802.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190801.A1.4x5.nc4
 @@@ in sfp 2b:  T
 @@@ in sfp 3:  T
... and further down ...
 @@@ in sfp 0: $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC
 @@@ in sfp 1: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190815.A1.4x5.nc4
 @@@ in sfp 2:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190814.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190813.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190812.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190811.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190810.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190809.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190808.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190807.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190806.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190805.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190804.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190803.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190802.A1.4x5.nc4
 @@@ in sfp 2b:  F
 @@@ in sfp 2a: /home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/08/MERRA2.20190801.A1.4x5.nc4
 @@@ in sfp 2b:  T
 @@@ in sfp 3:  T

So the routine SrcFile_Parse seems to keep wanting to go back in time if it can't find a file.

I think we need to put a shunt for the dry-run so that we don't enter that time stepping loop. That should fix it.
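The behavior traced above, and the proposed shunt, can be modeled in a few lines. This is a simplified Python sketch, assuming only the day token matters (update type 2 in SrcFile_Parse) and using a hypothetical `dry_run` switch to stand in for the shunt; the actual fix in GEOS-Chem may be implemented differently.

```python
import datetime

MAX_STEPS = 31  # roughly mirrors the day-only limit in the Fortran loop (TYPCNT > 31)


def find_source_file(requested, exists, dry_run=False):
    """Return (date_used, found) for a daily file requested on `requested`.

    `exists` is a callable telling whether the file for a given date is on disk.
    Normal run: step backward one day at a time until a file is found.
    Dry run: skip the search, so a missing file is reported as missing
    instead of being masked by an older one.
    """
    if exists(requested):
        return requested, True
    if dry_run:
        return requested, False
    day = requested
    for _ in range(MAX_STEPS):
        day -= datetime.timedelta(days=1)
        if exists(day):
            return day, True
    # Like SrcFile_Parse, fall back to the original name if nothing is found
    return requested, False
```

With only the 2019-08-01 file on disk, a normal lookup for 2019-08-15 walks back 14 days and "finds" 2019-08-01, exactly as in the debug output, while the dry-run lookup reports 2019-08-15 as missing.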

yantosca commented 4 years ago

This should now be fixed by commit https://github.com/geoschem/geos-chem/commit/f6586419c929dcee69790dcdc7499087ef90e076, which for now has been pushed to the bugfix/dryrun branch (branched off the master branch, 12.8.2). You can pull this update into your repository. We will add this to 12.9.0. I will also make a pull request for this into the standalone HEMCO repository.

yantosca commented 4 years ago

Here are the unique log files for 2019/07/01 (the first month, starting from a clean ExtData) and 2019/08/01 (the second month, with the 2019/07/01 files already in ExtData):

So I think this fixes it.

yantosca commented 4 years ago

This has now been merged into our 12.9.0 development branch. I will close out this issue for now.

Until 12.9.0 is released, you can still take this fix from the bugfix/dryrun branch.

pkasibhatla commented 4 years ago

Hi Bob, this fix looks like it is still not working - see attached dryrun.log file:

1) Many of the files in the list after TrashBurn_v2_generic.01x01.nc are in fact already downloaded on my system, though the dryrun log file says they are not found. For example:

HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/HEMCO/TrashEmis/v2015-03/TrashBurn_v2_generic.01x01.nc

/work/psk9/Data/ExtData/GEOS_4x5/MERRA2/2019/07% ls -l /work/psk9/Data/ExtData/HEMCO/TrashEmis/v2015-03/TrashBurn_v2_generic.01x01.nc
-rw-rw-r--. 1 psk9 root 89227465 Mar 29 2018 /work/psk9/Data/ExtData/HEMCO/TrashEmis/v2015-03/TrashBurn_v2_generic.01x01.nc

2) The file lists seem to mess up when crossing the year boundary - my start date is 20190701 and end date is 20200501. So for example, the A3cld files seem to be listed correctly till Dec 31, but incorrectly after that.

dryrun.log
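Item 1 can be checked independently of HEMCO by pulling the `REQUIRED FILE NOT FOUND` entries out of the dry-run log and testing each path against the filesystem. A minimal Python sketch, assuming paths contain no spaces; `falsely_missing` is a hypothetical helper.

```python
import os
import re

NOT_FOUND = re.compile(r"HEMCO: REQUIRED FILE NOT FOUND (\S+)")


def falsely_missing(log_text):
    """Return paths the log reports as not found that actually exist on disk."""
    reported = NOT_FOUND.findall(log_text)
    # dict.fromkeys removes duplicate entries while keeping log order
    return [path for path in dict.fromkeys(reported) if os.path.exists(path)]
```

Because `findall` scans the whole text, this also works when several entries end up on one line, as in the excerpt below.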

pkasibhatla commented 4 years ago

Just updating my previous comment: problem #2 seems to be related to the 2020 met files, not to crossing the year boundary. The path names of all the 2020 MERRA2 files, except for the I3 files, are incorrect in the dryrun output file. The I3 files are listed twice for each day, once correctly and once incorrectly. For example, here is a portion of the dryrun output:

HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A1.4x5.nc4
HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3cld.4x5.nc4
HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3dyn.4x5.nc4
HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3mstC.4x5.nc4
HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010101.A3mstE.4x5.nc4
HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200101.I3.4x5.nc4
HEMCO: REQUIRED FILE NOT FOUND /work/psk9/Data/ExtData/GEOS_4x5/MERRA2/0001/01/MERRA2.00010102.I3.4x5.nc4
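The bogus entries above all carry the year `0001`. A quick way to list such entries in a long dry-run file list is to parse the `.YYYYMMDD.` stamp and flag implausible years. A minimal Python sketch, assuming MERRA2-style file names and treating any year before 1900 as a placeholder (an arbitrary cutoff chosen for this sketch); `placeholder_paths` is a hypothetical helper.

```python
import re

# Captures year, month and day from a ".YYYYMMDD." stamp in a file name
STAMP = re.compile(r"\.(\d{4})(\d{2})(\d{2})\.")
CUTOFF_YEAR = 1900  # arbitrary threshold: no real met data predates this


def placeholder_paths(paths):
    """Return paths whose date stamp has an implausible year (e.g. 0001)."""
    flagged = []
    for path in paths:
        match = STAMP.search(path)
        if match and int(match.group(1)) < CUTOFF_YEAR:
            flagged.append(path)
    return flagged
```

For the excerpt above, every path except the correct `2020/01/MERRA2.20200101.I3.4x5.nc4` entry would be flagged.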

yantosca commented 4 years ago

Hi Prasad, thanks for looking into this again. I believe I have found the error. The files with "0001/01" in their path in the dryrun output occur because the year ranges for the met fields in HEMCO_Config.rc end in 2019. To fix this, change the time info for all of the met fields in HEMCO_Config.rc from e.g.:

* ALBEDO    $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC     ALBEDO   1980-2019/1-12/1-31/*/+30minute RFY xy  1  * -  1 1
* CLDTOT    $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC     CLDTOT   1980-2019/1-12/1-31/*/+30minute RFY xy  1  * -  1 1
* EFLUX     $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC     EFLUX    1980-2019/1-12/1-31/*/+30minute RFY xy  1  * -  1 1
... etc ...

to

* ALBEDO    $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC     ALBEDO   1980-2020/1-12/1-31/*/+30minute RFY xy  1  * -  1 1
* CLDTOT    $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC     CLDTOT   1980-2020/1-12/1-31/*/+30minute RFY xy  1  * -  1 1
* EFLUX     $METDIR/$YYYY/$MM/$MET.$YYYY$MM$DD.A1.$RES.$NC     EFLUX    1980-2020/1-12/1-31/*/+30minute RFY xy  1  * -  1 1
...etc...

Once you do that, you get clean dry-run output with files that do not have 0001/01 in their paths. This output is from a TransportTracers dryrun from 20191231 to 20200102:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! LIST OF (UNIQUE) FILES REQUIRED FOR THE SIMULATION
!!! Start Date       : 20191231 000000
!!! End Date         : 20200102 000000
!!! Simulation       : TransportTracers
!!! Meteorology      : MERRA2
!!! Grid Resolution  : 4.0x5.0
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
./GEOSChem.Restart.20191231_0000z.nc4 --> /home/ubuntu/ExtData/GEOSCHEM_RESTARTS/v2018-11/initial_GEOSChem_rst.4x5_TransportTracers.nc
./HEMCO_Config.rc
./HEMCO_Diagn.rc
./HISTORY.rc
./input.geos
/home/ubuntu/ExtData/CHEM_INPUTS/Olson_Land_Map_201203/Olson_2001_Drydep_Inputs.nc
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2015/01/MERRA2.20150101.CN.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/12/MERRA2.20191231.A1.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/12/MERRA2.20191231.A3cld.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/12/MERRA2.20191231.A3dyn.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/12/MERRA2.20191231.A3mstC.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/12/MERRA2.20191231.A3mstE.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2019/12/MERRA2.20191231.I3.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200101.A1.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200101.A3cld.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200101.A3dyn.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200101.A3mstC.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200101.A3mstE.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200101.I3.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200102.A1.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200102.A3cld.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200102.A3dyn.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200102.A3mstC.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200102.A3mstE.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200102.I3.4x5.nc4
/home/ubuntu/ExtData/GEOS_4x5/MERRA2/2020/01/MERRA2.20200103.I3.4x5.nc4
/home/ubuntu/ExtData/HEMCO/CEDS/v2018-08/2014/CO-em-anthro_CMIP_CEDS_2014.nc
/home/ubuntu/ExtData/HEMCO/OLSON_MAP/v2019-02/Olson_2001_Land_Type_Masks.025x025.generic.nc
/home/ubuntu/ExtData/HEMCO/SF6/v2019-01/EDGAR_v42_SF6_IPCC_2.generic.01x01.nc
/home/ubuntu/ExtData/HEMCO/TIMEZONES/v2015-02/timezones_voronoi_1x1.nc
/home/ubuntu/ExtData/HEMCO/Yuan_XLAI/v2019-03/Yuan_proc_MODIS_XLAI.025x025.2016.nc
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! LIST OF (UNIQUE) FILES REQUIRED FOR THE SIMULATION
!!! Start Date       : 20191231 000000
!!! End Date         : 20200102 000000
!!! Simulation       : TransportTracers
!!! Meteorology      : MERRA2
!!! Grid Resolution  : 4.0x5.0
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

We will update the HEMCO_Config.rc files in the unit tester so that 12.9.0 run directories have these.

yantosca commented 4 years ago

Also, I am going to take a quick look at why no error is raised during a dry run if you exceed the year range in HEMCO_Config.rc. In a real simulation (not a dry run), an error is thrown.
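Until the dry run raises that error itself, a pre-flight check could catch the problem: read the time attribute of each HEMCO_Config.rc entry (e.g. `1980-2019/1-12/1-31/*/+30minute`), take its year range, and flag entries that do not cover the simulation years. A minimal Python sketch, assuming entry lines laid out like the met-field lines quoted above (container name in the second column) and year ranges written as `YYYY` or `YYYY-YYYY`; `out_of_range_entries` is a hypothetical helper.

```python
import re

# Year part of a time attribute: "1980-2019/..." or "2014/..."
YEAR_RANGE = re.compile(r"^(\d{4})(?:-(\d{4}))?/")


def out_of_range_entries(config_lines, start_year, end_year):
    """Return container names whose year range does not cover start..end."""
    flagged = []
    for line in config_lines:
        tokens = line.split()
        if len(tokens) < 3 or tokens[0].startswith("#"):
            continue  # skip blank, short, and comment lines
        for token in tokens[2:]:
            match = YEAR_RANGE.match(token)
            if match:
                first = int(match.group(1))
                last = int(match.group(2) or match.group(1))
                if start_year < first or end_year > last:
                    flagged.append(tokens[1])
                break  # only the first time attribute on the line matters
    return flagged
```

Run against the original met-field lines for a 20190701 to 20200501 simulation, every entry ending in 2019 would be flagged; after the edit to 1980-2020, none would be.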

pkasibhatla commented 4 years ago

Any update on item #1 in my comment above, i.e., why is the dryrun listing files as not found when they are in fact on the system? Thanks!