NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Some atmos_ensstat jobs failing for GEFS C384 runs #2856

Open EricSinsky-NOAA opened 3 weeks ago

EricSinsky-NOAA commented 3 weeks ago

What is wrong?

Some atmos_ensstat jobs fail when running GEFS at C384 with at least 2 members. A log file from a failed atmos_ensstat task can be found on WCOSS2:/lfs/h2/emc/ptmp/eric.sinsky/GEFS/COMROOT/customexp/gw_pr2788/logs/2021090900/atmos_ensstat_f003.log. The error appears to be caused by a missing file in some of the MPMD tasks (there is one MPMD task per product resolution). A sample log file from a failed MPMD task can be found here: /lfs/h2/emc/stmp/eric.sinsky/RUNDIRS/gw_pr2788/gefs.2021090900/atmos_ensstat.219722/mpmd.1.out. A sample log file from a successful MPMD task can be found here: /lfs/h2/emc/stmp/eric.sinsky/RUNDIRS/gw_pr2788/gefs.2021090900/atmos_ensstat.219722/mpmd.0.out

What should have happened?

All atmos_ensstat jobs should succeed when running C384 GEFS.

What machines are impacted?

WCOSS2

Steps to reproduce

  1. Set up a GEFS test case with at least 2 members and with an FHOUT_HF_GFS of 3.
  2. Run the test case. When the atmos_ensstat jobs run, some of them will fail.

Additional information

Some (not all) atmos_ensstat jobs fail both with and without replay ICs. The issue does not seem to occur for C48 runs, which explains why it does not affect the GEFS CI tests. So far it has only been found in C384 GEFS runs. It has been present since the atmos_ensstat task was added to the global workflow. Investigation is ongoing, but the root cause has not yet been found.

Do you have a proposed solution?

No solution yet.

EricSinsky-NOAA commented 3 weeks ago

Update: This bug may be related to setting FHOUT_HF_GFS to 3 rather than to the model resolution. atmos_ensstat jobs at forecast hours that are divisible by 6 succeed.

EricSinsky-NOAA commented 3 weeks ago

I found that this issue originates in the atmos_prod task. In parm/config/gefs/config.atmos_products, FHOUT_PGBS is equal to FHOUT_GFS by default. If FHOUT_GFS is 6, then FHOUT_PGBS is also 6, which means that the supplemental gfs pgb files at 1.0 and 0.5 deg are not generated for f003, f009, f015, etc. (when FHOUT_HF_GFS=3). atmos_ensstat depends on the pgrb files at 1.0 and 0.5 deg being generated at f003, f009, f015, etc.; without them, atmos_ensstat fails at those forecast hours.
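
To illustrate the mismatch, here is a minimal Python sketch, not the workflow's actual logic. It assumes that atmos_ensstat runs at every multiple of FHOUT_HF_GFS and that the supplemental products are written at every multiple of FHOUT_PGBS. The function names and the maximum forecast hour of 24 are hypothetical and chosen only for illustration.

```python
def output_hours(step, fhmax):
    """Forecast hours from 0 to fhmax (inclusive) at the given interval."""
    return set(range(0, fhmax + 1, step))


def missing_supplemental_hours(fhout_hf_gfs, fhout_pgbs, fhmax):
    """Hours where atmos_ensstat would run but no supplemental products exist.

    Simplified model (an assumption, not the workflow code): ensstat runs at
    every multiple of fhout_hf_gfs, and products exist at every multiple of
    fhout_pgbs.
    """
    ensstat_hours = output_hours(fhout_hf_gfs, fhmax)
    product_hours = output_hours(fhout_pgbs, fhmax)
    return sorted(ensstat_hours - product_hours)


# FHOUT_HF_GFS=3 with FHOUT_PGBS defaulting to FHOUT_GFS=6; fhmax=24 is hypothetical.
missing = missing_supplemental_hours(fhout_hf_gfs=3, fhout_pgbs=6, fhmax=24)
print([f"f{hour:03d}" for hour in missing])
```

Under these assumptions the missing hours are exactly those that are not divisible by 6, which matches the observed pattern of failures. Setting FHOUT_PGBS to 3 in the same sketch leaves no missing hours.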