NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

C768 enkfgdaseupd task crashes on Hera #2506

Closed: spanNOAA closed this issue 2 weeks ago

spanNOAA commented 5 months ago

What is wrong?

The enkfgdaseupd task fails.

I tried three different node/core configurations. Each crashed with a different error, but all of them stopped at the same point (ensemble statistics).

The problem occurs only at C768 resolution. I had no difficulties running at C384 resolution.

What should have happened?

The enkfgdaseupd task should generate the analysis files required for the subsequent forecasts.

What machines are impacted?

Hera

Steps to reproduce

  1. Set up the experiment and generate the XML file:
     ./setup_expt.py gfs cycled --app ATM --pslot C768_6hourly_0210 --nens 80 --idate 2023021018 --edate 2023022618 --start cold --gfs_cyc 4 --resdetatmos 768 --resensatmos 384 --configdir /scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/arfs/parm/config/gfs --comroot ${COMROOT} --expdir ${EXPDIR} --icsdir /scratch2/BMC/wrfruc/Guoqing.Ge/ufs-ar/ICS/2023021018C768C384L128/output
  2. Use Rocoto to start the workflow.

Additional information

The stdout and stderr files are in the following directories:

120 nodes with 4 ppn: /scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/comroot/stmp/RUNDIRS/C768_6hourly_0210/eupd.572799
100 nodes with 6 ppn: /scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/comroot/stmp/RUNDIRS/C768_6hourly_0210/eupd.583798
80 nodes with 6 ppn: /scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/comroot/stmp/RUNDIRS/C768_6hourly_0210/eupd.841765
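
For reference, the total MPI task count of each layout is just nodes times ppn. The sketch below works that out for the three configurations listed here. The helper name `total_tasks` is made up for illustration and is not part of global-workflow. Nothing here says which layout the eupd job should use.

```shell
#!/bin/sh
# Hypothetical helper (not part of global-workflow):
# total MPI tasks for a layout of <nodes> nodes with <ppn> tasks per node.
total_tasks() {
  echo $(( $1 * $2 ))
}

# The three layouts from this report, written as "nodes ppn" pairs.
for layout in "120 4" "100 6" "80 6"; do
  # Split the pair into the positional parameters $1 (nodes) and $2 (ppn).
  set -- $layout
  echo "$1 nodes x $2 ppn = $(total_tasks "$1" "$2") tasks"
done
```

The first and third layouts both come to 480 tasks, and the second to 600. With fewer tasks per node, each task generally has a larger share of the node's memory, so the 120-node and 80-node runs are not equivalent even though the task counts match.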

Do you have a proposed solution?

No response

SamuelTrahanNOAA commented 4 months ago

These directories don't exist:

120 nodes with 4 ppn: /scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/comroot/stmp/RUNDIRS/C768_6hourly_0210/eupd.572799
100 nodes with 6 ppn: /scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/comroot/stmp/RUNDIRS/C768_6hourly_0210/eupd.583798
80 nodes with 6 ppn: /scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/comroot/stmp/RUNDIRS/C768_6hourly_0210/eupd.841765
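
A quick way to confirm which of these run directories are still present on the file system is a small existence check. This is only a minimal sketch: `check_dirs` is a hypothetical helper name, and the paths are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: report whether each argument is an existing directory.
# Returns 0 if all exist, 1 if at least one is missing.
check_dirs() {
  status=0
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "ok      $d"
    else
      echo "missing $d"
      status=1
    fi
  done
  return "$status"
}

# Run directories quoted in this thread.
base=/scratch2/BMC/wrfruc/Sijie.Pan/ufs-ar/comroot/stmp/RUNDIRS/C768_6hourly_0210
check_dirs "$base/eupd.572799" "$base/eupd.583798" "$base/eupd.841765" \
  || echo "At least one run directory is missing."
```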

guoqing-noaa commented 3 months ago

@spanNOAA Could you post your most recent eupd run directories so that others can take a look?