NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

[NCO Bug] Remove misleading "No such file or directory" syntax errors from output files #1252

Open KateFriedman-NOAA opened 1 year ago

KateFriedman-NOAA commented 1 year ago

Bugzilla #1369

Details from bugzilla:

From NCO SPA:

In the output logs for gfs/gdas/enkfgdas there are hundreds of instances of "No such file or directory" messages
printed out across many different tasks, which makes it more difficult to troubleshoot production failures.
Please work towards removing all related syntax errors from the output logs.

A list of tasks from the 12z cycle found with these messages:
enkfgdas_diag_12
enkfgdas_fcst_??_12
gdas_atmos_analysis_diag_12
gdas_atmos_tropcy_qc_reloc_12
gdas_atmos_verfozn_12
gdas_atmos_verfrad_12
gdas_forecast_12
gfs_atmos_gempak_meta_12
gfs_atmos_tropcy_qc_reloc_12
gfs_atmos_wafs_grib2_0p25_12
gfs_atmos_wafs_grib2_12
gfs_mos_ext_grd_prdgen_12
gfs_mos_ext_grd_prep_12
gfs_mos_ext_stn_fcst_12
gfs_mos_ext_stn_prdgen_12
gfs_mos_ext_stn_prep_12
gfs_mos_grd_fcst_12
gfs_mos_grd_prdgen_12
gfs_mos_stn_fcst_12
gfs_mos_stn_prdgen_12
gfs_mos_wx_ext_prdgen_12
gfs_mos_wx_prdgen_12
gfs_wave_prdgen_gridded_12

Comment from @RussTreadon-NOAA:

A check of the gdas_atmos_analysis_diag and enkfgdas_diag "No such file" messages shows that the messages 
originate from the same script, exglobal_diag.sh.  The following loop is responsible for generating the 
"No such file" messages

   # Restrict diagnostic files containing rstprod data
   rlist="conv_gps conv_ps conv_pw conv_q conv_sst conv_t conv_uv saphir"
   for rtype in $rlist; do
       ${CHGRP_CMD} *${rtype}*
   done

A check on file existence can be added to ensure ${CHGRP_CMD} is only executed when ${rtype} exists.
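A minimal sketch of such a check follows. It is not necessarily the fix that was adopted. It assumes bash, wraps the loop in a hypothetical function named `restrict_rstprod`, and uses `chgrp rstprod` as a fallback value for `CHGRP_CMD` purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: run the chgrp command for a diagnostic type only when at
# least one file in the current directory matches that type.

# Assumed fallback for illustration; the workflow normally sets CHGRP_CMD itself.
CHGRP_CMD="${CHGRP_CMD:-chgrp rstprod}"

# Hypothetical helper name, not taken from the original script.
restrict_rstprod() {
    local rlist="conv_gps conv_ps conv_pw conv_q conv_sst conv_t conv_uv saphir"
    local rtype
    for rtype in ${rlist}; do
        # compgen -G expands the glob and returns non-zero when nothing
        # matches, so chgrp never sees a literal, unmatched pattern.
        if compgen -G "*${rtype}*" > /dev/null; then
            # CHGRP_CMD is left unquoted on purpose so that
            # "chgrp rstprod" splits into a command and its argument.
            ${CHGRP_CMD} *"${rtype}"*
        fi
    done
}
```

Calling `restrict_rstprod` from the directory that holds the diagnostic files changes the group only for the types that are present and prints nothing for the types that are absent.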

For GFS v16.3 exgdas_diag.sh resides in NOAA-EMC/GSI branch release/gfsda.v16.3.0.  Note that NOAA-EMC/GSI 
develop no longer maintains DA jobs or scripts.  GFS DA jobs and scripts are now maintained in the 
global-workflow repository.  Therefore, add Rahul Mahajan to the cc list for this bugzilla.

Comment from Steven Earle:

gfs_mos tasks are excluded from this... but please do a thorough review of all gfs/gdas/enkf tasks for 
version 17... we need to work toward removing all false/misleading errors from our output.

HenryRWinterbottom commented 7 months ago

@RussTreadon-NOAA I created a branch to address this issue. It can be found here. Would you please be able to test this branch and confirm whether the above issue is resolved?

If you can, thank you. If the issue remains, could you please pass me the log files so that I can dig in further?

HenryRWinterbottom commented 1 month ago

@KateFriedman-NOAA I am seeing the following in the logs when addressing this issue, coming from ush/jjob_header.sh:

++ jjob_header.sh[81]: setpdy.sh
sed: can't read /scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/date/t00z: No such file or directory
 completed cleanly
 completed cleanly
Source PDY script to export PDYm7, ..., PDY, ..., PDYp7 variables.

Do these fall into the category of problematic messages?

WalterKolczynski-NOAA commented 1 month ago

@KateFriedman-NOAA I am seeing the following in the logs when addressing this issue, coming from ush/jjob_header.sh:

++ jjob_header.sh[81]: setpdy.sh
sed: can't read /scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/date/t00z: No such file or directory
 completed cleanly
 completed cleanly
Source PDY script to export PDYm7, ..., PDY, ..., PDYp7 variables.

Do these fall into the category of problematic messages?

Yes, BUT I think this is a bug with setpdy.sh when used on RDHPCS or outside of ops, not anything wrong in workflow. Skip it for now.

KateFriedman-NOAA commented 1 month ago

a bug with setpdy.sh when used on RDHPCS or outside of ops, not anything wrong in workflow

I second @WalterKolczynski-NOAA's reply.

HenryRWinterbottom commented 1 month ago

@KateFriedman-NOAA @WalterKolczynski-NOAA

There are also some "No such file or directory" messages being raised by /scratch1/NCEPDEV/global/glopara/git/obsproc/v1.1.2/scripts/exglobal_makeprepbufr.sh. Should these be addressed? If so, should that happen in a separate PR for obsproc rather than in this one?

KateFriedman-NOAA commented 1 month ago

Yeah, that's most likely an obsproc thing to resolve on their end. Can you share a sample log so we can see the particular messages though? Thanks!

HenryRWinterbottom commented 1 month ago

Please see /scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/x001_gfsv17_issue_1252/logs/2021122100/gdasprep.log.

The stmp path is gone, but grep for "No such file or directory" and you should see what I referenced above.
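One way to run that search across a whole log directory is sketched below. The function name `count_no_such` is hypothetical, and the sketch assumes a grep that supports `-r` (GNU grep does).

```shell
#!/usr/bin/env bash
# Sketch only: list each log file that contains the message, together with
# the number of matching lines, in the form path:count.

# Hypothetical helper name; pass it a log file or a directory of logs.
count_no_such() {
    local target="$1"
    # -r recurse into directories, -c count matching lines per file,
    # -H always print the file name, -F treat the pattern as a fixed string.
    # The second grep drops files whose count is zero.
    grep -rcHF "No such file or directory" "${target}" | grep -v ':0$'
}
```

For example, `count_no_such /path/to/logs/2021122100` (a placeholder path) prints one line per affected log, which makes it easy to see which jobs still emit the message.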

HenryRWinterbottom commented 1 month ago

There are also similar messages coming from the TC tracker and genesis applications. See respectively the following:

/scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/x001_gfsv17_issue_1252/logs/2021122100/gfstracker.log

/scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/x001_gfsv17_issue_1252/logs/2021122100/gfsgenesis.log

For the CI case C96_atm3DVar, those are the remaining "No such file or directory" instances.

WalterKolczynski-NOAA commented 1 month ago

I think the pdy message is in every job we run.

HenryRWinterbottom commented 1 month ago

I think the pdy message is in every job we run.

Yes, so far it is showing up in all of the log files for the experiment referenced above.

@WalterKolczynski-NOAA If this is worth addressing, please open a new issue and I will self assign and fix it, if possible.

KateFriedman-NOAA commented 1 month ago

Please see /scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/x001_gfsv17_issue_1252/logs/2021122100/gdasprep.log.

Those "No such"s are almost all from being unable to remove (because the file didn't already exist, which is normal) or are a file we don't need/use. The obsproc group needs to improve the error handling to remove these messages in their scripts. For this issue, I would ignore those "No such"s in the prep job.

HenryRWinterbottom commented 1 month ago

Please see /scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/x001_gfsv17_issue_1252/logs/2021122100/gdasprep.log.

Those "No such"s are almost all from being unable to remove (because the file didn't already exist, which is normal) or are a file we don't need/use. The obsproc group needs to improve the error handling to remove these messages in their scripts. For this issue, I would ignore those "No such"s in the prep job.

Thank you @KateFriedman-NOAA for checking my work. I just wanted to make sure this wasn't our responsibility or a fix required of this bugzilla/PR.

KateFriedman-NOAA commented 1 month ago

/scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/x001_gfsv17_issue_1252/logs/2021122100/gfstracker.log

/scratch1/NCEPDEV/da/Henry.Winterbottom/work/global-workflow/COMROOT/x001_gfsv17_issue_1252/logs/2021122100/gfsgenesis.log

Pretty much the same thing as the prep job...they are "cannot remove" "No such"s. If they are in scripts that global-workflow owns then we should improve the error handling to not complain if we can't remove a file that doesn't exist (i.e. check if the file exists first and then remove if it does, otherwise nothing). If these are from scripts that the TC_tracker repo owns then we should work with them to clean up.
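The check-then-remove pattern described above could look like the following sketch. The helper name `remove_if_exists` is hypothetical and is not taken from any of the scripts discussed here.

```shell
#!/usr/bin/env bash
# Sketch only: remove each named file if it is present and stay silent if
# it is not, so no "No such file or directory" message reaches the log.

# Hypothetical helper name; accepts any number of file paths.
remove_if_exists() {
    local f
    for f in "$@"; do
        # -f matches regular files; -L also catches dangling symlinks.
        if [[ -f "${f}" || -L "${f}" ]]; then
            rm -- "${f}"
        fi
    done
}
```

Where an explicit test is not needed, `rm -f` gives the same quiet behavior for missing files, because the `-f` option tells `rm` to ignore operands that do not exist.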

DavidHuber-NOAA commented 2 weeks ago

Opened https://github.com/NOAA-EMC/TC_tracker/issues/8 to address issues in the TC Tracker.

DavidHuber-NOAA commented 2 weeks ago

Opened https://github.com/NOAA-EMC/obsproc/issues/85 to address issues in obsproc. Note that this issue is not strictly an NCO issue since these scripts are not run as part of the operational workflow.

KateFriedman-NOAA commented 2 weeks ago

Thanks @DavidHuber-NOAA !