NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0
70 stars 161 forks source link

Update dependency check for ice netcdf file for GEFS #2721

Open aerorahul opened 4 days ago

aerorahul commented 4 days ago

What is wrong?

2561 introduced a dependency check for ice netcdf file for GEFS via the use of this script. Rocoto provides an option for checking if the file exists together with the age of the file and should be utilized instead of this script.

What should have happened?

Use Rocoto <datadep age=120> to check if the file exists and is older than 2 mins, instead of calling a script via <sh> command.

What machines are impacted?

All or N/A

Steps to reproduce

Create the xml and look at the gefs ice product dependency section.

Additional information

None

Do you have a proposed solution?

Use Rocoto's datadep tag to check for dependencies.

EricSinsky-NOAA commented 4 days ago

Issue #2674 creates some complications in the ice_prod dependencies which led me to create this script. The reason why I got rid of the <datadep age=120> dependency is because in cases where FHOUT_ICE_GFS = 24 and cycle=t12z, the lead hours in the fhr variable (in the xml file) in the ice_prod task do not match the lead hours in the ice hist filenames. This means the following file dependency would never be satisfied in these particular cases: gefs.ice.t@Hz.24hr_avg.f#fhr#.nc. That is why I removed that dependency.

For example, let's say that FHOUT_ICE_GFS=24 and cyc=12. For the ice_prod task, we have an fhr variable: <var name="fhr">024 048 072 096 120</var> If we were to keep this filename dependency (<datadep age="120"><cyclestr>&ROTDIR;/gefs.@Y@m@d/@H/mem#member#/model_data/ocean/history/gefs.ocean.t@Hz.24hr_avg.f#fhr#.nc</cyclestr></datadep>), the dependency would never be satisfied because these lead hours in the fhr variable are not written in the ice hist filenames. The ice hist filenames would have lead hours 036, 060, etc.

EricSinsky-NOAA commented 4 days ago

Here is the discussion we had about this in the PR: https://github.com/NOAA-EMC/global-workflow/pull/2561#discussion_r1626224605

EricSinsky-NOAA commented 4 days ago

@aerorahul I am creating a branch to restore the rocoto <datadep age=120> for the ice_prod task and remove check_ice_netcdf.sh. I think we have two options on how to proceed with this issue.

The first option is to submit a PR that only restores the rocoto <datadep age=120> and removes check_ice_netcdf.sh. Doing this would reintroduce issue #2674 (this would not affect the CI testing or the reforecast), but then I can submit a subsequent PR with a permanent fix for issue #2674 later on.

The second option is to submit one PR that fixes this issue and issue #2674.

Do you have a preference?

EricSinsky-NOAA commented 4 days ago

I should clarify that restoring rocoto <datadep age=120> and removing check_ice_netcdf.sh does not so much reintroduce issue #2674, but instead the ice_prod task would be impacted by issue #2674.

aerorahul commented 4 days ago

I think we can wait for the second option to be available. Thanks!

NeilBarton-NOAA commented 4 days ago

Note, ush/check_netcdf.sh needs to be removed from the dependencies of the ocean_prod tasks also. @aerorahul Do you want a new issue or combine with this one?