NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Update gempak job to run one fcst hour per task #2671

Open · GwenChen-NOAA opened 3 weeks ago

GwenChen-NOAA commented 3 weeks ago

Description

This PR updates the gempak jobs (gfs, gdas, and goes) from processing all forecast hours at once to processing one forecast hour at a time. This reduces each job's runtime to less than 5 minutes, so restart capability is not needed.
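As an illustration of the new pattern, here is a minimal sketch of a per-forecast-hour ex-script (variable names are assumptions for illustration, not the exact code in this PR): each rocoto task hands the script a single zero-padded forecast hour instead of a list to loop over.

    # Hypothetical per-forecast-hour skeleton (bash); FHR3 is assumed to be
    # exported by the workflow for each task, e.g. FHR3=006.
    set -eu
    fhr3="${FHR3:?FHR3 must be exported by the workflow}"
    # Process exactly one forecast hour into GEMPAK format. With no loop over
    # forecast hours, the task finishes in minutes, so no restart capability
    # is needed.
    echo "Converting forecast hour f${fhr3} to GEMPAK"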

Resolves #1250. Refs #2666, #2667.

Type of change

Change characteristics

How has this been tested?

Tested using my driver scripts on WCOSS2 and successfully generated gempak files.

Checklist

GwenChen-NOAA commented 3 weeks ago

@WalterKolczynski-NOAA, please review this PR. This is a work in progress: I will work on the scripts for the GDAS job, and the workflow will need some updates as well.

GwenChen-NOAA commented 2 weeks ago

@WalterKolczynski-NOAA, I added the GDAS and GOES jobs to the PR. All 3 jobs ran successfully in tests using my driver scripts, and the scripts are ready for review now. I am not familiar with the workflow part. Can you make those changes in PR #2667? Thanks!

WalterKolczynski-NOAA commented 2 weeks ago

Marked this as draft until I can get Gwen the needed rocoto changes.

GwenChen-NOAA commented 1 week ago

@WalterKolczynski-NOAA, after your rocoto changes I ran a test of the following experiment, with DO_GOES, DO_NPOESS, DO_GEMPAK, and DO_AWIPS set to "YES" in config.base:

./setup_expt.py gfs forecast-only --app ATM --resdetatmos 384 --start cold --comroot $COMROOT --expdir $EXPDIR --idate 2016070100 --edate 2016070100 --pslot test
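For reference, the product switches mentioned above as they would be set in config.base (export syntax follows config.base conventions):

    # config.base: enable downstream product generation for this test
    export DO_GOES="YES"     # GOES products
    export DO_NPOESS="YES"   # NPOESS 0.5-deg products
    export DO_GEMPAK="YES"   # GEMPAK file generation
    export DO_AWIPS="YES"    # AWIPS products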

The gempak job ran successfully for each forecast hour; please see the task list below. I noticed that the goesupp job failed to generate goesmaster files, so the gempakgrb2spec job did not kick off. I also noticed that the npoess job was missing from the task list; it should run after the goesupp job and before the gempakgrb2spec job. Is this something that can be fixed in this PR?

In parm/post/upp.yaml, the filename for GOES master files was reverted to "special": https://github.com/NOAA-EMC/global-workflow/blob/f43a86276aaef91efa28faadc71a3cf50e749efe/parm/post/upp.yaml#L89

When I tested using my driver script, the gempakgrb2spec job was looking for "goesmaster" files, so I think there is a conflict somewhere causing the goesupp job to fail.
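To make the suspected conflict concrete, here is a schematic of the mismatch (the file-name patterns below are inferred from the discussion, not copied from the actual templates):

    # upp.yaml (after the revert) writes GOES master output using "special"
    # naming, while the downstream gempakgrb2spec job looks for "goesmaster":
    upp_writes="gfs.t00z.special.grb2f006"      # schematic producer-side name
    downstream_wants="gfs.t00z.goesmasterf006"  # schematic consumer-side name
    [ "${upp_writes}" = "${downstream_wants}" ] \
        || echo "mismatch: the downstream job never finds its input"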

   CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION

================================================================================================================================
201607010000 gfsstage_ic 153458493 SUCCEEDED 0 1 72.0
201607010000 gfsfcst 153458796 SUCCEEDED 0 1 4740.0
201607010000 gfsatmos_prod_f000 153459276 SUCCEEDED 0 1 98.0
201607010000 gfsatmos_prod_f003 153460062 SUCCEEDED 0 1 86.0
201607010000 gfsatmos_prod_f006 153460805 SUCCEEDED 0 1 83.0
201607010000 gfsatmos_prod_f009 153461554 SUCCEEDED 0 1 88.0
201607010000 gfsatmos_prod_f012 153460806 SUCCEEDED 0 1 86.0
201607010000 gfsatmos_prod_f015 153461556 SUCCEEDED 0 1 87.0
201607010000 gfsatmos_prod_f018 153461557 SUCCEEDED 0 1 89.0
201607010000 gfsatmos_prod_f021 153462518 SUCCEEDED 0 1 84.0
201607010000 gfsatmos_prod_f024 153461555 SUCCEEDED 0 1 83.0
201607010000 gfsatmos_prod_f027 153462519 SUCCEEDED 0 1 85.0
201607010000 gfsatmos_prod_f030 153463327 SUCCEEDED 0 1 87.0
201607010000 gfsatmos_prod_f033 153463328 SUCCEEDED 0 1 87.0
201607010000 gfsatmos_prod_f036 153463329 SUCCEEDED 0 1 87.0
201607010000 gfsatmos_prod_f039 153463330 SUCCEEDED 0 1 95.0
201607010000 gfsatmos_prod_f042 153464164 SUCCEEDED 0 1 90.0
201607010000 gfsatmos_prod_f045 153464915 SUCCEEDED 0 1 84.0
201607010000 gfsatmos_prod_f048 153464165 SUCCEEDED 0 1 90.0
201607010000 gfsatmos_prod_f051 153464916 SUCCEEDED 0 1 86.0
201607010000 gfsatmos_prod_f054 153464932 SUCCEEDED 0 1 85.0
201607010000 gfsatmos_prod_f057 153465685 SUCCEEDED 0 1 104.0
201607010000 gfsatmos_prod_f060 153464933 SUCCEEDED 0 1 86.0
201607010000 gfsatmos_prod_f063 153465686 SUCCEEDED 0 1 105.0
201607010000 gfsatmos_prod_f066 153466283 SUCCEEDED 0 1 100.0
201607010000 gfsatmos_prod_f069 153466285 SUCCEEDED 0 1 100.0
201607010000 gfsatmos_prod_f072 153466284 SUCCEEDED 0 1 100.0
201607010000 gfsatmos_prod_f075 153466286 SUCCEEDED 0 1 100.0
201607010000 gfsatmos_prod_f078 153466712 SUCCEEDED 0 1 95.0
201607010000 gfsatmos_prod_f081 153467134 SUCCEEDED 0 1 83.0
201607010000 gfsatmos_prod_f084 153466713 SUCCEEDED 0 1 83.0
201607010000 gfsatmos_prod_f087 153467135 SUCCEEDED 0 1 93.0
201607010000 gfsatmos_prod_f090 153467652 SUCCEEDED 0 1 106.0
201607010000 gfsatmos_prod_f093 153467653 SUCCEEDED 0 1 94.0
201607010000 gfsatmos_prod_f096 153467654 SUCCEEDED 0 1 105.0
201607010000 gfsatmos_prod_f099 153467655 SUCCEEDED 0 1 90.0
201607010000 gfsatmos_prod_f102 153468046 SUCCEEDED 0 1 86.0
201607010000 gfsatmos_prod_f105 153468822 SUCCEEDED 0 1 99.0
201607010000 gfsatmos_prod_f108 153468047 SUCCEEDED 0 1 85.0
201607010000 gfsatmos_prod_f111 153468819 SUCCEEDED 0 1 88.0
201607010000 gfsatmos_prod_f114 153468818 SUCCEEDED 0 1 86.0
201607010000 gfsatmos_prod_f117 153469536 SUCCEEDED 0 1 88.0
201607010000 gfsatmos_prod_f120 153468823 SUCCEEDED 0 1 89.0
201607010000 gfsgoesupp_f000 153460807 DEAD 1 2 23.0
201607010000 gfsgoesupp_f003 153461558 DEAD 1 2 19.0
201607010000 gfsgoesupp_f006 153462520 DEAD 1 2 19.0
201607010000 gfsgoesupp_f009 153462522 DEAD 1 2 19.0
201607010000 gfsgoesupp_f012 153461561 DEAD 1 2 20.0
201607010000 gfsgoesupp_f015 153462523 DEAD 1 2 19.0
201607010000 gfsgoesupp_f018 153463332 DEAD 1 2 19.0
201607010000 gfsgoesupp_f021 153464172 DEAD 1 2 18.0
201607010000 gfsgoesupp_f024 153463334 DEAD 1 2 20.0
201607010000 gfsgoesupp_f027 153464173 DEAD 1 2 18.0
201607010000 gfsgoesupp_f030 153464176 DEAD 1 2 18.0
201607010000 gfsgoesupp_f033 153464937 DEAD 1 2 18.0
201607010000 gfsgoesupp_f036 153464179 DEAD 1 2 18.0
201607010000 gfsgoesupp_f039 153464939 DEAD 1 2 18.0
201607010000 gfsgoesupp_f042 153465687 DEAD 1 2 20.0
201607010000 gfsgoesupp_f045 153465688 DEAD 1 2 23.0
201607010000 gfsgoesupp_f048 153465690 DEAD 1 2 24.0
201607010000 gfsgoesupp_f051 153465689 DEAD 1 2 20.0
201607010000 gfsgoesupp_f054 153466287 DEAD 1 2 17.0
201607010000 gfsgoesupp_f057 153466714 DEAD 1 2 21.0
201607010000 gfsgoesupp_f060 153466290 DEAD 1 2 19.0
201607010000 gfsgoesupp_f063 153466715 DEAD 1 2 21.0
201607010000 gfsgoesupp_f066 153466716 DEAD 1 2 21.0
201607010000 gfsgoesupp_f069 153467136 DEAD 1 2 18.0
201607010000 gfsgoesupp_f072 153466718 DEAD 1 2 21.0
201607010000 gfsgoesupp_f075 153467137 DEAD 1 2 17.0
201607010000 gfsgoesupp_f078 153467656 DEAD 1 2 26.0
201607010000 gfsgoesupp_f081 153467658 DEAD 1 2 27.0
201607010000 gfsgoesupp_f084 153467659 DEAD 1 2 27.0
201607010000 gfsgoesupp_f087 153467661 DEAD 1 2 27.0
201607010000 gfsgoesupp_f090 153468049 DEAD 1 2 19.0
201607010000 gfsgoesupp_f093 153468824 DEAD 1 2 23.0
201607010000 gfsgoesupp_f096 153468052 DEAD 1 2 19.0
201607010000 gfsgoesupp_f099 153468825 DEAD 1 2 21.0
201607010000 gfsgoesupp_f102 153469537 DEAD 1 2 18.0
201607010000 gfsgoesupp_f105 153469538 DEAD 1 2 18.0
201607010000 gfsgoesupp_f108 153469539 DEAD 1 2 18.0
201607010000 gfsgoesupp_f111 153469541 DEAD 1 2 18.0
201607010000 gfsgoesupp_f114 153470275 DEAD 1 2 19.0
201607010000 gfsgoesupp_f117 153470913 DEAD 1 2 18.0
201607010000 gfsgoesupp_f120 153470277 DEAD 1 2 18.0
201607010000 gfsgempak_f000 153460064 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f003 153460810 SUCCEEDED 0 1 20.0
201607010000 gfsgempak_f006 153461563 SUCCEEDED 0 1 14.0
201607010000 gfsgempak_f009 153462526 SUCCEEDED 0 1 16.0
201607010000 gfsgempak_f012 153461564 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f015 153462530 SUCCEEDED 0 1 16.0
201607010000 gfsgempak_f018 153462531 SUCCEEDED 0 1 16.0
201607010000 gfsgempak_f021 153463345 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f024 153462533 SUCCEEDED 0 1 16.0
201607010000 gfsgempak_f027 153463347 SUCCEEDED 0 1 16.0
201607010000 gfsgempak_f030 153464181 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f033 153464182 SUCCEEDED 0 1 16.0
201607010000 gfsgempak_f036 153464183 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f039 153464184 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f042 153464948 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f045 153465694 SUCCEEDED 0 1 20.0
201607010000 gfsgempak_f048 153464949 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f051 153465695 SUCCEEDED 0 1 22.0
201607010000 gfsgempak_f054 153465696 SUCCEEDED 0 1 22.0
201607010000 gfsgempak_f057 153466295 SUCCEEDED 0 1 14.0
201607010000 gfsgempak_f060 153465697 SUCCEEDED 0 1 22.0
201607010000 gfsgempak_f063 153466296 SUCCEEDED 0 1 14.0
201607010000 gfsgempak_f066 153466722 SUCCEEDED 0 1 17.0
201607010000 gfsgempak_f069 153466724 SUCCEEDED 0 1 17.0
201607010000 gfsgempak_f072 153466725 SUCCEEDED 0 1 17.0
201607010000 gfsgempak_f075 153466726 SUCCEEDED 0 1 18.0
201607010000 gfsgempak_f078 153467142 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f081 153467665 SUCCEEDED 0 1 25.0
201607010000 gfsgempak_f084 153467143 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f087 153467666 SUCCEEDED 0 1 24.0
201607010000 gfsgempak_f090 153468055 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f093 153468057 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f096 153468058 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f099 153468059 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f102 153468827 SUCCEEDED 0 1 19.0
201607010000 gfsgempak_f105 153469544 SUCCEEDED 0 1 14.0
201607010000 gfsgempak_f108 153468829 SUCCEEDED 0 1 17.0
201607010000 gfsgempak_f111 153469545 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f114 153469546 SUCCEEDED 0 1 15.0
201607010000 gfsgempak_f117 153470278 SUCCEEDED 0 1 18.0
201607010000 gfsgempak_f120 153469548 SUCCEEDED 0 1 15.0
201607010000 gfsgempakmeta 153470914 RUNNING - 0 0.0
201607010000 gfsgempakncdcupapgif 153470915 RUNNING - 0 0.0
201607010000 gfsgempakgrb2spec_f000 - - - - -
201607010000 gfsgempakgrb2spec_f003 - - - - -
201607010000 gfsgempakgrb2spec_f006 - - - - -
201607010000 gfsgempakgrb2spec_f009 - - - - -
201607010000 gfsgempakgrb2spec_f012 - - - - -
201607010000 gfsgempakgrb2spec_f015 - - - - -
201607010000 gfsgempakgrb2spec_f018 - - - - -
201607010000 gfsgempakgrb2spec_f021 - - - - -
201607010000 gfsgempakgrb2spec_f024 - - - - -
201607010000 gfsgempakgrb2spec_f027 - - - - -
201607010000 gfsgempakgrb2spec_f030 - - - - -
201607010000 gfsgempakgrb2spec_f033 - - - - -
201607010000 gfsgempakgrb2spec_f036 - - - - -
201607010000 gfsgempakgrb2spec_f039 - - - - -
201607010000 gfsgempakgrb2spec_f042 - - - - -
201607010000 gfsgempakgrb2spec_f045 - - - - -
201607010000 gfsgempakgrb2spec_f048 - - - - -
201607010000 gfsgempakgrb2spec_f051 - - - - -
201607010000 gfsgempakgrb2spec_f054 - - - - -
201607010000 gfsgempakgrb2spec_f057 - - - - -
201607010000 gfsgempakgrb2spec_f060 - - - - -
201607010000 gfsgempakgrb2spec_f063 - - - - -
201607010000 gfsgempakgrb2spec_f066 - - - - -
201607010000 gfsgempakgrb2spec_f069 - - - - -
201607010000 gfsgempakgrb2spec_f072 - - - - -
201607010000 gfsgempakgrb2spec_f075 - - - - -
201607010000 gfsgempakgrb2spec_f078 - - - - -
201607010000 gfsgempakgrb2spec_f081 - - - - -
201607010000 gfsgempakgrb2spec_f084 - - - - -
201607010000 gfsgempakgrb2spec_f087 - - - - -
201607010000 gfsgempakgrb2spec_f090 - - - - -
201607010000 gfsgempakgrb2spec_f093 - - - - -
201607010000 gfsgempakgrb2spec_f096 - - - - -
201607010000 gfsgempakgrb2spec_f099 - - - - -
201607010000 gfsgempakgrb2spec_f102 - - - - -
201607010000 gfsgempakgrb2spec_f105 - - - - -
201607010000 gfsgempakgrb2spec_f108 - - - - -
201607010000 gfsgempakgrb2spec_f111 - - - - -
201607010000 gfsgempakgrb2spec_f114 - - - - -
201607010000 gfsgempakgrb2spec_f117 - - - - -
201607010000 gfsgempakgrb2spec_f120 - - - - -
201607010000 gfsawips_20km_1p0deg_f000-f000 153470279 SUCCEEDED 0 1 96.0
201607010000 gfsawips_20km_1p0deg_f003-f003 153470280 SUCCEEDED 0 1 100.0
201607010000 gfsawips_20km_1p0deg_f006-f006 153470281 SUCCEEDED 0 1 101.0
201607010000 gfsawips_20km_1p0deg_f009-f009 153470282 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f012-f012 153470283 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f015-f015 153470284 SUCCEEDED 0 1 103.0
201607010000 gfsawips_20km_1p0deg_f018-f018 153470285 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f021-f021 153470286 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f024-f024 153470287 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f027-f027 153470288 SUCCEEDED 0 1 101.0
201607010000 gfsawips_20km_1p0deg_f030-f030 153470289 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f033-f033 153470290 SUCCEEDED 0 1 103.0
201607010000 gfsawips_20km_1p0deg_f036-f036 153470291 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f039-f039 153470292 SUCCEEDED 0 1 103.0
201607010000 gfsawips_20km_1p0deg_f042-f042 153470293 SUCCEEDED 0 1 103.0
201607010000 gfsawips_20km_1p0deg_f045-f045 153470294 SUCCEEDED 0 1 103.0
201607010000 gfsawips_20km_1p0deg_f048-f048 153470295 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f051-f051 153470296 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f054-f054 153470297 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f057-f057 153470298 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f060-f060 153470299 SUCCEEDED 0 1 102.0
201607010000 gfsawips_20km_1p0deg_f063-f063 153470916 SUCCEEDED 0 1 106.0
201607010000 gfsawips_20km_1p0deg_f066-f066 153470918 SUCCEEDED 0 1 109.0
201607010000 gfsawips_20km_1p0deg_f069-f069 153470919 SUCCEEDED 0 1 106.0
201607010000 gfsawips_20km_1p0deg_f072-f072 153470920 SUCCEEDED 0 1 107.0
201607010000 gfsawips_20km_1p0deg_f075-f075 153470922 SUCCEEDED 0 1 104.0
201607010000 gfsawips_20km_1p0deg_f078-f078 153470923 SUCCEEDED 0 1 104.0
201607010000 gfsawips_20km_1p0deg_f081-f081 153470924 SUCCEEDED 0 1 103.0
201607010000 gfsawips_20km_1p0deg_f084-f084 153470929 SUCCEEDED 0 1 105.0
201607010000 gfsawips_20km_1p0deg_f087-f087 153470930 SUCCEEDED 0 1 15.0
201607010000 gfsawips_20km_1p0deg_f090-f090 153470931 SUCCEEDED 0 1 106.0
201607010000 gfsawips_20km_1p0deg_f093-f093 153470932 SUCCEEDED 0 1 15.0
201607010000 gfsawips_20km_1p0deg_f096-f096 153470935 SUCCEEDED 0 1 105.0
201607010000 gfsawips_20km_1p0deg_f099-f099 153470937 SUCCEEDED 0 1 15.0
201607010000 gfsawips_20km_1p0deg_f102-f102 153470940 SUCCEEDED 0 1 105.0
201607010000 gfsawips_20km_1p0deg_f105-f105 153470942 SUCCEEDED 0 1 15.0
201607010000 gfsawips_20km_1p0deg_f108-f108 153470944 SUCCEEDED 0 1 104.0
201607010000 gfsawips_20km_1p0deg_f111-f111 153470946 SUCCEEDED 0 1 15.0
201607010000 gfsawips_20km_1p0deg_f114-f114 153470948 SUCCEEDED 0 1 105.0
201607010000 gfsawips_20km_1p0deg_f117-f117 153470950 SUCCEEDED 0 1 15.0
201607010000 gfsawips_20km_1p0deg_f120-f120 153470952 SUCCEEDED 0 1 104.0
201607010000 gfsfbwind 153462534 SUCCEEDED 0 1 34.0

GwenChen-NOAA commented 1 week ago

@WalterKolczynski-NOAA, I found more issues. Although the gempak job is marked "SUCCEEDED" in the task list, no gempak data files were produced under products/atmos/gempak, which has left the gfsgempakmeta and gfsgempakncdcupapgif jobs "RUNNING" for more than 2 hours now.

WalterKolczynski-NOAA commented 1 week ago

@GwenChen-NOAA When I run the C96_atm3DVar_extended.yaml test, I get a gfsnpoess_pgrb2_0p5deg job, so I'm not sure why you don't.

The name for the GOES master files was reverted by Wen in #2499, but it looks like she missed updating the downstream jobs. You can go ahead and do that as part of this PR.

GwenChen-NOAA commented 1 week ago

> @GwenChen-NOAA When I run the C96_atm3DVar_extended.yaml test, I get a gfsnpoess_pgrb2_0p5deg job, so I'm not sure why you don't.

@WalterKolczynski-NOAA, can you provide me the experiment setup for setup_expt.py to run the C96_atm3DVar_extended.yaml test? I will test run it to see if the gfsnpoess_pgrb2_0p5deg job shows up.

> The name for the GOES master files was reverted by Wen in #2499, but it looks like she missed updating the downstream jobs. You can go ahead and do that as part of this PR.

But I didn't get any GOES master files under model_data/atmos/master, neither special nor goesmaster. The error message below is in the log file:

    JGLOBAL_ATMOS_UPP[22]: /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/scripts/exglobal_atmos_upp.py
    Traceback (most recent call last):
      File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/scripts/exglobal_atmos_upp.py", line 6, in <module>
        from pygfs.task.upp import UPP
      File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/ush/python/pygfs/__init__.py", line 4, in <module>
        from .task.analysis import Analysis
      File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/ush/python/pygfs/task/analysis.py", line 11, in <module>
        from jcb import render
    ModuleNotFoundError: No module named 'jcb'
    JGLOBAL_ATMOS_UPP[1]: postamble JGLOBAL_ATMOS_UPP 1719240617 1
    preamble.sh[70]: set +x
    End JGLOBAL_ATMOS_UPP at 14:50:20 with error code 1 (time elapsed: 00:00:03)

Should I open a bug issue for it?

WalterKolczynski-NOAA commented 1 week ago

> @GwenChen-NOAA When I run the C96_atm3DVar_extended.yaml test, I get a gfsnpoess_pgrb2_0p5deg job, so I'm not sure why you don't.
>
> @WalterKolczynski-NOAA, can you provide me the experiment setup for setup_expt.py to run the C96_atm3DVar_extended.yaml test? I will test run it to see if the gfsnpoess_pgrb2_0p5deg job shows up.
>
> The name for the GOES master files was reverted by Wen in #2499, but it looks like she missed updating the downstream jobs. You can go ahead and do that as part of this PR.
>
> But I didn't get any GOES master files under model_data/atmos/master, neither special nor goesmaster. The error message below is in the log file:
>
>     JGLOBAL_ATMOS_UPP[22]: /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/scripts/exglobal_atmos_upp.py
>     Traceback (most recent call last):
>       File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/scripts/exglobal_atmos_upp.py", line 6, in <module>
>         from pygfs.task.upp import UPP
>       File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/ush/python/pygfs/__init__.py", line 4, in <module>
>         from .task.analysis import Analysis
>       File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/ush/python/pygfs/task/analysis.py", line 11, in <module>
>         from jcb import render
>     ModuleNotFoundError: No module named 'jcb'
>     JGLOBAL_ATMOS_UPP[1]: postamble JGLOBAL_ATMOS_UPP 1719240617 1
>     preamble.sh[70]: set +x
>     End JGLOBAL_ATMOS_UPP at 14:50:20 with error code 1 (time elapsed: 00:00:03)
>
> Should I open a bug issue for it?

jcb is part of the UFSDA. I'm not sure why UPP requires that, but for the time being, recompile with -gu to build the GDAS app.
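For instance, the rebuild would look something like this (run from the clone root; the -gu flags are the ones referenced above, and the relink step is assumed from the usual build procedure):

    cd sorc
    ./build_all.sh -gu    # rebuild with the GDAS app included
    ./link_workflow.sh    # relink the workflow afterwards (assumed usual step)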

@aerorahul Why is the UPP job using the jedi config builder?

aerorahul commented 1 week ago

> The error message below is in the log file:
>
>     JGLOBAL_ATMOS_UPP[22]: /lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/scripts/exglobal_atmos_upp.py
>     Traceback (most recent call last):
>       File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/scripts/exglobal_atmos_upp.py", line 6, in <module>
>         from pygfs.task.upp import UPP
>       File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/ush/python/pygfs/__init__.py", line 4, in <module>
>         from .task.analysis import Analysis
>       File "/lfs/h2/emc/vpppg/noscrub/Lichuan.Chen/gwDev/global-workflow/ush/python/pygfs/task/analysis.py", line 11, in <module>
>         from jcb import render
>     ModuleNotFoundError: No module named 'jcb'
>     JGLOBAL_ATMOS_UPP[1]: postamble JGLOBAL_ATMOS_UPP 1719240617 1
>     preamble.sh[70]: set +x
>     End JGLOBAL_ATMOS_UPP at 14:50:20 with error code 1 (time elapsed: 00:00:03)
>
> Should I open a bug issue for it?

> jcb is part of the UFSDA. I'm not sure why UPP requires that, but for the time being, recompile with -gu to build the GDAS app.
>
> @aerorahul Why is the UPP job using the jedi config builder?

It is not. The message says pygfs/__init__.py imports analysis.py, which in turn fails to import jcb. jcb is a submodule of gdasapp, and gdasapp now has to be cloned as part of atmosphere-only experimentation. @CoryMartin-NOAA We will now require gdasapp to be cloned even for forecast-only experiments, since jcb is a submodule of it. GDASApp is no longer optional.

WalterKolczynski-NOAA commented 1 week ago

> jcb is part of the UFSDA. I'm not sure why UPP requires that, but for the time being, recompile with -gu to build the GDAS app.
>
> @aerorahul Why is the UPP job using the jedi config builder?
>
> It is not. The message says pygfs/__init__.py imports analysis.py, which in turn fails to import jcb. jcb is a submodule of gdasapp, and gdasapp now has to be cloned as part of atmosphere-only experimentation. @CoryMartin-NOAA We will now require gdasapp to be cloned even for forecast-only experiments, since jcb is a submodule of it. GDASApp is no longer optional.

It's always cloned (unless you go to extraordinary lengths not to check out that specific submodule), but not necessarily built/linked. Is that what you mean?

aerorahul commented 1 week ago

> jcb is part of the UFSDA. I'm not sure why UPP requires that, but for the time being, recompile with -gu to build the GDAS app.
>
> @aerorahul Why is the UPP job using the jedi config builder?
>
> It is not. The message says pygfs/__init__.py imports analysis.py, which in turn fails to import jcb. jcb is a submodule of gdasapp, and gdasapp now has to be cloned as part of atmosphere-only experimentation. @CoryMartin-NOAA We will now require gdasapp to be cloned even for forecast-only experiments, since jcb is a submodule of it. GDASApp is no longer optional.
>
> It's always cloned (unless you go to extraordinary lengths not to check out that specific submodule), but not necessarily built/linked. Is that what you mean?

Yes. It is cloned, but not linked. I also don't check out gdasapp if I don't have to do DA development; it keeps my clone light, and I don't have to keep syncing/rebuilding GDASApp if I don't need it.

GwenChen-NOAA commented 1 week ago

> jcb is part of the UFSDA. I'm not sure why UPP requires that, but for the time being, recompile with -gu to build the GDAS app.
>
> @aerorahul Why is the UPP job using the jedi config builder?
>
> It is not. The message says pygfs/__init__.py imports analysis.py, which in turn fails to import jcb. jcb is a submodule of gdasapp, and gdasapp now has to be cloned as part of atmosphere-only experimentation. @CoryMartin-NOAA We will now require gdasapp to be cloned even for forecast-only experiments, since jcb is a submodule of it. GDASApp is no longer optional.

@aerorahul and @WalterKolczynski-NOAA, the global-workflow documentation says the -gu option is currently only available on Hera, Orion, and Hercules, but the GFS downstream package can only run on WCOSS2.

WalterKolczynski-NOAA commented 1 week ago

> jcb is part of the UFSDA. I'm not sure why UPP requires that, but for the time being, recompile with -gu to build the GDAS app.
>
> @aerorahul Why is the UPP job using the jedi config builder?
>
> It is not. The message says pygfs/__init__.py imports analysis.py, which in turn fails to import jcb. jcb is a submodule of gdasapp, and gdasapp now has to be cloned as part of atmosphere-only experimentation. @CoryMartin-NOAA We will now require gdasapp to be cloned even for forecast-only experiments, since jcb is a submodule of it. GDASApp is no longer optional.
>
> @aerorahul and @WalterKolczynski-NOAA, the global-workflow documentation says the -gu option is currently only available on Hera, Orion, and Hercules, but the GFS downstream package can only run on WCOSS2.

The documentation is out of date; build with -u was added for WCOSS a few weeks ago.

GwenChen-NOAA commented 1 week ago

> @aerorahul and @WalterKolczynski-NOAA, the global-workflow documentation says the -gu option is currently only available on Hera, Orion, and Hercules, but the GFS downstream package can only run on WCOSS2.
>
> The documentation is out of date; build with -u was added for WCOSS a few weeks ago.

Great! Thanks! I will make a new build on WCOSS2 with the -gu option.

WalterKolczynski-NOAA commented 1 week ago

This may be the actual cause of the UPP issues: https://github.com/NOAA-EMC/global-workflow/pull/2700#issuecomment-2187130502

GwenChen-NOAA commented 6 days ago

@WalterKolczynski-NOAA, I updated the gfs_tasks.py script, and the gempak job can now run successfully in rocoto. Somehow the goesupp job failed to run, but that's out of scope for this PR. The gempakgrb2spec job ran successfully when forced with pre-existing goes_sim files. This PR is ready for review.

WalterKolczynski-NOAA commented 5 days ago

Why were my changes to gfs_tasks.py modified to reinsert groups? The point was to eliminate the loops over forecast hours, but now they have been added back in.

GwenChen-NOAA commented 5 days ago

> Why were my changes to gfs_tasks.py modified to reinsert groups? The point was to eliminate the loops over forecast hours, but now they have been added back in.

Because I got an "unbound variable FHR3" error in the log files when running with your version of gfs_tasks.py, and no gempak files were generated because FHR3 was undefined.

My gempak scripts follow the same setup as the awips scripts: they use the FHRLST defined in the _get_awipsgroups function. Although they are called groups, the function generates a list of forecast hours, and the scripts run for each forecast hour separately.
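A rough sketch of that setup (names are assumed from the awips pattern, not the exact code): each "group" holds a single forecast hour, so the loop body runs exactly once per task.

    # Hypothetical job-script fragment: rocoto sets FHRLST per task, e.g. FHRLST="f006"
    for fhr3 in ${FHRLST//f/}; do   # "f006" -> "006"; already zero-padded
        echo "running gempak for forecast hour ${fhr3}"
    done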

WalterKolczynski-NOAA commented 5 days ago

> Why were my changes to gfs_tasks.py modified to reinsert groups? The point was to eliminate the loops over forecast hours, but now they have been added back in.
>
> Because I got an "unbound variable FHR3" error in the log files when running with your version of gfs_tasks.py, and no gempak files were generated because FHR3 was undefined.
>
> My gempak scripts follow the same setup as the awips scripts: they use the FHRLST defined in the _get_awipsgroups function. Although they are called groups, the function generates a list of forecast hours, and the scripts run for each forecast hour separately.

That's a simple bug. Restore what was there, except change the following in the task_dict:

                     'envars': self.envars,

to

                     'envars': gempak_vars,

Then remove the forecast hour loops in the jobs/rocoto scripts.
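Illustratively, that last step looks something like this in a jobs/rocoto wrapper (a sketch; the J-job name matches the gempak job, the rest is assumed):

    # Before: one task looped over every forecast hour, e.g.
    #   for fhr3 in ${FHRLST}; do FHR3=${fhr3} "${HOMEgfs}/jobs/JGFS_ATMOS_GEMPAK"; done
    # After: rocoto supplies exactly one hour per task via <envar>
    export FHR3="${FHR3:?set by the rocoto task definition}"
    exec "${HOMEgfs}/jobs/JGFS_ATMOS_GEMPAK"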

GwenChen-NOAA commented 5 days ago

> I don't think the submodule updates are supposed to be in here. Maybe a bad merge?

The submodule updates were automatically added when I merged branch 'NOAA-EMC:develop' into develop. I don't know how to remove them.

WalterKolczynski-NOAA commented 5 days ago

> I don't think the submodule updates are supposed to be in here. Maybe a bad merge?
>
> The submodule updates were automatically added when I merged branch 'NOAA-EMC:develop' into develop. I don't know how to remove them.

Yeah, the git history got messed up somehow. Instead of doing something complicated to fix it, check out the global-workflow develop version of each submodule and then commit them. Step-by-step:

cd sorc/<submodule>
git checkout <hash>
cd ../..
git add sorc/<submodule>
<repeat for all submodules showing as changed>
git commit
git push

GwenChen-NOAA commented 5 days ago

> I don't think the submodule updates are supposed to be in here. Maybe a bad merge?
>
> The submodule updates were automatically added when I merged branch 'NOAA-EMC:develop' into develop. I don't know how to remove them.
>
> Yeah, the git history got messed up somehow. Instead of doing something complicated to fix it, check out the global-workflow develop version of each submodule and then commit them. Step-by-step:
>
> cd sorc/<submodule>
> git checkout <hash>
> cd ../..
> git add sorc/<submodule>
> <repeat for all submodules showing as changed>
> git commit
> git push

Which <hash> should I use?

WalterKolczynski-NOAA commented 5 days ago

> I don't think the submodule updates are supposed to be in here. Maybe a bad merge?
>
> The submodule updates were automatically added when I merged branch 'NOAA-EMC:develop' into develop. I don't know how to remove them.
>
> Yeah, the git history got messed up somehow. Instead of doing something complicated to fix it, check out the global-workflow develop version of each submodule and then commit them. Step-by-step:
>
> cd sorc/<submodule>
> git checkout <hash>
> cd ../..
> git add sorc/<submodule>
> <repeat for all submodules showing as changed>
> git commit
> git push
>
> Which <hash> should I use?

Use the hashes after the @ on this page: https://github.com/NOAA-EMC/global-workflow/tree/develop/sorc
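A concrete pass through those steps, with placeholder values (take the real submodule path and hash from the sorc page linked above):

    cd sorc/ufs_model.fd    # example submodule; repeat for each one showing as changed
    git checkout 1a2b3c4    # placeholder: the hash shown after the @ for that submodule
    cd ../..
    git add sorc/ufs_model.fd
    git commit -m "Reset submodules to develop hashes"
    git push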

GwenChen-NOAA commented 5 days ago

All should be fixed now.

GwenChen-NOAA commented 4 days ago

@WalterKolczynski-NOAA, the errors below showed up in my rocoto test run. Have you seen them before? Do you know how to fix them?

1) fhr3=$(printf "%03d" 009)

       printf: 009: invalid octal number

2) fhr3=$(printf "%03d" 012)

       echo $fhr3
       010    <-- should be 012

It works fine for FHR3=006:

    fhr3=$(printf "%03d" 006)
    echo $fhr3
    006

WalterKolczynski-NOAA commented 4 days ago

> @WalterKolczynski-NOAA, the errors below showed up in my rocoto test run. Have you seen them before? Do you know how to fix them?
>
> 1) fhr3=$(printf "%03d" 009)
>    printf: 009: invalid octal number
>
> 2) fhr3=$(printf "%03d" 012)
>    echo $fhr3
>    010 <-- should be 012
>
> It works fine for FHR3=006:
>
> fhr3=$(printf "%03d" 006)
> echo $fhr3
> 006

The input you are providing the scripts is already zero-padded, so instead of assigning the argument to fhr, assign it to fhr3 and remove the printf step. I'll make the necessary suggestions (I didn't in my review).
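For reference, the underlying gotcha is that bash's printf parses a leading-zero integer argument as octal: 009 contains a digit that is invalid in octal, and 012 is octal for decimal 10. A minimal demonstration, with the two common fixes:

    printf "%03d\n" 012             # prints 010: "012" was read as octal 12
    printf "%03d\n" "$((10#012))"   # prints 012: 10# forces base-10 interpretation
    fhr3="012"                      # simplest fix here: the input is already
                                    # zero-padded, so assign it directly and
                                    # drop the printf step entirely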

WalterKolczynski-NOAA commented 4 days ago

In addition to the above, can the wallclock for the jobs that are now running only one forecast hour be reduced in config.resources?

GwenChen-NOAA commented 4 days ago

> In addition to the above, can the wallclock for the jobs that are now running only one forecast hour be reduced in config.resources?

I think so. I reduced the wallclock in the jgfs_atmos_gempak.ecf script from 3 hr to 30 min to be conservative; 10 min should work fine.

WalterKolczynski-NOAA commented 4 days ago

> In addition to the above, can the wallclock for the jobs that are now running only one forecast hour be reduced in config.resources?
>
> I think so. I reduced the wallclock in the jgfs_atmos_gempak.ecf script from 3 hr to 30 min to be conservative; 10 min should work fine.

Okay, please update config.resources with the new wallclocks to match the ecflow files (I didn't even look at the ecf scripts, because they all need to be redone anyway).
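Something like the following, as a sketch (the exact variable name should match whatever config.resources already uses for the gempak walltime):

    # config.resources: gempak now runs one forecast hour per task
    export wtime_gempak="00:30:00"   # assumed variable name; was sized for all hours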

aerorahul commented 1 day ago

@GwenChen-NOAA Is this ready to be run on WCOSS2 in the CI? I see in a comment from @WalterKolczynski-NOAA that you were running a test before CI. Please let us know when you are done and we can run this through the CI.

GwenChen-NOAA commented 1 day ago

> @GwenChen-NOAA Is this ready to be run on WCOSS2 in the CI? I see in a comment from @WalterKolczynski-NOAA that you were running a test before CI. Please let us know when you are done and we can run this through the CI.

@aerorahul, my latest rocoto test ran successfully on WCOSS2 for all 3 gempak jobs. You may run the CI test now. Thanks!

aerorahul commented 1 day ago

> @GwenChen-NOAA Is this ready to be run on WCOSS2 in the CI? I see in a comment from @WalterKolczynski-NOAA that you were running a test before CI. Please let us know when you are done and we can run this through the CI.
>
> @aerorahul, my latest rocoto test ran successfully on WCOSS2 for all 3 gempak jobs. You may run the CI test now. Thanks!

@GwenChen-NOAA Thanks!

emcbot commented 1 day ago

CI Update on Wcoss2 at 07/01/24 07:06:08 PM
============================================
Cloning and Building global-workflow PR: 2671
with PID: 158035 on host: dlogin08

emcbot commented 1 day ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Mon Jul  1 19:09:16 UTC 2024 on dlogin08
---------------------------------------------------
Build: Completed at 07/01/24 07:46:45 PM
Case setup: Completed for experiment C48_ATM_ee159039
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_ee159039
Case setup: Skipped for experiment C48_S2SWA_gefs_ee159039
Case setup: Completed for experiment C48_S2SW_ee159039
Case setup: Completed for experiment C96_atm3DVar_extended_ee159039
Case setup: Skipped for experiment C96_atm3DVar_ee159039
Case setup: Completed for experiment C96_atmaerosnowDA_ee159039
Case setup: Completed for experiment C96C48_hybatmDA_ee159039
Case setup: Completed for experiment C96C48_ufs_hybatmDA_ee159039

emcbot commented 1 day ago

Experiment C96_atm3DVar_extended_ee159039 FAIL on Wcoss2 at 07/02/24 12:08:25 AM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2671/RUNTESTS/COMROOT/C96_atm3DVar_extended_ee159039/logs/2021122100/gfsnpoess_pgrb2_0p5deg.log

Follow link here to view the contents of the above file(s): (link)

aerorahul commented 1 day ago

> Experiment C96_atm3DVar_extended_ee159039 FAIL on Wcoss2 at 07/02/24 12:08:25 AM
>
> Error logs:
>
> /lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2671/RUNTESTS/COMROOT/C96_atm3DVar_extended_ee159039/logs/2021122100/gfsnpoess_pgrb2_0p5deg.log
>
> Follow link here to view the contents of the above file(s): (link)

https://github.com/GwenChen-NOAA/global-workflow/blob/ee159039cff15e42769ffa986e1dc27389cd8440/scripts/exgfs_atmos_grib2_special_npoess.sh#L154 should be uncommented

GwenChen-NOAA commented 14 hours ago

> https://github.com/GwenChen-NOAA/global-workflow/blob/ee159039cff15e42769ffa986e1dc27389cd8440/scripts/exgfs_atmos_grib2_special_npoess.sh#L154 should be uncommented

@aerorahul, it is fixed now.

aerorahul commented 8 hours ago

@GwenChen-NOAA Trying WCOSS2 CI again

emcbot commented 8 hours ago

CI Update on Wcoss2 at 07/02/24 08:48:59 PM
============================================
Cloning and Building global-workflow PR: 2671
with PID: 108148 on host: dlogin08

emcbot commented 7 hours ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Tue Jul  2 20:51:55 UTC 2024 on dlogin08
---------------------------------------------------
Build: Completed at 07/02/24 09:29:13 PM
Case setup: Completed for experiment C48_ATM_2aed07cd
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_2aed07cd
Case setup: Skipped for experiment C48_S2SWA_gefs_2aed07cd
Case setup: Completed for experiment C48_S2SW_2aed07cd
Case setup: Completed for experiment C96_atm3DVar_extended_2aed07cd
Case setup: Skipped for experiment C96_atm3DVar_2aed07cd
Case setup: Completed for experiment C96_atmaerosnowDA_2aed07cd
Case setup: Completed for experiment C96C48_hybatmDA_2aed07cd
Case setup: Completed for experiment C96C48_ufs_hybatmDA_2aed07cd