NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Add IAU to snow DA (and its test) #2610

Closed CoryMartin-NOAA closed 1 month ago

CoryMartin-NOAA commented 1 month ago

Description

This PR enables IAU (incremental analysis update) for the snow DA, which is necessary for GFSv17.

A snow analysis is always created for the center of the window, and an additional one at the beginning of the window is added if IAU is on. The former is needed for UPP; the latter is needed to initialize the model.

The increment is valid throughout the window for 3DVar, so the same increment is added to both forecasts.
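
The timing described above can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function names `snow_analysis_times` and `apply_increment`, the `do_iau` flag (standing in for DOIAU), and the 6-hour window length are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def snow_analysis_times(cycle, do_iau, window_hours=6):
    """Return the valid times at which a snow analysis is written.

    Hypothetical helper: the center-of-window analysis is always
    produced; with IAU on, a second one is added at the window start.
    """
    times = [cycle]  # center of the window, always produced (used by UPP)
    if do_iau:
        # beginning of the window, used to initialize the model
        times.append(cycle - timedelta(hours=window_hours / 2))
    return times


def apply_increment(backgrounds, increment):
    """Add the same 3DVar increment to every background value."""
    return [bg + increment for bg in backgrounds]


cycle = datetime(2021, 12, 21, 6)
for valid in snow_analysis_times(cycle, do_iau=True):
    print(valid.strftime("%Y%m%d %Hz"))
```

Under these assumptions, the 06z cycle gets analyses valid at 06z and 03z, and both receive the same increment.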

Additionally, the input file that goes into global_cycle has been updated to be the output of the JEDI snow analysis instead of the forecast. (@jiaruidong2017 I recall discussing this; can you confirm this is right, or am I mistaken?)

This PR also changes the CI test for snow DA (and aerosol DA) to run with IAU rather than without it.

Type of change

Change characteristics

How has this been tested?

Example:

Checklist

jiaruidong2017 commented 1 month ago

@CoryMartin-NOAA Yes. However, it doesn't support the if [[ ${DOIAU} = "YES" ]]; then case yet.

CoryMartin-NOAA commented 1 month ago

@jiaruidong2017 can you clarify what you mean by this? Does this PR fix that, or is something else needed?

jiaruidong2017 commented 1 month ago

@CoryMartin-NOAA Sorry for the confusion. I mean the current G-W doesn't support the IAU. This PR will fix the issue.

CoryMartin-NOAA commented 1 month ago

Thanks @aerorahul those IF statements are much better now.

emcbot commented 1 month ago

CI Update on Wcoss2 at 05/24/24 04:11:43 PM
============================================
Cloning and Building global-workflow PR: 2610
with PID: 156479 on host: clogin01

emcbot commented 1 month ago

Experiment C48_S2SWA_gefs FAILED on Hercules in /work2/noaa/stmp/CI/HERCULES/2610/RUNTESTS/C48_S2SWA_gefs_0d0bfa15

emcbot commented 1 month ago

Experiment C48_ATM FAILED on Hercules in /work2/noaa/stmp/CI/HERCULES/2610/RUNTESTS/C48_ATM_0d0bfa15

emcbot commented 1 month ago

Experiment C96_atm3DVar FAILED on Hercules in /work2/noaa/stmp/CI/HERCULES/2610/RUNTESTS/C96_atm3DVar_0d0bfa15

emcbot commented 1 month ago

Experiment C96C48_hybatmDA FAILED on Orion in /work2/noaa/stmp/CI/ORION/2610/RUNTESTS/C96C48_hybatmDA_0d0bfa15

emcbot commented 1 month ago

Experiment C48_S2SW FAILED on Orion in /work2/noaa/stmp/CI/ORION/2610/RUNTESTS/C48_S2SW_0d0bfa15

emcbot commented 1 month ago

Experiment C96_atm3DVar FAILED on Orion in /work2/noaa/stmp/CI/ORION/2610/RUNTESTS/C96_atm3DVar_0d0bfa15

emcbot commented 1 month ago

Experiment C48_ATM FAILED on Orion in /work2/noaa/stmp/CI/ORION/2610/RUNTESTS/C48_ATM_0d0bfa15

emcbot commented 1 month ago

Experiment C96C48_hybatmDA FAILED on Hercules in /work2/noaa/stmp/CI/HERCULES/2610/RUNTESTS/C96C48_hybatmDA_0d0bfa15

emcbot commented 1 month ago

Experiment C48_S2SW FAILED on Hercules in /work2/noaa/stmp/CI/HERCULES/2610/RUNTESTS/C48_S2SW_0d0bfa15

emcbot commented 1 month ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Fri May 24 16:20:41 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 05/24/24 04:32:02 PM
Case setup: Completed for experiment C48_ATM_0d0bfa15
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_0d0bfa15
Case setup: Skipped for experiment C48_S2SWA_gefs_0d0bfa15
Case setup: Completed for experiment C48_S2SW_0d0bfa15
Case setup: Completed for experiment C96_atm3DVar_extended_0d0bfa15
Case setup: Skipped for experiment C96_atm3DVar_0d0bfa15
Case setup: Skipped for experiment C96_atmaerosnowDA_0d0bfa15
Case setup: Completed for experiment C96C48_hybatmDA_0d0bfa15
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_0d0bfa15

emcbot commented 1 month ago

Experiment C48_ATM_0d0bfa15 SUCCESS on Wcoss2 at 05/24/24 05:51:16 PM

emcbot commented 1 month ago

Experiment C48_S2SW_0d0bfa15 SUCCESS on Wcoss2 at 05/24/24 06:06:18 PM

emcbot commented 1 month ago

Experiment C96C48_hybatmDA_0d0bfa15 SUCCESS on Wcoss2 at 05/24/24 07:06:20 PM

emcbot commented 1 month ago

Experiment C96_atm3DVar_extended_0d0bfa15 SUCCESS on Wcoss2 at 05/25/24 05:06:25 AM

emcbot commented 1 month ago

All CI Test Cases Passed on Wcoss2:


Experiment C48_ATM_0d0bfa15 *** SUCCESS *** at 05/24/24 05:51:16 PM
Experiment C48_S2SW_0d0bfa15 *** SUCCESS *** at 05/24/24 06:06:18 PM
Experiment C96C48_hybatmDA_0d0bfa15 *** SUCCESS *** at 05/24/24 07:06:20 PM
Experiment C96_atm3DVar_extended_0d0bfa15 *** SUCCESS *** at 05/25/24 05:06:25 AM

CoryMartin-NOAA commented 1 month ago

@aerorahul I think the changes you added should now be good to go; the sfcanl job runs to completion with exit code 0.

emcbot commented 1 month ago

CI Update on Wcoss2 at 05/28/24 09:20:16 PM
============================================
Cloning and Building global-workflow PR: 2610
with PID: 39572 on host: clogin01

emcbot commented 1 month ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Tue May 28 21:27:16 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 05/28/24 09:38:24 PM
Case setup: Completed for experiment C48_ATM_6c050854
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_6c050854
Case setup: Skipped for experiment C48_S2SWA_gefs_6c050854
Case setup: Completed for experiment C48_S2SW_6c050854
Case setup: Completed for experiment C96_atm3DVar_extended_6c050854
Case setup: Skipped for experiment C96_atm3DVar_6c050854
Case setup: Skipped for experiment C96_atmaerosnowDA_6c050854
Case setup: Completed for experiment C96C48_hybatmDA_6c050854
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_6c050854

emcbot commented 1 month ago

Experiment C48_ATM_6c050854 SUCCESS on Wcoss2 at 05/28/24 10:51:14 PM

emcbot commented 1 month ago

Experiment C48_S2SW_6c050854 SUCCESS on Wcoss2 at 05/28/24 11:03:11 PM

emcbot commented 1 month ago

Experiment C96_atm3DVar_extended_6c050854 FAIL on Wcoss2 at 05/28/24 11:12:20 PM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2610/RUNTESTS/COMROOT/C96_atm3DVar_extended_6c050854/logs/2021122106/gfssfcanl.log

emcbot commented 1 month ago

Build FAILED on Orion with error logs:

/work2/noaa/stmp/CI/ORION/2610/gfs/sorc/logs/build_ufs_utils.log

emcbot commented 1 month ago

CI Passed Hera at
Built and ran in directory /scratch1/NCEPDEV/global/CI/2610

emcbot commented 1 month ago

CI Passed Hercules at
Built and ran in directory /work2/noaa/stmp/CI/HERCULES/2610

emcbot commented 1 month ago

Experiment C96C48_hybatmDA FAILED on Orion with error logs:

/work2/noaa/stmp/CI/ORION/2610/RUNTESTS/COMROOT/C96C48_hybatmDA_6c050854/logs/2021122100/gfsanal.log

emcbot commented 1 month ago

Experiment C96C48_hybatmDA FAILED on Orion in /work2/noaa/stmp/CI/ORION/2610/RUNTESTS/C96C48_hybatmDA_6c050854

WalterKolczynski-NOAA commented 1 month ago

I'm not sure why gsi.x would even need a value for SlurmUser, so I'm probably not going to be much help troubleshooting this one.

WalterKolczynski-NOAA commented 1 month ago

@CoryMartin-NOAA just want to get confirmation that you know I'm waiting on you to figure out what the issues are (seems like two different ones) and to correct them.

CoryMartin-NOAA commented 1 month ago

@WalterKolczynski-NOAA the issues with GSI on Orion? Or other issues? This PR doesn't touch GSI; can we re-run on Orion and confirm it still has the same issue?

aerorahul commented 1 month ago

I kicked wcoss2. Will do Orion soon.

emcbot commented 1 month ago

CI Update on Wcoss2 at 05/31/24 08:28:44 PM
============================================
Cloning and Building global-workflow PR: 2610
with PID: 153132 on host: clogin01

emcbot commented 1 month ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Fri May 31 20:36:28 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 05/31/24 08:48:19 PM
Case setup: Completed for experiment C48_ATM_89924319
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_89924319
Case setup: Skipped for experiment C48_S2SWA_gefs_89924319
Case setup: Completed for experiment C48_S2SW_89924319
Case setup: Completed for experiment C96_atm3DVar_extended_89924319
Case setup: Skipped for experiment C96_atm3DVar_89924319
Case setup: Skipped for experiment C96_atmaerosnowDA_89924319
Case setup: Completed for experiment C96C48_hybatmDA_89924319
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_89924319

emcbot commented 1 month ago

Experiment C48_ATM_89924319 SUCCESS on Wcoss2 at 05/31/24 09:57:13 PM

emcbot commented 1 month ago

Experiment C48_S2SW_89924319 SUCCESS on Wcoss2 at 05/31/24 10:21:10 PM

emcbot commented 1 month ago

Experiment C96_atm3DVar_extended_89924319 FAIL on Wcoss2 at 05/31/24 10:33:21 PM

Error logs:

/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2610/RUNTESTS/COMROOT/C96_atm3DVar_extended_89924319/logs/2021122106/gfssfcanl.log

WalterKolczynski-NOAA commented 1 month ago

nid002580.cactus.wcoss2.ncep.noaa.gov 2:  FATAL ERROR: ice concentration
 analysis read error.
nid002580.cactus.wcoss2.ncep.noaa.gov 2: abort: 
nid002580.cactus.wcoss2.ncep.noaa.gov: rank 2 exited with code 134
nid002580.cactus.wcoss2.ncep.noaa.gov 0: forrtl: error (78): process killed (SIGTERM)

CoryMartin-NOAA commented 1 month ago

Thanks @WalterKolczynski-NOAA. No idea why this is failing on WCOSS2... I'll see if I can look into it or find someone to help.

CoryMartin-NOAA commented 1 month ago

@WalterKolczynski-NOAA this one test is only run on WCOSS2, right? What is unique about it compared to the other CI tests? @aerorahul I fear (without any evidence) that the issue is related to the IAU valid time in global_cycle, because it runs successfully for the center of the window but not for the beginning.

CoryMartin-NOAA commented 1 month ago

FYI the previous cycle's gfssfcanl completes successfully, but the gfsfcst fails...

CoryMartin-NOAA commented 1 month ago

@GeorgeGayno-NOAA any ideas on this? It seems that the issue is reading a sea ice GRIB file: it seems to work for 21z and 00z for the 00z cycle, but for the 06z cycle it only works for 06z (not 03z). I think the best solution for now is probably to revert to the "bug" where global_cycle assumes the center of the window even if the sfc restarts are for the beginning of the window.

CoryMartin-NOAA commented 1 month ago

It looks like the sea ice file is updated once per day, but there are copies for every cycle. My guess, based on digging, is that when global_cycle reads the file to find a matching time, it checks whether the date is correct and, if not, goes back a day and checks for that valid time. But 03z might be right when the changeover happens, so it can't find a matching time. So I think we just need to revert to the same ih regardless of beginning or middle of the window (for now).
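
The interim workaround proposed above can be sketched like this. It is an illustration under assumptions: `global_cycle_valid_time`, `do_iau`, `pin_to_center`, and the 6-hour window are hypothetical names and values, and the real selection of ih in the workflow scripts may be structured differently.

```python
from datetime import datetime, timedelta


def global_cycle_valid_time(cycle, do_iau, pin_to_center=True, window_hours=6):
    """Pick the valid time handed to global_cycle (hypothetical helper).

    With pin_to_center=True (the interim workaround), the center of the
    window is always used, even when IAU means the surface restarts are
    valid at the beginning of the window.
    """
    if do_iau and not pin_to_center:
        # IAU-aware behavior: beginning of the window (e.g. 03z for 06z)
        return cycle - timedelta(hours=window_hours / 2)
    # workaround, or IAU off: center of the window
    return cycle


cycle = datetime(2021, 12, 21, 6)
print(global_cycle_valid_time(cycle, do_iau=True).hour)                       # 6
print(global_cycle_valid_time(cycle, do_iau=True, pin_to_center=False).hour)  # 3
```

Keeping the pinning behind a single switch would make it easy to restore the IAU-aware valid time once the sea ice read problem is understood.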

CoryMartin-NOAA commented 1 month ago

@aerorahul I've added that change. Is there an easy way to re-run just the gfssfcanl job that failed to see if this fixes it?

aerorahul commented 1 month ago

In the CI? No. Running a single test is a desired capability. I will take it up with @TerryMcGuinness-NOAA, but he is on PTO today. The fastest way is to re-trigger the CI on WCOSS2.

emcbot commented 1 month ago

CI Update on Wcoss2 at 06/03/24 03:52:45 PM
============================================
Cloning and Building global-workflow PR: 2610
with PID: 62443 on host: clogin01

emcbot commented 1 month ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Mon Jun  3 15:59:35 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 06/03/24 04:12:55 PM
Case setup: Completed for experiment C48_ATM_3e5eea69
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_3e5eea69
Case setup: Skipped for experiment C48_S2SWA_gefs_3e5eea69
Case setup: Completed for experiment C48_S2SW_3e5eea69
Case setup: Completed for experiment C96_atm3DVar_extended_3e5eea69
Case setup: Skipped for experiment C96_atm3DVar_3e5eea69
Case setup: Skipped for experiment C96_atmaerosnowDA_3e5eea69
Case setup: Completed for experiment C96C48_hybatmDA_3e5eea69
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_3e5eea69