NeilBarton-NOAA closed this 1 month ago.
Ready for review. This will conflict with @KateFriedman-NOAA's PR https://github.com/NOAA-EMC/global-workflow/pull/2651
In my opinion, this is ready for further review, @WalterKolczynski-NOAA @aerorahul.
I thought we were redirecting this to the new feature/gefs_reforecast branch. You should be able to just change the target branch for this PR (or I can do it).
@WalterKolczynski-NOAA Will we be able to run CI tests with feature/gefs_reforecast as the target branch? If yes, I am fine with feature/gefs_reforecast being the target of this PR.
Should be able to.
Changed the target to feature/gefs_reforecast
@aerorahul and I talked about this going into dev. Otherwise, updates going into dev could easily break GEFS and SFS runs.
Changed the target back to develop.
@aerorahul I have addressed your review comments related to the extractvars task. With Neil's permission, I have pushed these changes directly to his branch.
@WalterKolczynski-NOAA @aerorahul This PR is ready for further review/CI testing.
@KateFriedman-NOAA I updated the stage IC YAMLs with the default directory structure. You'll have to pick up the ICs again at
hera:/scratch2/NCEPDEV/stmp3/Neil.Barton/ICs/REPLAY_ICs/CI
Everything else is working on my side.
@NeilBarton-NOAA I've pulled the updated IC set into ICSDIR on Hera. See here: /scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96mx100/20240610
I pulled the files through the symlinks I saw within your set (rsync -azvL), so that the copy contains no symlinks to personal files. I also placed it under a timestamp folder. Let me know if the ICs look good on Hera and we'll sync them to the other platforms. After we sync them, I will also remove the prior set I pulled in that wasn't yet under the timestamp folder (/scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96mx100/gefs.20201101).
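The point of `-L` in `rsync -azvL` is that symlinks are copied as the files they point to, so the staged IC set no longer depends on anyone's personal directories. A minimal Python sketch for confirming that after the copy is below; `find_symlinks` is a hypothetical helper (not part of global-workflow) and it only reports leftover links, it does not fix them.

```python
import os
from pathlib import Path


def find_symlinks(root):
    """Return a sorted list of paths under `root` that are symbolic links.

    The walk does not follow symlinked directories, so a link pointing
    outside `root` (e.g. into a personal scratch area) is reported
    rather than traversed.
    """
    links = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinked directories show up in `dirnames` but are not descended
        # into; symlinked (or broken) files show up in `filenames`.
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                links.append(str(path))
    return sorted(links)
```

For example, an empty result from `find_symlinks("/scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96mx100/20240610")` would indicate the copy is self-contained.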
Conflicts have been resolved. I think this PR is ready to move forward.
CI Update on Wcoss2 at 09/26/24 07:55:14 PM
============================================
Cloning and Building global-workflow PR: 2788
with PID: 66282 on host: clogin03
Checkout Failed on Hera in Build# 2: Could not perform submodule update
@TerrenceMcGuinness-NOAA @KateFriedman-NOAA What are we dealing with here? It seems like a Hera issue. Do we need to open a ticket with the Hera admins?
Automated global-workflow Testing Results:
Machine: Wcoss2
Start: Thu Sep 26 20:01:23 UTC 2024 on clogin03
---------------------------------------------------
Build: Completed at 09/26/24 08:38:56 PM
Case setup: Completed for experiment C48_ATM_b0d0cd46
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_b0d0cd46
Case setup: Skipped for experiment C48_S2SWA_gefs_b0d0cd46
Case setup: Completed for experiment C48_S2SW_b0d0cd46
Case setup: Completed for experiment C96_atm3DVar_extended_b0d0cd46
Case setup: Skipped for experiment C96_atm3DVar_b0d0cd46
Case setup: Completed for experiment C96C48_hybatmaerosnowDA_b0d0cd46
Case setup: Completed for experiment C96C48_hybatmDA_b0d0cd46
Case setup: Completed for experiment C96C48_ufs_hybatmDA_b0d0cd46
Case setup: Completed for experiment C96_S2SWA_gefs_replay_ics_b0d0cd46
Experiment C96_S2SWA_gefs_replay_ics_b0d0cd46 FAIL on Wcoss2 at 09/26/24 10:35:31 PM
Error logs:
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2788/RUNTESTS/COMROOT/C96_S2SWA_gefs_replay_ics_b0d0cd46/logs/2020110100/stage_ic.log
There have been intermittent but persistent issues checking out gdas on Hera for at least a week now (specifically git-lfs).
@WalterKolczynski-NOAA We have the same issue on Gaea. git-lfs is timing out on the Jenkins nodes, and we are not able to reproduce the error directly at the prompt outside of the Jenkins nodes. I am looking into increasing the time limits set by the Jenkins SCM plugin. ~T.McG
Submodule 'pkg/CVMix-src' (https://github.com/mom-ocean/CVMix-src.git) registered for path 'sorc/gdas.cd/sorc/soca/mom6/MOM6/pkg/CVMix-src'
Submodule 'pkg/GSW-Fortran' (https://github.com/mom-ocean/GSW-Fortran.git) registered for path 'sorc/gdas.cd/sorc/soca/mom6/MOM6/pkg/GSW-Fortran'
Cloning into '/scratch1/NCEPDEV/global/CI/2788/gfs/sorc/gdas.cd/sorc/soca/mom6/MOM6/pkg/CVMix-src'...
Cloning into '/scratch1/NCEPDEV/global/CI/2788/gfs/sorc/gdas.cd/sorc/soca/mom6/MOM6/pkg/GSW-Fortran'...
fatal: the remote end hung up unexpectedly
error: git-lfs filter-process died of signal 15
fatal: Unable to checkout 'cd66505007b1559d79cb158bd6dc018a3943c1e7' in submodule path 'sorc/gdas.cd/sorc/ufo'
fatal: Failed to recurse into submodule path 'sorc/gdas.cd'
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2846)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2185)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$7.lambda$execute$0(CliGitAPIImpl.java:1573)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at com.google.common.util.concurrent.DirectExecutorService.execute(DirectExecutorService.java:51)
at java.base/java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:184)
at org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.submitRemainingCommand(GitCommandsExecutor.java:77)
at org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:70)
Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to Hera-EMC
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1826)
at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
at hudson.remoting.Channel.call(Channel.java:1042)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:153)
at jdk.internal.reflect.GeneratedMethodAccessor421.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:138)
at PluginClassLoader for git-client/jdk.proxy26/jdk.proxy26.$Proxy109.execute(Unknown Source)
at PluginClassLoader for git//hudson.plugins.git.extensions.impl.SubmoduleOption.onCheckoutCompleted(SubmoduleOption.java:196)
at PluginClassLoader for git//hudson.plugins.git.GitSCM._checkout(GitSCM.java:1395)
at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1277)
at PluginClassLoader for workflow-scm-step//org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:136)
at PluginClassLoader for workflow-scm-step//org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:101)
at PluginClassLoader for workflow-scm-step//org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:88)
at PluginClassLoader for workflow-step-api//org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused: hudson.plugins.git.GitException
at org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.checkResult(GitCommandsExecutor.java:89)
at org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:69)
at org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:47)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$7.execute(CliGitAPIImpl.java:1576)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:170)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:161)
at hudson.remoting.UserRequest.perform(UserRequest.java:211)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:377)
at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:137)
at java.base/java.lang.Thread.run(Thread.java:842)
Caused: java.io.IOException: Could not perform submodule update
at PluginClassLoader for git//hudson.plugins.git.extensions.impl.SubmoduleOption.onCheckoutCompleted(SubmoduleOption.java:201)
at PluginClassLoader for git//hudson.plugins.git.GitSCM._checkout(GitSCM.java:1395)
at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1277)
at PluginClassLoader for workflow-scm-step//org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:136)
at PluginClassLoader for workflow-scm-step//org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:101)
at PluginClassLoader for workflow-scm-step//org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:88)
at PluginClassLoader for workflow-step-api//org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
ERROR: Maximum checkout retry attempts reached, aborting
ERROR: Timeout after 10 minutes
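The log above ends with the submodule checkout hitting a fixed 10-minute limit, and the comments attribute it to git-lfs being slow on the Jenkins nodes. Outside Jenkins, the same idea (a longer per-attempt timeout plus a few retries) can be sketched in Python as follows; `run_with_retries` and its default values are assumptions for illustration, not the Jenkins git plugin's actual settings.

```python
import subprocess
import time


def run_with_retries(cmd, timeout_s, attempts=3, backoff_s=30):
    """Run `cmd`, killing it after `timeout_s` seconds, and retry on failure.

    Returns the CompletedProcess of the first successful attempt. If every
    attempt fails or times out, the last error is re-raised.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired on timeout.
            return subprocess.run(cmd, check=True, timeout=timeout_s)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as err:
            last_error = err
            if attempt < attempts:
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise last_error
```

For example: `run_with_retries(["git", "submodule", "update", "--init", "--recursive"], timeout_s=1800)`. Retrying only helps with transient slowness; a checkout that consistently exceeds the limit still needs the limit itself raised.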
ICSDIR is not being populated correctly on WCOSS2 (possibly elsewhere too?):
+++ config.stage_ic[10]: export ICSDIR=UNDEFINED/REPLAY_ICs/CI
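The `UNDEFINED` prefix suggests the base directory that ICSDIR is built from was not set on WCOSS2, so a placeholder leaked into the path. A minimal guard, assuming the unset value is literally the string `UNDEFINED` as in the log line above, could reject such paths before staging starts; `check_path` is a hypothetical helper, not an existing global-workflow function.

```python
from pathlib import PurePosixPath


def check_path(name, value, placeholder="UNDEFINED"):
    """Return `value` unchanged, or raise ValueError if any path component
    equals `placeholder`.

    `placeholder` is assumed to be the literal string an unset setting
    expands to, as seen in `ICSDIR=UNDEFINED/REPLAY_ICs/CI`.
    """
    if placeholder in PurePosixPath(value).parts:
        raise ValueError(f"{name}={value!r} contains unresolved {placeholder!r}")
    return value
```

Checking whole path components (rather than substrings) avoids false positives on directory names that merely contain the word.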
I edited the YAML to point to the CI ICs instead of my personal ICs. It's now working on Hera.
I increased the timeout settings for SCM (source code management) in the Jenkins project. If you are ready, you can restart the CI tests and see whether that helps with the timeouts from the submodule checkouts.
Checkout Failed on Hera in Build# 3: Could not perform submodule update
Checkout Failed on Hera in Build# 4: Could not perform submodule update
The checkout phase of the build was unable to update some submodules. Let me clear out all the Jenkins cache files for this workspace so SCM can start over completely and do a fresh clone.
Build FAILED on Hera in Build# 5 with error logs:
/scratch1/NCEPDEV/global/CI/2788/gfs/sorc/logs/build_gsi_utils.log
Cannot contact Hera-EMC: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@1e281834:Hera-EMC": Remote call on Hera-EMC failed. The channel is closing down or has closed down
Hera lost its connection to the Jenkins controller. The connection was restored with the health-check scripts, and the PR CI is being relaunched.
@WalterKolczynski-NOAA Note that this "disconnected" error occurred on Hera during the actual build stage, after cloning, so it is unrelated to the prior timeout issues, which we believe have been resolved.
Experiment C48_S2SWA_gefs FAILED on Hera in Build# 6 with error logs:
/scratch1/NCEPDEV/global/CI/2788/RUNTESTS/COMROOT/C48_S2SWA_gefs_f918cc62/logs/2021032312/ocean_prod_mem000_f006.log
Experiment C48_S2SWA_gefs FAILED on Hera in Build# 6 in
/scratch1/NCEPDEV/global/CI/2788/RUNTESTS/EXPDIR/C48_S2SWA_gefs_f918cc62
Looks like a 'real' error this time:
2024-10-02 20:37:31,785 - INFO - oceanice_products: Copy ocean data to run directory
Traceback (most recent call last):
File "/scratch1/NCEPDEV/global/CI/2788/gefs/ush/python/wxflow/fsutils.py", line 85, in cp
shutil.copy2(source, target)
File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/python-3.11.6-b6ydksr/lib/python3.11/shutil.py", line 436, in copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/python-3.11.6-b6ydksr/lib/python3.11/shutil.py", line 256, in copyfile
with open(src, 'rb') as fsrc:
^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/scratch1/NCEPDEV/global/CI/2788/RUNTESTS/COMROOT/C48_S2SWA_gefs_f918cc62/gefs.20210323/12/mem000/model/ocean/history/gefs.ocean.t12z.6hr_avg.f006.nc'
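The traceback shows `shutil.copy2` (called from wxflow's `cp`) failing on one missing ocean history file, after which the task dies. As a minimal sketch (not the wxflow API), a copy step that first checks every source and reports all missing inputs at once makes this kind of failure easier to diagnose; `copy_all` and its argument shape are hypothetical.

```python
import shutil
from pathlib import Path


def copy_all(pairs):
    """Copy each (source, target) pair after checking every source exists.

    Raises FileNotFoundError naming all missing sources, and copies
    nothing if any input is absent.
    """
    missing = [str(src) for src, _ in pairs if not Path(src).is_file()]
    if missing:
        raise FileNotFoundError("missing input(s): " + ", ".join(missing))
    for src, dst in pairs:
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves metadata such as timestamps, like the cp in the traceback.
        shutil.copy2(src, dst)
```

Checking up front is a design choice: it trades a little extra I/O for an error message that lists every absent forecast-hour file instead of only the first one.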
CI Failed on Hera in Build# 6
Built and ran in directory /scratch1/NCEPDEV/global/CI/2788
Experiment C48_S2SWA_gefs_f918cc62 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Oct 2 20:40:43 UTC 2024
Experiment C48_S2SWA_gefs_f918cc62 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/2788/RUNTESTS/COMROOT/C48_S2SWA_gefs_f918cc62/logs/2021032312/ocean_prod_mem000_f006.log
Experiment C48_ATM_f918cc62 Completed 1 Cycles: *SUCCESS* at Wed Oct 2 23:11:31 UTC 2024
Experiment C48mx500_3DVarAOWCDA_f918cc62 Completed 2 Cycles: *SUCCESS* at Wed Oct 2 23:24:04 UTC 2024
Experiment C48_S2SW_f918cc62 Completed 1 Cycles: *SUCCESS* at Wed Oct 2 23:25:49 UTC 2024
Experiment C96_atm3DVar_f918cc62 Completed 3 Cycles: *SUCCESS* at Thu Oct 3 01:03:03 UTC 2024
Experiment C96_S2SWA_gefs_replay_ics_f918cc62 Completed 1 Cycles: *SUCCESS* at Thu Oct 3 01:15:49 UTC 2024
Experiment C96C48_hybatmDA_f918cc62 Completed 3 Cycles: *SUCCESS* at Thu Oct 3 01:28:01 UTC 2024
Experiment C96C48_ufs_hybatmDA_f918cc62 Completed 2 Cycles: *SUCCESS* at Thu Oct 3 01:34:23 UTC 2024
Experiment C96C48_hybatmaerosnowDA_f918cc62 Completed 3 Cycles: *SUCCESS* at Thu Oct 3 07:34:26 UTC 2024
The CI test fix has been updated. Please run this through the CI tests again.
CI Passed on Hera in Build# 7
Built and ran in directory /scratch1/NCEPDEV/global/CI/2788
Experiment C96_S2SWA_gefs_replay_ics_922eadff Completed 1 Cycles: *SUCCESS* at Sat Oct 5 03:55:04 UTC 2024
Experiment C48_ATM_922eadff Completed 1 Cycles: *SUCCESS* at Sat Oct 5 03:58:13 UTC 2024
Experiment C48_S2SW_922eadff Completed 1 Cycles: *SUCCESS* at Sat Oct 5 05:15:57 UTC 2024
Experiment C48mx500_3DVarAOWCDA_922eadff Completed 2 Cycles: *SUCCESS* at Sat Oct 5 08:38:15 UTC 2024
Experiment C96_atm3DVar_922eadff Completed 3 Cycles: *SUCCESS* at Sat Oct 5 09:47:09 UTC 2024
Experiment C48_S2SWA_gefs_922eadff Completed 1 Cycles: *SUCCESS* at Sat Oct 5 09:52:54 UTC 2024
Experiment C96C48_ufs_hybatmDA_922eadff Completed 2 Cycles: *SUCCESS* at Sat Oct 5 09:59:28 UTC 2024
Experiment C96C48_hybatmDA_922eadff Completed 3 Cycles: *SUCCESS* at Sat Oct 5 10:24:11 UTC 2024
Experiment C96C48_hybatmaerosnowDA_922eadff Completed 3 Cycles: *SUCCESS* at Sat Oct 5 10:46:08 UTC 2024
CI Passed on Hercules in Build# 8
Built and ran in directory /work2/noaa/stmp/CI/HERCULES/2788
Experiment C48_ATM_922eadff Completed 1 Cycles: *SUCCESS* at Mon Oct 7 12:51:11 CDT 2024
Experiment C96_S2SWA_gefs_replay_ics_922eadff Completed 1 Cycles: *SUCCESS* at Mon Oct 7 12:57:19 CDT 2024
Experiment C96C48_hybatmDA_922eadff Completed 3 Cycles: *SUCCESS* at Mon Oct 7 13:51:59 CDT 2024
Experiment C96_atm3DVar_922eadff Completed 3 Cycles: *SUCCESS* at Mon Oct 7 13:57:47 CDT 2024
Experiment C48_S2SWA_gefs_922eadff Completed 1 Cycles: *SUCCESS* at Mon Oct 7 14:59:19 CDT 2024
Experiment C48_S2SW_922eadff Completed 1 Cycles: *SUCCESS* at Mon Oct 7 15:22:48 CDT 2024
@NeilBarton-NOAA Checking back on this. Please confirm the above ICs are good-to-go. Thanks!
@KateFriedman-NOAA The CI tests passed and the code has been merged. The ICs are good to go.
Thanks for confirming, @NeilBarton-NOAA, particularly for our records. I will proceed with deleting the non-timestamped copy (/scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96mx100/gefs.20201101).
@KateFriedman-NOAA I misunderstood your question. The CI points to the timestamped ICs:
https://github.com/NOAA-EMC/global-workflow/blob/develop/ci/cases/pr/C96_S2SWA_gefs_replay_ics.yaml
If the timestamped folder is removed, the YAML file will need to be edited.
@NeilBarton-NOAA All good, I removed the non-timestamped one. The timestamped one will remain and be used in the CI test.
Description
This PR adds a CI test using C96mx100 resolution and the S2SWA app. Two perturbed members are included along with the control member.
Currently, this PR depends on https://github.com/NOAA-EMC/global-workflow/pull/2778 and https://github.com/NOAA-EMC/global-workflow/pull/2755, as those PRs ensure the product tasks are triggered in the workflow.
I could likely bypass the dependency on https://github.com/NOAA-EMC/global-workflow/pull/2755 if desired.
I will keep this PR as a draft until further discussion on the dependencies.
The current test ICs are located on Hera at /scratch2/NCEPDEV/stmp3/Neil.Barton/ICs/REPLAY_ICs/CI/2020110100
Type of change
Change characteristics
How has this been tested?
This is currently being tested on Hera.
Checklist