Closed CoryMartin-NOAA closed 3 months ago
unit tests fail because EMC RZDM is down...
I am unable to build GDASApp from PR #1033. I no longer think PR #1033 is the issue. I'm beginning to suspect the failure is related to EMC RZDM being offline. Attempts to build GDASApp develop at several hashes fail as documented in PR #1033.
I'm not sure what to do to fix this logjam. We have no idea when RZDM will be back. Do we disable all unit tests in the interim? Find an alternate place to store the files? Other options?
Automated Global-Workflow GDASApp Testing Results: Machine: orion
Start: Tue Apr 16 09:37:02 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build: *FAILED*
Build: Failed at Tue Apr 16 09:57:11 CDT 2024
Build: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/log.build
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Tue Apr 16 14:44:36 UTC 2024 on hfe11
---------------------------------------------------
Build: *FAILED*
Build: Failed at Tue Apr 16 15:02:40 UTC 2024
Build: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/log.build
Should we put the tarball on HPC instead?
Why does our not being able to download test files from EMC RZDM break the GDASApp build? It should only break testing, not the build, right? Does cmake intertwine application testing with the application build in such a way that if we can't build the tests, we can't build GDASApp?
Do we have tarball gdasapp-fix-${SHORTSHA}.tgz available from somewhere besides https://ftp.emc.ncep.noaa.gov/static_files/public/GDASApp?
The build breaks because CMake is doing the downloading. The download fails, so CMake can't finish. Perhaps we should instead have a test that downloads or links the files when it runs. This is how JCSDA does it for JEDI.
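To make the idea of moving the download out of CMake and into a test more concrete, here is a minimal sketch in Python. It is not the JCSDA or GDASApp implementation. The function name `fetch_fix_tarball` and its parameters are hypothetical, and the sketch assumes the test data is a single tarball reachable by URL. The point it illustrates is that a failed download is reported as a failed step rather than as an error that stops the build.

```python
import sys
import tarfile
import urllib.error
import urllib.request
from pathlib import Path


def fetch_fix_tarball(url: str, dest_dir: Path, timeout: float = 30.0) -> bool:
    """Download and unpack a tarball; return False on failure instead of raising.

    Hypothetical helper for illustration only. A test driver would turn the
    boolean into a pass/fail exit code, so an unreachable server fails the
    data-staging test without affecting compilation.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    tarball = dest_dir / Path(url).name
    try:
        # Stream the remote file to disk, then unpack it next to the tarball.
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(tarball, "wb") as out:
            out.write(resp.read())
        with tarfile.open(tarball) as tf:
            tf.extractall(dest_dir)
    except (urllib.error.URLError, OSError, tarfile.TarError) as err:
        print(f"could not stage test data from {url}: {err}", file=sys.stderr)
        return False
    return True
```

A small wrapper script could call this function and exit with status 1 when it returns False. Tests that need the data would then depend on that staging test instead of on a configure-time download.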
Placing the tarball on HPC is an option, but when the given machine is offline, we're stuck. We could place the tarball on multiple HPC platforms, but even this approach has potential pitfalls. For example, suppose we place the tarball in /work2, but when we want to build on Orion or Hercules, /work2 is offline for some reason and only /work is available.
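One way to soften the pitfall described above is to keep copies in several locations and use the first one that is reachable. The sketch below only illustrates that fallback logic. The function name `first_available` is hypothetical, and any paths passed to it would be site-specific choices that this thread does not specify.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_available(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, else None.

    Hypothetical helper: candidates are tried in order, so the preferred
    location goes first and fallbacks follow.
    """
    for candidate in candidates:
        path = Path(candidate)
        try:
            if path.is_file():
                return path
        except OSError:
            # Depending on the Python version, an unreachable filesystem may
            # raise here rather than return False; treat it as unavailable.
            continue
    return None
```

If every candidate is missing, the function returns None, and the caller can fall back to a remote download or skip the data-dependent tests.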
Having a test download or link the files, as JCSDA does for JEDI, is a good option. It separates the application build from the application tests. We should be able to compile GDASApp whether or not data for ctests is available.
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Wed Apr 17 00:36:44 UTC 2024 on hfe11
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Wed Apr 17 00:37:30 UTC 2024
Tests:
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest
Automated Global-Workflow GDASApp Testing Results: Machine: orion
Start: Tue Apr 16 19:46:56 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Tue Apr 16 19:47:20 CDT 2024
Tests:
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Wed Apr 17 00:36:44 UTC 2024 on hfe11
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Wed Apr 17 01:39:40 UTC 2024
Tests: 91% tests passed, 5 tests failed out of 54
1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1776 - test_gdasapp_atm_jjob_var_run (Failed)
1777 - test_gdasapp_atm_jjob_var_inc (Failed)
1778 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest
Automated Global-Workflow GDASApp Testing Results: Machine: orion
Start: Tue Apr 16 19:46:56 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Tue Apr 16 21:02:46 CDT 2024
Tests: 93% tests passed, 4 tests failed out of 54
1777 - test_gdasapp_atm_jjob_var_run (Failed)
1778 - test_gdasapp_atm_jjob_var_inc (Failed)
1779 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest
Automated Global-Workflow GDASApp Testing Results: Machine: hera
Start: Wed Apr 17 01:42:32 UTC 2024 on hfe06
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Wed Apr 17 02:33:04 UTC 2024
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Wed Apr 17 02:49:26 UTC 2024
Tests: 91% tests passed, 5 tests failed out of 54
1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1776 - test_gdasapp_atm_jjob_var_run (Failed)
1777 - test_gdasapp_atm_jjob_var_inc (Failed)
1778 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest
Automated Global-Workflow GDASApp Testing Results: Machine: orion
Start: Tue Apr 16 21:03:53 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build: *SUCCESS*
Build: Completed at Tue Apr 16 21:58:27 CDT 2024
---------------------------------------------------
Tests: *Failed*
Tests: Failed at Tue Apr 16 22:30:14 CDT 2024
Tests: 91% tests passed, 5 tests failed out of 54
1764 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1777 - test_gdasapp_atm_jjob_var_run (Failed)
1778 - test_gdasapp_atm_jjob_var_inc (Failed)
1779 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest
The test_gdasapp_util_ghrsst2ioda job is failing too but isn't reported as failed. Can you update the references @apchoiCMD? The vrfy task is failing with an issue reading the layer thicknesses, not sure why... This PR fixes the previous error, good enough to be merged. We'll comment out the verify test until one of us water people has time to look into it and fix the issue.
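The CI reports earlier in this thread give a failure count that is one higher than the number of tests they list (for example, "5 tests failed" followed by four names). One possible reading, not confirmed here, is that the unlisted failure is the job that "isn't reported as failed". The sketch below shows how such a mismatch could be flagged automatically. It assumes the summary format shown in those reports, and the function name `check_ctest_summary` is hypothetical.

```python
import re
from typing import List, Tuple

# Patterns match the report format shown in this thread (an assumption).
SUMMARY_RE = re.compile(r"(\d+)% tests passed, (\d+) tests failed out of (\d+)")
FAILED_RE = re.compile(r"^\s*(\d+) - (\S+) \(Failed\)", re.MULTILINE)


def check_ctest_summary(text: str) -> Tuple[int, List[str]]:
    """Return (reported failure count, names of the tests listed as failed)."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no ctest summary line found")
    reported = int(match.group(2))
    listed = [name for _, name in FAILED_RE.findall(text)]
    return reported, listed


report = """Tests: 91% tests passed, 5 tests failed out of 54
1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1776 - test_gdasapp_atm_jjob_var_run (Failed)
1777 - test_gdasapp_atm_jjob_var_inc (Failed)
1778 - test_gdasapp_atm_jjob_var_final (Failed)
"""
reported, listed = check_ctest_summary(report)
if reported != len(listed):
    print(f"{reported} failures reported but only {len(listed)} listed")
```

For the sample report, the count and the list disagree, so the message is printed. A CI script could use the same comparison to warn when a failing test is missing from the posted list.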
My bad, but your PR #1050 already includes a modified test reference file: https://github.com/NOAA-EMC/GDASApp/pull/1050/files#diff-68603b0771f7acb935fa0d121599ea74718ba8012d0e7dc7c1ee1a192150d93e Do you want me to update it before merging your PR? Thanks @guillaumevernieres
What the title says. Fixes #994