NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Get SOCA vrfy job working on Hera again #1045

Closed by CoryMartin-NOAA 3 months ago

CoryMartin-NOAA commented 3 months ago

What the title says. Fixes #994

CoryMartin-NOAA commented 3 months ago

unit tests fail because EMC RZDM is down...

RussTreadon-NOAA commented 3 months ago

> unit tests fail because EMC RZDM is down...

I am unable to build GDASApp from PR #1033. I no longer think PR #1033 is the issue. I'm beginning to suspect the failure is related to EMC RZDM being offline. Attempts to build GDASApp develop at several hashes fail as documented in PR #1033.

CoryMartin-NOAA commented 3 months ago

I'm not sure what to do to fix this logjam. We have no idea when RZDM will be back. Do we disable all unit tests in the interim? Find an alternate place to store the files? Other options?

emcbot commented 3 months ago

Automated Global-Workflow GDASApp Testing Results: Machine: orion

Start: Tue Apr 16 09:37:02 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build:                                  *FAILED*
Build: Failed at Tue Apr 16 09:57:11 CDT 2024
Build: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/log.build

emcbot commented 3 months ago

Automated Global-Workflow GDASApp Testing Results: Machine: hera

Start: Tue Apr 16 14:44:36 UTC 2024 on hfe11
---------------------------------------------------
Build:                                  *FAILED*
Build: Failed at Tue Apr 16 15:02:40 UTC 2024
Build: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/log.build

guillaumevernieres commented 3 months ago

> I'm not sure what to do to fix this logjam. We have no idea when RZDM will be back. Do we disable all unit tests in the interim? Find an alternate place to store the files? Other options?

Should we put the tarball on HPC instead?

RussTreadon-NOAA commented 3 months ago

Why does our not being able to download test files from EMC RZDM break the GDASApp build? It should only break testing, not the build, right? Does cmake intertwine application testing with the application build in such a way that if we can't build the tests, we can't build GDASApp?

RussTreadon-NOAA commented 3 months ago

Do we have tarball gdasapp-fix-${SHORTSHA}.tgz available from somewhere besides https://ftp.emc.ncep.noaa.gov/static_files/public/GDASApp?

CoryMartin-NOAA commented 3 months ago

> Why does our not being able to download test files from EMC RZDM break the GDASApp build? It should only break testing, not the build, right? Does cmake intertwine application testing with the application build in such a way that if we can't build the tests, we can't build GDASApp?

It's because CMake is doing the downloading. It fails, so CMake can't finish. Perhaps we should instead have a test that runs that downloads/links the files. This is how JCSDA does it for JEDI.

RussTreadon-NOAA commented 3 months ago

Placing the tarball on HPC is an option, but when the given machine is offline, we're stuck. We could place the tarball on multiple HPC platforms, but even this approach has potential pitfalls. For example, we place the tarball in /work2, but when we want to build on Orion or Hercules, /work2 is offline for some reason and only /work is available.
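
The multiple-location idea can be pictured as trying each candidate directory in order and using the first one that actually holds the tarball. What follows is only a minimal sketch: the helper name `find_tarball` and any directory list passed to it are hypothetical and are not part of GDASApp.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_tarball(name: str, candidate_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first existing copy of `name`, or None if no location has it.

    `candidate_dirs` is a hypothetical, caller-supplied list of directories
    (for example, one per filesystem or platform); nothing here is GDASApp code.
    """
    for directory in candidate_dirs:
        candidate = Path(directory) / name
        try:
            if candidate.is_file():
                return candidate
        except OSError:
            # An unmounted or unreachable filesystem counts as "not available".
            continue
    return None
```

A caller would still have to decide what to do when every location is unavailable and the function returns `None`, which is the pitfall described above.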

RussTreadon-NOAA commented 3 months ago

> > Why does our not being able to download test files from EMC RZDM break the GDASApp build? It should only break testing, not the build, right? Does cmake intertwine application testing with the application build in such a way that if we can't build the tests, we can't build GDASApp?
>
> It's because CMake is doing the downloading. It fails, so CMake can't finish. Perhaps we should instead have a test that runs that downloads/links the files. This is how JCSDA does it for JEDI.

This is a good option. It separates the application build from the application tests. We should be able to compile GDASApp whether or not data for ctests is available.
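
As a rough illustration of moving the download out of the configure step, a fetch helper run at test time could report failure instead of raising, so only the tests that need the data are affected. This is a sketch under assumptions: `fetch_test_data` is a hypothetical name, and it is neither the existing GDASApp CMake logic nor the JEDI implementation.

```python
import shutil
import urllib.error
import urllib.request
from pathlib import Path


def fetch_test_data(url: str, dest: Path, timeout: float = 30.0) -> bool:
    """Download `url` to `dest`; return False instead of raising if it is unreachable.

    Hypothetical helper for illustration only.
    """
    if dest.is_file():
        return True  # already staged by an earlier run
    tmp = dest.with_name(dest.name + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
    except (urllib.error.URLError, OSError):
        # Server down, timeout, or unwritable destination: clean up and report failure.
        tmp.unlink(missing_ok=True)
        return False
    tmp.replace(dest)  # publish the file only after a complete download
    return True
```

A test setup step could call this and skip or fail the dependent ctests when it returns `False`, while configuring and compiling proceed without the data.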

emcbot commented 3 months ago

Automated Global-Workflow GDASApp Testing Results: Machine: hera

Start: Wed Apr 17 00:36:44 UTC 2024 on hfe11
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Wed Apr 17 00:37:30 UTC 2024
Tests: 
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Wed Apr 17 01:23:34 UTC 2024
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Wed Apr 17 01:39:40 UTC 2024
Tests: 91% tests passed, 5 tests failed out of 54
    1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
    1776 - test_gdasapp_atm_jjob_var_run (Failed)
    1777 - test_gdasapp_atm_jjob_var_inc (Failed)
    1778 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

emcbot commented 3 months ago

Automated Global-Workflow GDASApp Testing Results: Machine: orion

Start: Tue Apr 16 19:46:56 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Tue Apr 16 19:47:20 CDT 2024
Tests: 
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Tue Apr 16 20:42:38 CDT 2024
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Tue Apr 16 21:02:46 CDT 2024
Tests: 93% tests passed, 4 tests failed out of 54
    1777 - test_gdasapp_atm_jjob_var_run (Failed)
    1778 - test_gdasapp_atm_jjob_var_inc (Failed)
    1779 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

emcbot commented 3 months ago

Automated Global-Workflow GDASApp Testing Results: Machine: hera

Start: Wed Apr 17 01:42:32 UTC 2024 on hfe06
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Wed Apr 17 02:33:04 UTC 2024
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Wed Apr 17 02:49:26 UTC 2024
Tests: 91% tests passed, 5 tests failed out of 54
    1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
    1776 - test_gdasapp_atm_jjob_var_run (Failed)
    1777 - test_gdasapp_atm_jjob_var_inc (Failed)
    1778 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /scratch1/NCEPDEV/da/Cory.R.Martin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

emcbot commented 3 months ago

Automated Global-Workflow GDASApp Testing Results: Machine: orion

Start: Tue Apr 16 21:03:53 CDT 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build:                                 *SUCCESS*
Build: Completed at Tue Apr 16 21:58:27 CDT 2024
---------------------------------------------------
Tests:                                  *Failed*
Tests: Failed at Tue Apr 16 22:30:14 CDT 2024
Tests: 91% tests passed, 5 tests failed out of 54
    1764 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
    1777 - test_gdasapp_atm_jjob_var_run (Failed)
    1778 - test_gdasapp_atm_jjob_var_inc (Failed)
    1779 - test_gdasapp_atm_jjob_var_final (Failed)
Tests: see output at /work2/noaa/stmp/cmartin/CI/GDASApp/workflow/PR/1045/global-workflow/sorc/gdas.cd/build/log.ctest

apchoiCMD commented 3 months ago

> The test_gdasapp_util_ghrsst2ioda job is failing too but isn't reported as failed. Can you update the references @apchoiCMD? The vrfy task is failing with an issue reading the layer thicknesses, not sure why ...
>
> This PR fixes the previous error, good enough to be merged. We'll comment out the verify test until one of us water people has time to look into it and fix the issue.

My bad, but your PR #1050 already includes a modified test reference file: https://github.com/NOAA-EMC/GDASApp/pull/1050/files#diff-68603b0771f7acb935fa0d121599ea74718ba8012d0e7dc7c1ee1a192150d93e. Do you want me to update it before merging your PR? Thanks, @guillaumevernieres