Open CatherineThomas-NOAA opened 1 month ago
@CatherineThomas-NOAA Turning it off as a default was intentional, but I am in the process of trying to re-enable it for one of the CI tests so it gets exercised.
Actually, the metplus scripts don't seem to be working correctly (they complete, but with silent failures over missing files). I'm not sure if it is just because of #2673 (though I'm pretty sure that is the immediate issue), or if there is some other issue.
CC: @malloryprow
Hmmm, yeah if the archive job isn't archiving the files the verification will have nothing to run with.
FWIW the metplus jobs were working well and producing good stats at 7d2c539f45194cd4e5b21bfd4b83a9480189cd0f from 3 weeks ago. See this page produced with those step 1 stats.
Oh that is good news! Was this before the archive rework?
@WalterKolczynski-NOAA I can look more into what is going on but I just need log files.
@WalterKolczynski-NOAA I think the issue you are experiencing is related to this PR #2673
Yes, this is an immediate problem. We have three experiments held up by files missing from the online archive. We need to check out an earlier version of global-workflow. @CatherineThomas-NOAA and @WalterKolczynski-NOAA, do you have a suggested commit hash for us? It has to be a version from before June 1, 2024, the date the second part of the archive job refactoring was merged into develop.
@malloryprow: There have been two recent archive refactor PRs: PR #2491 and PR #2621. The experiment I referenced with good stats includes the first PR but not the second. We suspect that it is the second PR that caused the problem with the online archive. @DavidHuber-NOAA is working on this.
@emilyhcliu For our 3DVar runs, we used https://github.com/NOAA-EMC/global-workflow/commit/7d2c539f45194cd4e5b21bfd4b83a9480189cd0f. While we had to make some changes for the S2S cases, ATM only should run out of the box. At the least, it does not have the same online archive issue that you are experiencing.
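For anyone who needs to pin to that commit, a minimal sketch follows. It assumes a fresh clone with submodules; the function name and target directory are hypothetical, and nothing runs until the function is called (it needs network access).

```shell
# Sketch only: pin a fresh global-workflow clone to the commit referenced above.
# checkout_known_good and the default directory name are hypothetical choices.
repo_url="https://github.com/NOAA-EMC/global-workflow.git"
good_hash="7d2c539f45194cd4e5b21bfd4b83a9480189cd0f"

checkout_known_good() {
    target_dir="${1:-global-workflow-7d2c539}"
    # Clone, move to the known-good commit, then sync submodules to match it.
    git clone --recursive "${repo_url}" "${target_dir}" &&
    git -C "${target_dir}" checkout "${good_hash}" &&
    git -C "${target_dir}" submodule update --init --recursive
}
```

Usage would be `checkout_known_good my-workflow-dir`. The submodule update after the checkout matters because the submodule revisions recorded at that commit may differ from those at the head of develop.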
What is wrong?
Step 1 verification for METplus is currently disabled by default. It had been disabled due to lack of spack-stack support (https://github.com/NOAA-EMC/global-workflow/issues/2091), but those problems were resolved when spack-stack 1.6.0 went in (https://github.com/NOAA-EMC/global-workflow/pull/2239). The DO_METP variable was set to NO after 1.6.0 was merged (https://github.com/NOAA-EMC/global-workflow/pull/2374), but that PR seems unrelated, so the change may have been a mistake or a bad conflict resolution.
@WalterKolczynski-NOAA @aerorahul Was this intentional? Is there any reason to keep METplus off at the moment?
What should have happened?
METplus should be on by default in config.base.
What machines are impacted?
All or N/A
Steps to reproduce
Additional information
I have run some half-resolution experiments on Hera with DO_METP=YES and successfully created a METplus webpage from the resulting step 1 data (grid2grid only).
Do you have a proposed solution?
If confirmed that the DO_METP setting reversal was unintentional, I can open a PR to change the default back to YES.
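As a rough illustration of the proposed change, the sketch below shows the setting in the style of a shell config file. It assumes DO_METP is a plain exported shell variable; the surrounding contents of config.base are not reproduced, and the guard at the end is only a hypothetical example of how a script might branch on the value.

```shell
# Sketch only: proposed default for the METplus switch.
# Assumes config.base sets this as an exported shell variable.
export DO_METP="YES"   # run METplus step 1 verification jobs

# Hypothetical guard showing how a script could act on the setting.
if [ "${DO_METP}" = "YES" ]; then
    echo "METplus verification enabled"
fi
```

Until the default is changed, setting DO_METP=YES in an experiment's own configuration gives the same effect, which is how the Hera experiments mentioned above were run.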