NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Reenable METplus in config.base #2647

Open CatherineThomas-NOAA opened 1 month ago

CatherineThomas-NOAA commented 1 month ago

What is wrong?

Step 1 verification for METplus is currently disabled by default. It was originally disabled due to lack of spack-stack support (https://github.com/NOAA-EMC/global-workflow/issues/2091), but those problems were resolved when 1.6.0 went in (https://github.com/NOAA-EMC/global-workflow/pull/2239). The DO_METP variable was set to NO after 1.6.0 was merged (https://github.com/NOAA-EMC/global-workflow/pull/2374), but that PR seems unrelated, so this may have been a mistake or a bad conflict resolution.

@WalterKolczynski-NOAA @aerorahul Was this intentional? Is there any reason to keep METplus off at the moment?

What should have happened?

METplus should be on by default in config.base.

What machines are impacted?

All or N/A

Steps to reproduce

  1. Create experiment directory.
  2. Check DO_METP setting in config.base.
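
For step 2, a minimal sketch of reading the flag programmatically rather than by eye. It assumes config.base assigns the flag with a shell-style line such as `export DO_METP="NO"`; the helper name `read_do_metp` and the sample text are hypothetical, not part of global-workflow.

```python
import re
from typing import Optional

# Assumption: the flag is set by a line like
#   export DO_METP="NO"
# The exact formatting in a given checkout may differ.
_DO_METP_RE = re.compile(
    r'^\s*(?:export\s+)?DO_METP=["\']?(\w+)["\']?', re.MULTILINE
)


def read_do_metp(config_text: str) -> Optional[str]:
    """Return the last DO_METP value assigned in the text, or None if absent."""
    matches = _DO_METP_RE.findall(config_text)
    return matches[-1] if matches else None


# Hypothetical excerpt standing in for the contents of config.base.
sample = 'export DO_WAVE="NO"\nexport DO_METP="NO"  # METplus jobs\n'
print(read_do_metp(sample))
```

To check a real experiment, pass the contents of the generated config.base (for example via `pathlib.Path(...).read_text()`) instead of the sample string. Commented-out assignments are ignored because the pattern only matches lines that start with the assignment.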

Additional information

I have run some half-resolution experiments on Hera with DO_METP=YES and successfully created a METplus webpage using the resulting step 1 data (grid2grid only).

Do you have a proposed solution?

If confirmed that the DO_METP setting reversal was unintentional, I can open a PR to change the default back to YES.

WalterKolczynski-NOAA commented 3 weeks ago

@CatherineThomas-NOAA Turning it off as a default was intentional, but I am in the process of trying to re-enable it for one of the CI tests so it gets exercised.

WalterKolczynski-NOAA commented 3 weeks ago

Actually, the metplus scripts don't seem to be working correctly (they complete, but with silent failures over missing files). I'm not sure if it is just because of #2673 (though I'm pretty sure that is the immediate issue), or if there is some other issue.

CC: @malloryprow

malloryprow commented 2 weeks ago

Hmmm, yeah if the archive job isn't archiving the files the verification will have nothing to run with.
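
One way to turn this kind of silent failure into a loud one is to check for the expected inputs before verification starts. The following is only a sketch of that idea, not how the metplus scripts actually behave; `find_missing` and `require_inputs` are hypothetical helpers, and the caller would have to supply the real list of archived files.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


def require_inputs(paths: Iterable[str]) -> None:
    """Raise instead of continuing when any expected input file is absent."""
    missing = find_missing(paths)
    if missing:
        raise FileNotFoundError(
            "missing verification inputs: " + ", ".join(missing)
        )
```

Calling `require_inputs` at the top of a verification job would make the job fail with the list of missing files instead of completing with no stats.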

CatherineThomas-NOAA commented 2 weeks ago

FWIW the metplus jobs were working well and producing good stats at 7d2c539f45194cd4e5b21bfd4b83a9480189cd0f from 3 weeks ago. See this page produced with those step 1 stats.

malloryprow commented 2 weeks ago

Oh that is good news! Was this before the archive rework?

@WalterKolczynski-NOAA I can look more into what is going on but I just need log files.

emilyhcliu commented 2 weeks ago

@WalterKolczynski-NOAA I think the issue you are experiencing is related to this PR #2673

Yes, this is an immediate problem. We have three experiments held up by files missing from the online archive, so we need to check out an earlier version of global-workflow. @CatherineThomas-NOAA and @WalterKolczynski-NOAA, can you suggest a commit hash for us? It has to be from before June 1, 2024, the date the second part of the archive job refactoring was merged into develop.

CatherineThomas-NOAA commented 2 weeks ago

@malloryprow: There have been two recent archive refactor PRs: PR #2491 and PR #2621. The experiment I referenced with good stats includes the first PR but not the second. We suspect that it is the second PR that caused the problem with the online archive. @DavidHuber-NOAA is working on this.

WalterKolczynski-NOAA commented 2 weeks ago

Good to know. Hopefully once the archive issue is resolved, metp works again (sounds like it should).

CatherineThomas-NOAA commented 2 weeks ago

@emilyhcliu For our 3DVar runs, we used https://github.com/NOAA-EMC/global-workflow/commit/7d2c539f45194cd4e5b21bfd4b83a9480189cd0f. While we had to make some changes for the S2S cases, ATM only should run out of the box. At the least, it does not have the same online archive issue that you are experiencing.
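
For anyone else pinning to that commit, a rough sketch of the checkout sequence follows. It assumes the repository's external components are tracked as git submodules at this hash, which should be confirmed before relying on it; `checkout_commands` and `run_checkout` are hypothetical helper names, and building and linking the workflow afterwards is not covered.

```python
import subprocess
from typing import List

REPO_URL = "https://github.com/NOAA-EMC/global-workflow.git"
COMMIT = "7d2c539f45194cd4e5b21bfd4b83a9480189cd0f"  # hash quoted in this thread


def checkout_commands(
    dest: str, repo_url: str = REPO_URL, commit: str = COMMIT
) -> List[List[str]]:
    """Build the git commands that clone the repo and pin it to one commit."""
    return [
        ["git", "clone", repo_url, dest],
        ["git", "-C", dest, "checkout", commit],
        # Assumption: components are submodules at this commit.
        ["git", "-C", dest, "submodule", "update", "--init", "--recursive"],
    ]


def run_checkout(dest: str) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in checkout_commands(dest):
        subprocess.run(cmd, check=True)
```

`run_checkout("global-workflow")` would clone into a new directory of that name; `checkout_commands` on its own just returns the commands so they can be reviewed or pasted into a shell.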

emilyhcliu commented 2 weeks ago

Thanks, @CatherineThomas-NOAA. We will rebuild with 7d2c539.