NOAA-EMC / fv3atm

Other
32 stars 157 forks source link

sync with head of NOAA-EMC UPP develop #845

Closed SamuelTrahanNOAA closed 2 months ago

SamuelTrahanNOAA commented 5 months ago

Description

This updates the upp submodule to the head of NOAA-EMC UPP's develop branch.

Also, the spack.yaml has explicit UPP dependencies (g2, g2tmpl, etc.) and all libraries have exact version numbers. These version numbers match the top of ufs-weather-model's develop branch for hercules.

Issue(s) addressed

Testing

How were these changes tested?

Tested in the HAFS-AR variant of the HAFS workflow in a large stationary regional domain over the north-east Pacific. Values looked reasonable. Also ran UPP and ufs-weather-model regression tests.

What compilers / HPCs was it tested with?

Hera intel and Hera gnu.

Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)

The ufs-weather-model regression tests already test the inline post. However, I added three new number concentration fields to the ufs-weather-model gnv1_nested regression test.

Have the ufs-weather-model regression test been run? On what platform?

The ufs-weather-model regression testing is in progress now. Baseline generation on Hera succeeded.

UPP's own regression tests passed. That tests the offline post (running UPP outside FV3).

Will the code updates change regression test baseline? If yes, why? Please show the baseline directory below.

Results of many tests should change due to updating the UPP hash.

Please commit the regression test log files in your ufs-weather-model branch

Will do.

Dependencies

None.

WenMeng-NOAA commented 5 months ago

@SamuelTrahanNOAA When you submit ufs-weather-model PR with upp submodule updating to the latest commit from UPP repository, all UPP control files post-xconfig*.txt (under ufs-weather-model/test/parm) need to be recreated for UFS RTs.

SamuelTrahanNOAA commented 5 months ago

When you submit ufs-weather-model PR with upp submodule updating to the latest commit from UPP repository, all UPP control files post-xconfig*.txt (under ufs-weather-model/test/parm) need to be recreated for UFS RTs.

I need your help with that. I'm uncertain of which txt file in tests/parm corresponds to which file in FV3/upp/parm. Also, I'm wondering if any need modification before copying to tests/parm.

SamuelTrahanNOAA commented 5 months ago

These three files from tests/parm don't exist in FV3/upp/parm:

This does exist, but it's the wrong one:

The tests/parm version has no synthetic satellite brightness temperature fields, while the FV3/upp/parm one does. It isn't the postxconfig-NT-hafs_nosat.txt either, since that one differs in other ways.

WenMeng-NOAA commented 5 months ago

These three files from tests/parm don't exist in FV3/upp/parm:

  • postxconfig-NT-fv3lam.txt
  • postxconfig-NT-gfs.txt
  • postxconfig-NT-gfs_FH00.txt

This does exist, but it's the wrong one:

  • postxconfig-NT-hafs.txt

The tests/parm version has no synthetic satellite brightness temperature fields, while the FV3/upp/parm one does. It isn't the postxconfig-NT-hafs_nosat.txt either, since that one differs in other ways.

These are simple versions of UPP control files for inline post testing for gfs, rffs and hafs. I will find corresponding xml files and provide you updated txt files later.

SamuelTrahanNOAA commented 5 months ago

@WenMeng-NOAA - As mentioned in this comment, I need help debugging the inline post.

234:  get_g2_fixedsurfacetypes key:          255  not found in table 4.5

You can find the output here:

LOG: /scratch1/NCEPDEV/stmp2/Samuel.Trahan/FV3_RT/rt_3037467/gnv1_nested_intel.log RUN: /scratch1/NCEPDEV/stmp2/Samuel.Trahan/FV3_RT/rt_3037467/gnv1_nested_intel

Many ranks give the same error many times, but it's always the same error.

Code is here:

https://github.com/ufs-community/ufs-weather-model/pull/2326

It only works on Hera because only Hera has g2tmpl 1.12.0 in the same Spack Stack as ufs-weather-model prerequisites.

SamuelTrahanNOAA commented 5 months ago

There are a multitude of warnings coming from a library when the latest UPP is used.

148: get_g2_fixedsurfacetypes key: 255 not found in table 4.5

@WenMeng-NOAA has tracked this down to the UPP pull request https://github.com/NOAA-EMC/UPP/pull/929. The warnings vanish if you revert the changes to the postxconfig files (or rather, to the post_avblflds.xml used to generate them). The warnings don't seem to have an effect on the output.

This warning probably originates from a bug in one of the grib2 libraries (g2 or g2tmpl) or an incompatibility between them. Library updates can take a long time.

Question to everyone: Should we wait for the warnings to go away before updating UPP in fv3atm? Or should we live with the warnings for now?

EDIT: We have a bug fix from the UPP side that eliminates these error messages. It is in this PR.

SamuelTrahanNOAA commented 5 months ago

Good news, everyone!

We have a fix for the "255 not found in table 4.5" messages!

That means this PR can continue review without thousands of error messages from g2tmpl.

SamuelTrahanNOAA commented 5 months ago

I'd like @DusanJovic-NOAA @WenMeng-NOAA and @EricJames-NOAA to look at this PR.

zhanglikate commented 3 months ago

I would suggest the following way to include the 3D rhomid

SamuelTrahanNOAA commented 3 months ago

@zhanglikate - Your message was garbled, so I don't know what you want.

Please do one of these:

  1. Use the github "suggested change" feature, or
  2. Open a PR to this branch with your changes.
zhanglikate commented 3 months ago

Hi Sam,

   Where is the "suggested change” feature?  I added comments in the place as “suggested change”. Thanks.

On Jul 23, 2024, at 5:33 PM, Samuel Trahan (NOAA contractor) @.***> wrote:

@zhanglikate https://github.com/zhanglikate - Your message was garbled, so I don't know what you want.

Please do one of these:

Use the github "suggested change" feature, or Open a PR to this branch with your changes. — Reply to this email directly, view it on GitHub https://github.com/NOAA-EMC/fv3atm/pull/845#issuecomment-2246502611, or unsubscribe https://github.com/notifications/unsubscribe-auth/APJPDRBOUWHXP7NA67PYIZDZN3R23AVCNFSM6AAAAABJG45DOSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENBWGUYDENRRGE. You are receiving this because you were mentioned.

WenMeng-NOAA commented 3 months ago

@SamuelTrahanNOAA Could you update the upp harsh to the latest commit @81b38a8 from the UPP repository?

SamuelTrahanNOAA commented 3 months ago

Yes. It matches that hash now.

SamuelTrahanNOAA commented 3 months ago

Could I get some reviews by people "in the know?" Maybe these individuals?

WenMeng-NOAA commented 3 months ago

The upp submodule update and inline post interface "post_fv3.F90" look good to me.

DusanJovic-NOAA commented 3 months ago

Github workflow failed:

2024-08-14T18:45:15.8799819Z /home/runner/work/fv3atm/fv3atm/fv3atm/upp/sorc/ncep_post.fd/grib2_module.f:499:53:
2024-08-14T18:45:15.8800893Z 
2024-08-14T18:45:15.8801377Z   499 |                            g2sec4_temp0,g2sec4_temp8,g2sec4_temp9,g2sec4_temp44,         &
2024-08-14T18:45:15.8802380Z       |                                                     1
2024-08-14T18:45:15.8806022Z Error: Symbol ‘g2sec4_temp9’ referenced at (1) not found in module ‘grib2_all_tables_module’
2024-08-14T18:45:15.8807542Z /home/runner/work/fv3atm/fv3atm/fv3atm/upp/sorc/ncep_post.fd/grib2_module.f:499:80:
2024-08-14T18:45:15.8808321Z 
2024-08-14T18:45:15.8808820Z   499 |                            g2sec4_temp0,g2sec4_temp8,g2sec4_temp9,g2sec4_temp44,         &
2024-08-14T18:45:15.8809907Z       |                                                                                1
2024-08-14T18:45:15.8811126Z Error: Symbol ‘g2sec4_temp46’ referenced at (1) not found in module ‘grib2_all_tables_module’
2024-08-14T18:45:15.8812522Z /home/runner/work/fv3atm/fv3atm/fv3atm/upp/sorc/ncep_post.fd/grib2_module.f:501:79:
2024-08-14T18:45:15.8813317Z 
2024-08-14T18:45:15.8813853Z   501 |                            g2sec5_temp3,g2sec5_temp40,get_g2_sec5packingmethod,          &
2024-08-14T18:45:15.8814838Z       |                                                                               1
2024-08-14T18:45:15.8816070Z Error: Symbol ‘g2sec4_temp49’ referenced at (1) not found in module ‘grib2_all_tables_module’
2024-08-14T18:45:15.8986194Z make[2]: *** [upp/sorc/ncep_post.fd/CMakeFiles/upp.dir/build.make:868: upp/sorc/ncep_post.fd/CMakeFiles/upp.dir/grib2_module.f.o] Error 1
2024-08-14T18:45:15.8989127Z make[1]: *** [CMakeFiles/Makefile2:1278: upp/sorc/ncep_post.fd/CMakeFiles/upp.dir/all] Error 2
2024-08-14T18:45:15.8990581Z make[1]: *** Waiting for unfinished jobs....
SamuelTrahanNOAA commented 3 months ago

That looks a lot like an out-of-date library. Maybe @RatkoVasic-NOAA can comment?

SamuelTrahanNOAA commented 3 months ago

The github workflow gets its UPP dependencies for spack from this line in ci/spack.yaml:

- upp@develop

The upp@develop refers to whatever is in spack develop.

In other words, FV3 is getting its UPP dependencies from the spack repository instead of the UPP repository.

WenMeng-NOAA commented 3 months ago

Github workflow failed:

2024-08-14T18:45:15.8799819Z /home/runner/work/fv3atm/fv3atm/fv3atm/upp/sorc/ncep_post.fd/grib2_module.f:499:53:
2024-08-14T18:45:15.8800893Z 
2024-08-14T18:45:15.8801377Z   499 |                            g2sec4_temp0,g2sec4_temp8,g2sec4_temp9,g2sec4_temp44,         &
2024-08-14T18:45:15.8802380Z       |                                                     1
2024-08-14T18:45:15.8806022Z Error: Symbol ‘g2sec4_temp9’ referenced at (1) not found in module ‘grib2_all_tables_module’
2024-08-14T18:45:15.8807542Z /home/runner/work/fv3atm/fv3atm/fv3atm/upp/sorc/ncep_post.fd/grib2_module.f:499:80:
2024-08-14T18:45:15.8808321Z 
2024-08-14T18:45:15.8808820Z   499 |                            g2sec4_temp0,g2sec4_temp8,g2sec4_temp9,g2sec4_temp44,         &
2024-08-14T18:45:15.8809907Z       |                                                                                1
2024-08-14T18:45:15.8811126Z Error: Symbol ‘g2sec4_temp46’ referenced at (1) not found in module ‘grib2_all_tables_module’
2024-08-14T18:45:15.8812522Z /home/runner/work/fv3atm/fv3atm/fv3atm/upp/sorc/ncep_post.fd/grib2_module.f:501:79:
2024-08-14T18:45:15.8813317Z 
2024-08-14T18:45:15.8813853Z   501 |                            g2sec5_temp3,g2sec5_temp40,get_g2_sec5packingmethod,          &
2024-08-14T18:45:15.8814838Z       |                                                                               1
2024-08-14T18:45:15.8816070Z Error: Symbol ‘g2sec4_temp49’ referenced at (1) not found in module ‘grib2_all_tables_module’
2024-08-14T18:45:15.8986194Z make[2]: *** [upp/sorc/ncep_post.fd/CMakeFiles/upp.dir/build.make:868: upp/sorc/ncep_post.fd/CMakeFiles/upp.dir/grib2_module.f.o] Error 1
2024-08-14T18:45:15.8989127Z make[1]: *** [CMakeFiles/Makefile2:1278: upp/sorc/ncep_post.fd/CMakeFiles/upp.dir/all] Error 2
2024-08-14T18:45:15.8990581Z make[1]: *** Waiting for unfinished jobs....

@DusanJovic-NOAA Please check if g2/3.51 and g2tmpl/1.13.0 are used in CI.

SamuelTrahanNOAA commented 3 months ago

It's using a cached version of spack, not the latest build.

There should be an update to .github/workflows/GCC.yml to increment the cache version.

DusanJovic-NOAA commented 3 months ago

It's using a cached version of spack, not the latest build.

There should be an update to .github/workflows/GCC.yml to increment the cache version.

Actually I think we should ci/spack.yaml, because the cache key is defined based on hash of spack.yaml. So in order to detect potential changes in any of the libraries we should explicitly list every single library and its version , and not use 'upp@develop' which basically 'hides' the changes in upp dependencies. upp@develop can mean different things at different time. Basically all dependencies must be versioned. And instead of just saying upp, I prefer to have a explicit list of libraries with explicit versions.

Try to change spack.yaml in your branch and see if it triggers spack cache rebuild.

SamuelTrahanNOAA commented 3 months ago

I've updated spack.yaml, and it is rebuilding. There were library version conflicts, so I chose the latest version of the two.

What worries me is this one:

ip@develop precision=4,d,8

Using the @develop version means the version may change as the branch is retested. Wouldn't we be better off using a specific numbered version? If so, which one?

DusanJovic-NOAA commented 3 months ago

I've updated spack.yaml, and it is rebuilding. There were library version conflicts, so I chose the latest version of the two.

What worries me is this one:

ip@develop precision=4,d,8

Using the @develop version means the version may change as the branch is retested. Wouldn't we be better off using a specific numbered version? If so, which one?

Yes. All libraries should be versioned, develop is ambiguous. I would use versions from your ufs-weather-model branch ufs_common.lua from modulefiles directory, with updated versions of g2 and g2tmpl. Also, fv3atm does not need sigio, sfcio, nemsio and wrf-io. Those are required by the upp executable, which we are not building here.

SamuelTrahanNOAA commented 3 months ago

Yes. All libraries should be versioned, develop is ambiguous. I would use versions from your ufs-weather-model branch ufs_common.lua from modulefiles directory, with updated versions of g2 and g2tmpl. Also, fv3atm does not need sigio, sfcio, nemsio and wrf-io. Those are required by the upp executable, which we are not building here.

@DusanJovic-NOAA - Could you check the spack.yaml? It should match your specifications now.

DusanJovic-NOAA commented 3 months ago

Yes. All libraries should be versioned, develop is ambiguous. I would use versions from your ufs-weather-model branch ufs_common.lua from modulefiles directory, with updated versions of g2 and g2tmpl. Also, fv3atm does not need sigio, sfcio, nemsio and wrf-io. Those are required by the upp executable, which we are not building here.

@DusanJovic-NOAA - Could you check the spack.yaml? It should match your specifications now.

The spack.yaml looks good. And I see workflow finished successfully. Thank you for fixing this.

SamuelTrahanNOAA commented 3 months ago

I made an issue for the spack.yaml troubles:

It'll be automatically closed when this PR is merged.

zach1221 commented 2 months ago

@jkbk2004 testing is finished on WM PR 2326. Feel free to merge this PR.