JCSDA / spack-stack

CRTM-fix files do not match between Orion and Hercules #1165

Open DavidHuber-NOAA opened 2 days ago

DavidHuber-NOAA commented 2 days ago

Describe the bug

A total of 445 fix files differ between Orion and Hercules under spack-stack v1.6.0. I have not checked other machines to determine which set is correct.

To Reproduce

> for file in /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-qls55kd/fix/*; do
>   f=$(basename "$file")
>   cmp "$file" "/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/unified-env/install/intel/2021.9.0/crtm-fix-2.4.0.1_emc-2os2hw2/fix/$f"
> done
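
To get a count rather than one `cmp` message per file, the loop above could be wrapped roughly as follows. This is only a sketch: `compare_fix_dirs` is a hypothetical helper name (not part of spack-stack), and it assumes both fix directories are visible from the same node.

```shell
# compare_fix_dirs is a hypothetical helper, not part of spack-stack.
# Usage: compare_fix_dirs DIR_A DIR_B
# Prints "DIFF name" for files whose bytes differ, "MISSING name" for files
# present in DIR_A but absent from DIR_B, then a one-line summary.
compare_fix_dirs() {
  local dir_a=$1 dir_b=$2
  local file name differ=0 missing=0
  for file in "$dir_a"/*; do
    [ -f "$file" ] || continue            # skip subdirectories and unmatched globs
    name=$(basename "$file")
    if [ ! -e "$dir_b/$name" ]; then
      echo "MISSING $name"
      missing=$((missing + 1))
    elif ! cmp -s "$file" "$dir_b/$name"; then   # -s: silent, exit status only
      echo "DIFF $name"
      differ=$((differ + 1))
    fi
  done
  echo "differ=$differ missing=$missing"
}
```

Calling it as `compare_fix_dirs <orion fix dir> <hercules fix dir>` with the two paths from the loop above would end with a summary line giving the number of differing and missing files.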

Expected behavior

The CRTM fix files should match on all systems, with those hosted on WCOSS2 under /apps/ops/prod/libs/intel/19.1.3.304/crtm/2.4.0.1/fix serving as the standard.
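
Since WCOSS2 does not share a filesystem with the other machines, one possible way to check against that standard is to generate a checksum manifest on the reference system and verify it elsewhere. The sketch below assumes GNU coreutils `sha256sum` is available on each machine; `make_manifest` and `check_manifest` are hypothetical helper names.

```shell
# Hypothetical helpers; assumes GNU coreutils sha256sum is installed.

# make_manifest DIR
# Print "checksum  filename" lines for every regular file directly in DIR.
make_manifest() {
  (
    cd "$1" || exit 1
    for f in *; do
      if [ -f "$f" ]; then
        sha256sum "$f"
      fi
    done
  )
}

# check_manifest DIR MANIFEST
# Verify the files in DIR against MANIFEST (give MANIFEST as an absolute path).
# Returns non-zero if any listed file is missing or has a different checksum.
check_manifest() {
  ( cd "$1" && sha256sum --quiet -c "$2" )
}
```

The manifest would be written once on the reference machine, for example `make_manifest <reference fix dir> > crtm-fix.sha256`, copied over, and then checked with `check_manifest <local fix dir> /absolute/path/to/crtm-fix.sha256`. Files that exist only in the local directory are not reported by this check.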

System: Orion and Hercules, possibly others

Additional context

Found while testing the GSI: https://github.com/NOAA-EMC/GSI/issues/754.

RussTreadon-NOAA commented 2 days ago

Perform the following test

Summary

  1. The common files in WCOSS2, Hera, and Orion are identical. Hera and Orion have 156 coefficient files not found on WCOSS2. WCOSS2 has one coefficient file not found on Hera or Orion.
  2. The Hera and Orion CRTM_FIX are identical with the exception of an extra file in the Hera fix.
  3. Hercules CRTM_FIX contains 445 files which differ from WCOSS2, Hera, and Orion. Hercules CRTM_FIX is the outlier.
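
The "files not found on ..." counts in the summary can be obtained by comparing sorted file listings. The following is a minimal sketch of that idea, not necessarily the test that was actually run; `only_in` is a hypothetical helper name.

```shell
# only_in DIR_A DIR_B  (hypothetical helper)
# List file names that exist in DIR_A but not in DIR_B.
only_in() {
  local dir_a=$1 dir_b=$2
  local list_a list_b
  list_a=$(mktemp)
  list_b=$(mktemp)
  # comm requires both inputs sorted in the same collation order.
  ls -1 "$dir_a" | LC_ALL=C sort > "$list_a"
  ls -1 "$dir_b" | LC_ALL=C sort > "$list_b"
  # -2 drops lines unique to DIR_B, -3 drops lines common to both.
  LC_ALL=C comm -23 "$list_a" "$list_b"
  rm -f "$list_a" "$list_b"
}
```

Piping the result through `wc -l`, as in `only_in <dir a> <dir b> | wc -l`, gives the number of extra files in the first directory.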

AlexanderRichert-NOAA commented 2 days ago

@Hang-Lei-NOAA can you provide any insight into the WCOSS2 installation process for crtm 2.4.0.1?

Hang-Lei-NOAA commented 2 days ago

Thanks for Russ's comparison of these fix files. First of all, managing fix files is a very difficult task for the overall code manager, Ben Johnson, since many agencies use CRTM with their own specific fix files. We have to know and use the appropriate fix files for EMC.

The previous crtm/2.4.0 fix files were prepared by Russ, then added to hpc-stack by me, and installed by Kyle and me on all NOAA machines. We used the same code, so it was trouble-free.

But for crtm/2.4.0.1, we had several changes to the fix files. The final changes were led by Andrew Collard. Following Andrew's testing, Ben made several releases for EMC. It was finally settled on the WCOSS2 versions.

So, multiple releases/changes could be the problem. EPIC installers used spack-stack (which changes rapidly during development). If they used a different version, or the installer did not update their spack-stack, differences will occur. The other cause I can think of for fix files is an installer adding a new version in the same location as an existing version, in which case some files may be different or extra. What I did on WCOSS2 was to completely remove old installations and use only Andrew's final tarball of fix files.

Andrew and I emailed EPIC last year to push EPIC to install the new version. Besides many emails to Jong and Natalie, the info is also included in the ticket https://github.com/JCSDA/spack-stack/issues/901#issuecomment-1850501203

Most important for us is to make sure that the fix files EMC requires are there. Then we can consider unifying the installation steps.

climbfuji commented 2 days ago

I think the most important thing (not only for us, but for most users) is that the spack recipe for crtm-fix delivers the correct set of files.