E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

Updates the libraries for GNU-GPU on Frontier #6352

Closed: bishtgautam closed this 3 weeks ago

bishtgautam commented 4 weeks ago

[BFB]

github-actions[bot] commented 4 weeks ago

PR Preview Action v1.4.7: Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6352/ on branch gh-pages at 2024-04-17 20:35 UTC

rljacob commented 4 weeks ago

Is this BFB?

bishtgautam commented 4 weeks ago

It should be BFB because the compiler hasn't been updated.

grnydawn commented 4 weeks ago

I will try a couple of cases to see if they show BFB results and then try to merge it. If you have any specific compset and resolution for the BFB test, please let me know.
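For readers unfamiliar with the term, BFB (bit-for-bit) means the new results match the reference results exactly, down to the last bit, rather than merely agreeing within a tolerance. The following is a minimal sketch of that idea only. It is not the tooling the E3SM test suite uses to compare model output, and the helper names `bits` and `is_bfb` are hypothetical.

```python
import struct


def bits(values):
    """Pack floats into their raw IEEE 754 double-precision bytes."""
    return struct.pack(f"<{len(values)}d", *values)


def is_bfb(baseline, candidate):
    """Return True only if both sequences are identical bit for bit."""
    return len(baseline) == len(candidate) and bits(baseline) == bits(candidate)


# Identical values compare as bit-for-bit.
print(is_bfb([1.0, 2.5], [1.0, 2.5]))

# Changing the order of floating-point additions changes the last bit,
# so the results are close but no longer bit-for-bit.
print(is_bfb([(0.1 + 0.2) + 0.3], [0.1 + (0.2 + 0.3)]))

# 0.0 == -0.0 numerically, yet the sign bit differs, so this is not BFB.
print(is_bfb([0.0], [-0.0]))
```

The second and third cases illustrate why a library or compiler change can break BFB even when the answers remain scientifically equivalent.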

bishtgautam commented 4 weeks ago

I don't have any compset/resolution recommendations. I didn't know we had any baselines on Frontier for --compiler gnugpu. Maybe SCREAM has tests that use gnugpu, but I'm not sure whether those tests are run from the E3SM repo or the SCREAM repo.

grnydawn commented 3 weeks ago

@bishtgautam, you're right that there is no baseline on Frontier. Sorry for the confusion. I ran the e3sm_developer test suite with this PR. Apart from input data download failures (caused by my incorrect wget settings), I encountered three build errors similar to the ones below. However, I believe these should be handled separately from this PR, and I think I can merge it into next and master.

/autofs/nccs-svm1_home1/grnydawn/repos/github/E3SM/components/homme/src/share/compose/compose_slmm_islmpi.hpp:548:15: error: unknown type name 'omp_lock_t'; did you mean '_IO_lock_t'?
  ListOfLists<omp_lock_t, HDT> ri_lidi_locks;
              ^~~~~~~~~~
              _IO_lock_t

/lustre/orion/cli115/proj-shared/grnydawn/e3sm_scratch/ERP_Ld3.ne4pg2_oQU480.F2010.frontier_gnugpu.20240418_145407_gjzl94/bld/gnugpu/mpich/nodebug/threads/mct/include/impl/Kokkos_ViewMapping.hpp:2695:50: error: cannot form a reference to 'void'
  using return_type = typename Traits::value_type&;

/lustre/orion/cli115/proj-shared/grnydawn/e3sm_scratch/ERP_Ld3.ne4pg2_oQU480.F2010.frontier_gnugpu.20240418_145407_gjzl94/bld/gnugpu/mpich/nodebug/threads/mct/include/impl/Kokkos_ViewMapping.hpp:3358:27: error: invalid application of 'sizeof' to an incomplete type 'typename ViewTraits<void *, Device<Serial, HostSpace>>::value_type' (aka 'void')
  enum { MemorySpanSize = sizeof(typename Traits::value_type) };
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~