earth-system-radiation / rte-rrtmgp

RTE+RRTMGP is a set of codes for computing radiative fluxes in planetary atmospheres.
BSD 3-Clause "New" or "Revised" License

Unable to run RTE-RRTMGP GPU code on NVIDIA A100 GPU #270

Closed: sjsprecious closed this issue 5 months ago

sjsprecious commented 5 months ago

I was unable to run the RTE-RRTMGP GPU code on an NVIDIA A100 GPU with either the nvhpc/24.1 or the nvhpc/24.3 compiler.

The output looks like this:

./check_equivalence test_atmospheres.nc /glade/derecho/scratch/sunjian/rte-rrtmgp/rrtmgp-data/rrtmgp-gas-sw-g224.nc
 gas optics is for the shortwave
   pressure    limits (Pa):    1.005183574463000         109663.3158428461    
   temperature limits (K):    160.0000000000000         355.0000000000000    
   Intialized atmosphere twice
   Default calculation
   Vertical orientation invariance
   Changing TSI fails
   TSI invariance
   halving/doubling fails
   Incrementing with 1scl fails
   Incrementing with 2str fails
   Incrementing with nstr fails
   Incrementing
Warning: ieee_invalid is signaling
Warning: ieee_inexact is signaling
ERROR STOP 1

I am using the compiler flags suggested in the CI workflow at https://github.com/earth-system-radiation/rte-rrtmgp/blob/main/.github/workflows/containerized-ci.yml#L40-L42.

sjsprecious commented 5 months ago

In addition, I thought GitHub did not provide GPU resources, so I wonder whether the CI workflow actually tests the GPU code on a GPU (OpenACC code can run on either a CPU or a GPU). Correct me if I am wrong.
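One way a test program could answer this question itself is to ask the OpenACC runtime how many non-host devices it can see. The sketch below is illustrative only and is not how the RTE-RRTMGP CI is configured: `openacc_accelerator_count` and `report_openacc_target` are hypothetical helpers. It relies only on the standard OpenACC routine `acc_get_num_devices` and the `_OPENACC` macro that OpenACC-enabled compilers define, so the same file also compiles without OpenACC.

```c
#include <stdio.h>

#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper (not part of RTE-RRTMGP).
   Returns the number of non-host accelerator devices visible to the
   OpenACC runtime, or -1 if this file was compiled without OpenACC. */
int openacc_accelerator_count(void) {
#ifdef _OPENACC
    /* acc_device_not_host matches every device type except the host CPU. */
    return acc_get_num_devices(acc_device_not_host);
#else
    /* Without OpenACC the directives are ignored and everything runs on the CPU. */
    return -1;
#endif
}

/* Hypothetical helper: prints a one-line summary, e.g. at the start of a CI test. */
void report_openacc_target(void) {
    int n = openacc_accelerator_count();
    if (n < 0)
        printf("OpenACC disabled at compile time; code runs on the host CPU\n");
    else if (n == 0)
        printf("OpenACC enabled, but no accelerator device is visible\n");
    else
        printf("OpenACC enabled, %d accelerator device(s) visible\n", n);
}
```

If a CI job called `report_openacc_target()` and it reported zero devices, the OpenACC kernels were not exercised on a GPU in that run. Whether a build with no visible device falls back to the host or aborts depends on how the code was compiled, so the compiler's OpenACC target options are worth checking alongside this output.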

RobertPincus commented 5 months ago

Closed with 1b75505, thanks @sjsprecious