ESCOMP / PUMAS

Parameterization for Unified Microphysics Across Scales
Fix the broken GPU code after merging the ML scheme #56

Closed sjsprecious closed 11 months ago

sjsprecious commented 1 year ago

This PR fixes the GPU code that broke when the ML code was introduced to PUMAS (https://github.com/ESCOMP/PUMAS/pull/53).

It also passes 2-D arrays rather than 1-D arrays as arguments to some subroutines for better performance.

I verified that this PR produces bit-for-bit (BFB) identical results for all the warm rain schemes (kk2000, sb2001, tau, and emulated) on Cheyenne's CPU using the Intel compiler, compared to Cheryl's branch (https://github.com/ESCOMP/PUMAS/tree/pumas_machlearn, commit 08d4129).
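For readers unfamiliar with the term, a BFB check requires that every value matches in its exact bit pattern, not just within a tolerance. The following is a minimal illustrative sketch in Python, not the tooling used for this PR; the function names `float_bits` and `bfb_mismatches` are hypothetical, and the inputs are assumed to be plain sequences of double-precision values already read from the two runs' output.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bfb_mismatches(baseline, candidate):
    """Return the indices where the two sequences differ in any bit.

    Hypothetical helper: an empty list means the fields are bit-for-bit
    identical. Comparing bit patterns (rather than using ==) also flags
    0.0 versus -0.0, which compare equal numerically.
    """
    if len(baseline) != len(candidate):
        raise ValueError("fields have different lengths")
    return [
        i
        for i, (a, b) in enumerate(zip(baseline, candidate))
        if float_bits(a) != float_bits(b)
    ]


if __name__ == "__main__":
    base = [0.1 + 0.2, 1.0, 0.0]
    same = [0.1 + 0.2, 1.0, 0.0]
    diff = [0.3, 1.0, -0.0]  # rounding difference at 0, sign of zero at 2
    print(bfb_mismatches(base, same))  # prints []
    print(bfb_mismatches(base, diff))  # prints [0, 2]
```

A tolerance-based comparison would accept the `diff` case above, which is why a BFB check is a stricter statement than "answers agree to roundoff".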

As discussed with Kate, we will focus only on verifying the GPU code for the kk2000 and sb2001 schemes for now. However, the ensemble consistency test for these two schemes failed as soon as I switched from the Intel compiler to the nvhpc compiler. The test failed for both CPU and GPU runs (using the nvhpc compiler), suggesting that there could be a compiler bug or a code bug unrelated to the OpenACC directives.

On 09/11/2023, I added four new diagnostic variables (lamc_out, lamr_out, pgam_out, n0r_out) from Andrew that are required for generating a new ML training dataset. These are diagnostics only and do not change answers.

sjsprecious commented 1 year ago

I can confirm that the ensemble consistency test passed for the PUMAS GPU code with the nvhpc/22.2 compiler on Casper when I used the cam6_3_097 tag with pumas_cam-release_v1.29. Thus the failure of the ensemble consistency test here might not be caused by the PUMAS code itself.

sjsprecious commented 1 year ago

@cacraigucar @Katetc More information: I checked out the cam6_3_115 tag with pumas_v1.29 and confirmed that the ensemble consistency test (ECT) passed for the GPU run with the nvhpc/22.2 compiler on Casper. However, the ECT for the GPU run on Casper failed when I switched to the cam6_3_116 tag with pumas_v1.29.