Closed gdicker1 closed 1 month ago
NOTE: the changes in this PR slightly change answers because of the calculations of `exner` and `exner_base`. If the loop that calculates them (lines 5974 to 5982 in this PR) is instead run on the CPU, with appropriate update directives added, there is no difference in the answers.
I suspect the difference comes from how the CPU and the GPU each evaluate a float raised to a float power.
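The kind of last-bit sensitivity suspected here can be illustrated on a single machine. The following is a minimal sketch, not code from this PR: it evaluates a real-to-real power two mathematically equivalent ways and counts how often the bit patterns differ. The exponent value, the input range, and the program name are assumptions chosen for illustration. Some compilers may evaluate both forms identically, in which case the count is zero.

```fortran
program pow_bits
   use, intrinsic :: iso_fortran_env, only: real64, int64
   implicit none
   ! Illustrative exponent only; assumed here, not read from MPAS
   real(real64), parameter :: rcv = 0.4_real64
   real(real64) :: x, a, b
   integer :: i, ndiff

   ndiff = 0
   do i = 1, 1000
      x = 0.5_real64 + 1.0e-3_real64 * real(i, real64)
      a = x**rcv               ! power as evaluated by the compiler/runtime
      b = exp(rcv * log(x))    ! mathematically equivalent formulation
      ! Compare the raw bit patterns rather than the floating-point values
      if (transfer(a, 0_int64) /= transfer(b, 0_int64)) ndiff = ndiff + 1
   end do

   print '(a,i0,a)', 'bitwise mismatches: ', ndiff, ' of 1000'
end program pow_bits
```

Any mismatches reported are differences in the last bits only, which is consistent with answers that change slightly but are not bitwise identical.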
@gdicker1 I agree that the bitwise differences seem to be due to the `exner` calculation. If I add this block of code directly below the `exner` calculation on the device, I get bitwise identical results before and after merging this PR locally:
```fortran
!$acc update host(rtheta_p, rtheta_base)
do iCell=cellStart,cellEnd
   do k=1,nVertLevels
      exner(k,iCell) = (zz(k,iCell) * (rgas/p0) * (rtheta_p(k,iCell) + rtheta_base(k,iCell)))**rcv
      exner_base(k,iCell) = (zz(k,iCell) * (rgas/p0) * (rtheta_base(k,iCell)))**rcv
   end do
end do
!$acc update device(exner, exner_base)
```
Fixup commit 2f95243 addresses this comment I made.
This is ready to merge!
This PR enables GPU calculation of certain diagnostic fields during the initialization phase by adding OpenACC directives to the `atm_init_coupled_diagnostics` routine.
Timing information for the OpenACC data transfers in this routine is captured in the log file by a new timer: `atm_init_coupled_diagnostics [ACC_data_xfer]`.
Since this routine is called during the initialization phase, before `mpas_atm_dynamics_init`, no modifications were made to the `mpas_atm_dynamics_{init,finalize}` routines to copy in or delete the invariant fields used in this routine.
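As a rough illustration of the pattern described above, the sketch below copies inputs to the device, computes `exner` and `exner_base` in an OpenACC parallel loop, and copies the results back. It is a standalone program under stated assumptions, not the code in this PR: the array sizes, constant values, field values, and program name are invented for the example, and the timer calls are omitted. When compiled without OpenACC support, the directives are treated as comments and the loop runs on the CPU.

```fortran
program exner_acc_sketch
   implicit none
   integer, parameter :: RKIND = selected_real_kind(12)
   ! Small illustrative sizes; assumed for this sketch
   integer, parameter :: nVertLevels = 4, nCells = 8
   ! Illustrative constants; assumed here, not read from MPAS
   real(RKIND), parameter :: rgas = 287.0_RKIND, p0 = 1.0e5_RKIND
   real(RKIND), parameter :: rcv = 0.4_RKIND
   real(RKIND), dimension(nVertLevels, nCells) :: zz, rtheta_p, rtheta_base
   real(RKIND), dimension(nVertLevels, nCells) :: exner, exner_base
   integer :: iCell, k

   ! Placeholder field values
   zz = 1.0_RKIND
   rtheta_base = 300.0_RKIND
   rtheta_p = 1.0_RKIND

   ! Host-to-device transfers; a data-transfer timer would bracket steps like this
   !$acc enter data copyin(zz, rtheta_p, rtheta_base) create(exner, exner_base)

   !$acc parallel loop collapse(2) default(present)
   do iCell = 1, nCells
      do k = 1, nVertLevels
         exner(k,iCell) = (zz(k,iCell) * (rgas/p0) * (rtheta_p(k,iCell) + rtheta_base(k,iCell)))**rcv
         exner_base(k,iCell) = (zz(k,iCell) * (rgas/p0) * rtheta_base(k,iCell))**rcv
      end do
   end do

   ! Device-to-host transfers and cleanup of the device copies
   !$acc exit data copyout(exner, exner_base) delete(zz, rtheta_p, rtheta_base)

   print *, 'exner(1,1) = ', exner(1,1), ' exner_base(1,1) = ', exner_base(1,1)
end program exner_acc_sketch
```

Keeping the transfers in separate `enter data` and `exit data` directives, rather than in clauses on the compute construct, makes it straightforward to time the data movement separately from the computation.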