earth-system-radiation / rte-rrtmgp

RTE+RRTMGP is a set of codes for computing radiative fluxes in planetary atmospheres.
BSD 3-Clause "New" or "Revised" License
74 stars 67 forks source link

Bug in rrtmgp/kernels/mo_gas_optics_kernels.F90 #214

Closed m214089 closed 9 months ago

m214089 commented 1 year ago

Hi, we found a bug in line 137/138 of rrtmgp/kernels/mo_gas_optics_kernels.F90. The code is:

            col_mix(itemp,icol,ilay,iflav) = col_gas(icol,ilay,igases(1)) + ratio_eta_half * col_gas(icol,ilay,igases(2))
            eta = merge(col_gas(icol,ilay,igases(1)) / col_mix(itemp,icol,ilay,iflav), 0.5_wp, &
                        col_mix(itemp,icol,ilay,iflav) > 2._wp * tiny(col_mix))

following the Fortran standard is merge following the same rules as conditional expressions like if and a compiler is allowed to evaluate the potential results in parallel to the evaluation of the logical expression. This leads to a potential divide by 0 in the second line given above. To solve this the two lines need to be reformulated so that this cannot happen anymore. We could do this, but first want to know, if you have a better solution in mind how to fix this. Cheerio, Luis

m214089 commented 1 year ago

Maybe this would work instead of the merge construct:

if (col_mix(itemp,icol,ilay,iflav) > 2._wp * tiny(col_mix)) then eta = col_gas(icol,ilay,igases(1)) / col_mix(itemp,icol,ilay,iflav) else eta = 0.5_wp endif

RobertPincus commented 1 year ago

@m214089 Please feel free to open a pull request against develop. I'm curious to know the circumstances in which this has triggered an actual error, and if these circumstances are or are not covered in our test cases.

tlunttil commented 1 year ago

This was found with Cray Fortran 15.0.1. It seems to generate code where the division is always done and then a masked move is used to pick the correct result according to the condition in merge. GNU Fortran (11.2), on the other hand, generates a compare, branch, and only then a division (or not), so there's no floating point exception with the same data that causes one with Cray Fortran.

Trying a simple piece of code on Compiler Explorer shows that ifort produces similar code as Cray Fortran. I'm wondering if this is also the root cause behind issue #159.

m214089 commented 1 year ago

I'm pretty sure it is exactly this. The Fortran standard warns that one should not rely on on the sequence of operations inside an if (...) then bracket (lousy English). I forgot to create a merge request sorry, about this.

RobertPincus commented 9 months ago

Closed with 3ac0636