NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0
196 stars 145 forks source link

bug: inflation creating out-of-bounds values qceff #749

Open hkershaw-brown opened 1 month ago

hkershaw-brown commented 1 month ago

:bug: Your bug may already be reported! Please search on the issue tracker before creating a new issue.

Describe the bug

  1. List the steps someone needs to take to reproduce the bug.
    Run CAM-FV reanalysis single run of filter example /glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run inf_flavor 2

  2. What was the expected outcome? Run to completion

  3. What actually happened?

filter errors out:

ERROR FROM:
  source : bnrh_distribution_mod.f90
  routine: bnrh_cdf
  message:  Smallest ensemble member less than lower bound -4.009123520117420E-164  0.000000000000000E+000

Error Message

Please provide any error messages.

Which model(s) are you working with?

CAM-FV

Screenshots

Here are the lines of code: https://github.com/NCAR/DART/blob/464aa57a1261fa098e5f0f999bad41a464f8d7db/assimilation_code/modules/assimilation/adaptive_inflate_mod.f90#L559-L560

Version of DART

Which version of DART are you using? v11.8.1

Have you modified the DART code?

No

Here is a small reproducer

program test_maths

! Using selected_real_kind for double precision
integer, parameter :: dp = selected_real_kind(15, 307)
real(kind=dp) :: ens
real(kind=dp) :: mean
real(kind=dp) :: inflate  
real(kind=dp) :: sd_inflate

ens = 3.2342393548165711e-191_dp
inflate = 1.4082289753147326_dp
mean = 2.1474965713154784e-163_dp

print *, ens    

sd_inflate = sqrt(inflate) 
ens = ens * sd_inflate + mean * (1.0_dp - sd_inflate)

print *, ens    

end program test_maths  
hkershaw@derecho3:/glade/derecho/scratch/hkershaw/DART/Bugs/qceff-maths$  ifort fortran_maths.f90 
hkershaw@derecho3:/glade/derecho/scratch/hkershaw/DART/Bugs/qceff-maths$ ./a.out 
  3.234239354816571E-191
 -4.009123520117420E-164

Or you can run CAM-FV filter /glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run

The numbers for the test_maths.f90 program are from proc3 (out of 128) j==61138

Build information

Please describe:

  1. Derecho
  2. intel

also mac gfortran GNU Fortran (MacPorts gcc13 13.2.0_4+stdlib_flag) 13.2.0

[hkershaw:problem]() > gfortran fortran_maths.f90
[hkershaw:problem]() > ./a.out 
   3.2342393548165711E-191
  -4.0091235201174199E-164
hkershaw-brown commented 1 month ago

there are some early returns in probit_transform_mod

https://github.com/NCAR/DART/blob/464aa57a1261fa098e5f0f999bad41a464f8d7db/assimilation_code/modules/assimilation/probit_transform_mod.f90#L350-L353

hkershaw-brown commented 1 month ago

I think this may be a reproducer for at least part of #681 and also why #709

hkershaw-brown commented 1 month ago

whole ensemble:

before line 559

"i","value" 1,0 2,0 3,0 4,0 5,0 6,0 7,0 8,0 9,0 10,0 11,0 12,0 13,0 14,0 15,0 16,0 17,0 18,0 19,0 20,0 21,0 22,0 23,0 24,0 25,0 26,0 27,0 28,0 29,0 30,0 31,0 32,0 33,0 34,0 35,0 36,0 37,0 38,0 39,0 40,0 41,0 42,0 43,0 44,0 45,0 46,0 47,3.2342393548165711e-191 48,0 49,0 50,0 51,0 52,0 53,0 54,0 55,0 56,0 57,0 58,0 59,0 60,0 61,0 62,0 63,0 64,0 65,1.5319740054648927e-206 66,1.3760414660445314e-201 67,0 68,0 69,0 70,1.0275904858792872e-161 71,0 72,0 73,0 74,0 75,0 76,0 77,0 78,0 79,6.9040677117309552e-162 80,0

after inflation

"i","value" 1,-4.0091235201174199e-164 2,-4.0091235201174199e-164 3,-4.0091235201174199e-164 4,-4.0091235201174199e-164 5,-4.0091235201174199e-164 6,-4.0091235201174199e-164 7,-4.0091235201174199e-164 8,-4.0091235201174199e-164 9,-4.0091235201174199e-164 10,-4.0091235201174199e-164 11,-4.0091235201174199e-164 12,-4.0091235201174199e-164 13,-4.0091235201174199e-164 14,-4.0091235201174199e-164 15,-4.0091235201174199e-164 16,-4.0091235201174199e-164 17,-4.0091235201174199e-164 18,-4.0091235201174199e-164 19,-4.0091235201174199e-164 20,-4.0091235201174199e-164 21,-4.0091235201174199e-164 22,-4.0091235201174199e-164 23,-4.0091235201174199e-164 24,-4.0091235201174199e-164 25,-4.0091235201174199e-164 26,-4.0091235201174199e-164 27,-4.0091235201174199e-164 28,-4.0091235201174199e-164 29,-4.0091235201174199e-164 30,-4.0091235201174199e-164 31,-4.0091235201174199e-164 32,-4.0091235201174199e-164 33,-4.0091235201174199e-164 34,-4.0091235201174199e-164 35,-4.0091235201174199e-164 36,-4.0091235201174199e-164 37,-4.0091235201174199e-164 38,-4.0091235201174199e-164 39,-4.0091235201174199e-164 40,-4.0091235201174199e-164 41,-4.0091235201174199e-164 42,-4.0091235201174199e-164 43,-4.0091235201174199e-164 44,-4.0091235201174199e-164 45,-4.0091235201174199e-164 46,-4.0091235201174199e-164 47,-4.0091235201174199e-164 48,-4.0091235201174199e-164 49,-4.0091235201174199e-164 50,-4.0091235201174199e-164 51,-4.0091235201174199e-164 52,-4.0091235201174199e-164 53,-4.0091235201174199e-164 54,-4.0091235201174199e-164 55,-4.0091235201174199e-164 56,-4.0091235201174199e-164 57,-4.0091235201174199e-164 58,-4.0091235201174199e-164 59,-4.0091235201174199e-164 60,-4.0091235201174199e-164 61,-4.0091235201174199e-164 62,-4.0091235201174199e-164 63,-4.0091235201174199e-164 64,-4.0091235201174199e-164 65,-4.0091235201174199e-164 66,-4.0091235201174199e-164 67,-4.0091235201174199e-164 68,-4.0091235201174199e-164 69,-4.0091235201174199e-164 70,1.215420420032912e-161 71,-4.0091235201174199e-164 72,-4.0091235201174199e-164 73,-4.0091235201174199e-164 74,-4.0091235201174199e-164 75,-4.0091235201174199e-164 76,-4.0091235201174199e-164 77,-4.0091235201174199e-164 78,-4.0091235201174199e-164 79,8.1528847158862943e-162 80,-4.0091235201174199e-164

hkershaw-brown commented 1 month ago

from_probit_bounded_normal_rh state_ens gets set to probit_ens

Screenshot 2024-10-03 at 2 16 19 PM Screenshot 2024-10-03 at 2 16 33 PM

not sure why the more_params(7) tail_sd_eft is negative? get_bnrh_sd(p) https://github.com/NCAR/DART/blob/464aa57a1261fa098e5f0f999bad41a464f8d7db/assimilation_code/modules/assimilation/bnrh_distribution_mod.f90#L91-L100

hkershaw-brown commented 1 month ago

The sd == 0 so you never transform into (or back out of) probit space.

Screenshot 2024-10-03 at 3 29 19 PM

Inflation pushes the I think in this case ens = (ens - mean) * sqrt(inflate) + mean if your ens(i) == mean then you can not inflate, but this is only going to work for multiplicative inflation. If you're adding noise or whatever various inflation methods do, I think you will have to enforce the bounds. Is this ok? clamping rather than qceff enforced bounds.

 555       ! Spread the ensemble out linearly for deterministic
 556       sd_inflate = sqrt(inflate)
 557       ! HK if ens == mean do nothing
 558       where ((ens - mean) > epsilon(mean) )
 559          ! Following line can lead to inflation of 1.0 changing ens on some compilers
 560          !!! ens = (ens - mean) * sqrt(inflate) + mean
 561          ! Following gives 1.0 inflation having no impact on known compilers
 562          ens = ens * sd_inflate + mean * (1.0_r8 - sd_inflate)
 563       endwhere
hkershaw-brown commented 1 week ago

note there is issue #748 which is a different inflation QCEFF bug which looks like it may create out-of-bounds values also.