Fix uninitialized min_rand variable in Thompson MP when using SPP

JeffBeck-NOAA commented 2 years ago

Description

This suite of PRs fixes the uninitialized min_rand variable in Thompson MP when using SPP and is explained in detail in the ufs-weather-model PR here.

Issue(s) addressed

Fixes NCAR/ccpp-physics Issue #891.

Testing

The RT suite on Hera passed for all normal and debug tests, except for the regional_spp_sppt_shum_skeb RT test, which will now have a new baseline result.

@jwolff-ncar, @willmayfield, @bluefinweiwei, @michelleharrold, @judithberner, @gsketefian

SamuelTrahanNOAA commented 2 years ago

@grantfirl

Kinda important to catch the minimum value of rand_pert. Approved.

It doesn't solve the problem correctly though, since it is only getting the minimum of the current block, not the whole domain. The results will vary depending on threading and blocking settings.

climbfuji commented 2 years ago

Excellent catch, Sam.

On Apr 4, 2022, at 2:12 PM, Samuel Trahan (NOAA contractor) @.***> wrote:

@grantfirl https://github.com/grantfirl Kinda important to catch the minimum value of rand_pert. Approved.

It doesn't solve the problem correctly though, since it is only getting the minimum of the current block, not the whole domain. The results will vary depending on threading and blocking settings.

— Reply to this email directly, view it on GitHub https://github.com/NCAR/ccpp-physics/pull/892#issuecomment-1087967552, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5C2RIEVMQY57BJ3JBL2S3VDNEL3ANCNFSM5SLIWTRQ. You are receiving this because your review was requested.

SamuelTrahanNOAA commented 2 years ago

The random fields are generally normalized somehow, right? Perhaps [-1,1] or [0,1]. If you know how it was normalized, then you can use that value (-1 or 0 in those examples) instead of computing them.

gthompsnWRF commented 2 years ago

The random fields are generally normalized somehow, right? Perhaps [-1,1] or [0,1]. If you know how it was normalized, then you can use that value (-1 or 0 in those examples) instead of computing them.

I was going to ask about thread safe, but I just don't know how CCPP code is structured as well as I do WRF where I had implemented this in a thread-safe way.

Yes, actually I believe the min value should always be set to -1.0, so in this case, I think it is actually best to set abs_min_rand as a universal constant of 1.0. I realize now how this could be a flaw in my original design because instead of always asking for the lowest possible negative number to add to the original number, I was doing the abs of lowest number at the time, so in the case of slow spin-up, the min_rand number is hardly ever as low as -1.0. So I now believe this can be changed to a simple constant of 1.0 (which would be abs(-1.0) basically.

JeffBeck-NOAA commented 2 years ago

All blocks are consolidated into a single 1D array in the stochastic physics wrapper (stochastic_physics_wrapper.F90) before spp_wts_mp is passed to module_mp_thompson.F90 as rand_pert:

do nb=1,Atm_block%nblks
      GFS_Data(nb)%Coupling%spp_wts_mp(:,:) = spp_wts(nb,1:GFS_Control%blksz(nb),:,n)
end do

SamuelTrahanNOAA commented 2 years ago

It will still depend on MPI decomposition (layout_x and layout_y) since different ranks will have different parts of the domain.

JeffBeck-NOAA commented 2 years ago

The random fields are generally normalized somehow, right? Perhaps [-1,1] or [0,1]. If you know how it was normalized, then you can use that value (-1 or 0 in those examples) instead of computing them.

The random fields aren't normalized, but are capped based on namelist settings (magnitude*cutoff), so they will vary from run to run.

JeffBeck-NOAA commented 2 years ago

The random fields are generally normalized somehow, right? Perhaps [-1,1] or [0,1]. If you know how it was normalized, then you can use that value (-1 or 0 in those examples) instead of computing them.

The ramdon fields aren't normalized, but are capped based on a namelist setting, so they will vary from run to run.

It will still depend on MPI decomposition (layout_x and layout_y) since different ranks will have different parts of the domain.

OK, that makes sense now. It looks like only a constant will work in this case.

SamuelTrahanNOAA commented 2 years ago

The only way I see of doing this is an MPI_Allreduce at the model level, and pass the result down to CCPP. It is likely the stochastic physics component already does that calculation, but it may not send the minimum to the model.

gthompsnWRF commented 2 years ago

Please waste no more energy on this and consider setting this as constant since I think it is the right thing to do. If Jeff has other thoughts, please speak out.

climbfuji commented 2 years ago

You can do an MPI exchange in the time_vary group (timestep_init phase in CCPP lingo).

On Apr 4, 2022, at 2:37 PM, Samuel Trahan (NOAA contractor) @.***> wrote:

The only way I see of doing this is an MPI_Allreduce at the model level, and pass the result down to CCPP. It is likely the stochastic physics component already does that calculation, but it may not send the minimum to the model.

— Reply to this email directly, view it on GitHub https://github.com/NCAR/ccpp-physics/pull/892#issuecomment-1087990197, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5C2RO3JSIFFU4O4WTFRP3VDNHIDANCNFSM5SLIWTRQ. You are receiving this because your review was requested.

JeffBeck-NOAA commented 2 years ago

Are the following variables (rand1 and rand2) non-reproducable as well? It seems like if rand_pert(:,1) won't work, rand_pert(i,1) won't work either?

if (rand_perturb_on .ne. 0) then
     if (MOD(rand_perturb_on,2) .ne. 0) rand1 = rand_pert(i,1)
     m = RSHIFT(ABS(rand_perturb_on),1)
     if (MOD(m,2) .ne. 0) rand2 = rand_pert(i,1)*2.
     m = RSHIFT(ABS(rand_perturb_on),2)
     if (MOD(m,2) .ne. 0) rand3 = 0.25*(rand_pert(i,1)+abs_min_rand)
     m = RSHIFT(ABS(rand_perturb_on),3)
endif

SamuelTrahanNOAA commented 2 years ago

Are the following variables (rand1 and rand2) non-reproducable as well?

No. Those are in an i,j loop, so rand1 is a temporary variable containing data at that i,j point. (Likewise with rand2 and rand3).

SamuelTrahanNOAA commented 2 years ago

The issue is that this gets the maximum over the thread block, rather than a local point:

abs_min_rand = ABS(MINVAL(rand_pert(:,1)))

The first dimension (with the :) is the horizontal dimension.

JeffBeck-NOAA commented 2 years ago

Are the following variables (rand1 and rand2) non-reproducable as well?

No. Those are in an i,j loop, so rand1 is a temporary variable containing data at that i,j point. (Likewise with rand2 and rand3).

What if I move the abs_min_rand calculation into the i,j loop? Would that work? Something like this:

if (rand_perturb_on .ne. 0) then
       if (MOD(rand_perturb_on,2) .ne. 0) rand1 = rand_pert(i,1)
       m = RSHIFT(ABS(rand_perturb_on),1)
       if (MOD(m,2) .ne. 0) rand2 = rand_pert(i,1)*2.
       m = RSHIFT(ABS(rand_perturb_on),2)
       if (MOD(m,2) .ne. 0) rand3 = 0.25*(rand_pert(i,1)+ABS(MINVAL(rand_pert(:,1))))
       m = RSHIFT(ABS(rand_perturb_on),3)
endif

SamuelTrahanNOAA commented 2 years ago

No. You're still computing the minval across the entire i dimension.

gthompsnWRF commented 2 years ago

We are still spending energy on this! Please consider my request!!!!

gthompsnWRF commented 2 years ago

And the BIT-shifting stuff should not care about anything at all. From how I view this, only one simple decision is needed. Can one line of code be changed to blah-blah +1.0?

SamuelTrahanNOAA commented 2 years ago

@JeffBeck-NOAA It is @gthompsnWRF 's suggestion you use -1 as he says in his earlier message I somehow missed:

Yes, actually I believe the min value should always be set to -1.0, so in this case, I think it is actually best to set abs_min_rand as a universal constant of 1.0. I realize now how this could be a flaw in my original design because instead of always asking for the lowest possible negative number to add to the original number, I was doing the abs of lowest number at the time, so in the case of slow spin-up, the min_rand number is hardly ever as low as -1.0. So I now believe this can be changed to a simple constant of 1.0 (which would be abs(-1.0) basically.

JeffBeck-NOAA commented 2 years ago

@gthompsnWRF, @SamuelTrahanNOAA, I was trying to keep the stochasticity from the original formulation in "rand3" to keep it from becoming just a multiple of "rand1", particularly since the random pattern is not always capped at the same value.

SamuelTrahanNOAA commented 2 years ago

Is the value ever beyond [-1,1] though? If it is, that'll ultimately kill this solution.

JeffBeck-NOAA commented 2 years ago

Is the value ever beyond [-1,1] though? If it is, that'll ultimately kill this solution.

Please see my previous comment: "The random fields aren't normalized, but are capped based on namelist settings (magnitude*cutoff), so they will vary from run to run." So, it could be larger than [-1,1].

SamuelTrahanNOAA commented 2 years ago

Can we pass the namelist cap down to CCPP?

gthompsnWRF commented 2 years ago

Is the value ever beyond [-1,1] though? If it is, that'll ultimately kill this solution.

It was my understanding that it should not be. This is probably where Judith Berner comes in. Maybe we really could have used a telecon for this matter. It's really difficult to do this over 1+ time frame when far less is better.

JeffBeck-NOAA commented 2 years ago

I will see if I can pull in the magnitude*cutoff value to use instead.

SamuelTrahanNOAA commented 2 years ago

@gthompsnWRF If you want to have a meeting, please contact your desired attendees by email, and we'll arrange something. I'd rather not discuss our schedules and locations on github. Joe Olson should be there too, since his PR is the one that sparked all this.

gthompsnWRF commented 2 years ago

Jeff: yes, this is the best possible solution! I forgot that the random field is not intended to be [-1,1]. This solution is the proper one for certain.

JeffBeck-NOAA commented 2 years ago

Jeff: yes, this is the best possible solution! I forgot that the random field is not intended to be [-1,1]. This solution is the proper one for certain.

Thanks, Greg! It doesn't look like it will be too difficult to bring these namelist entries into CCPP (two of the three are already available), so I should have it ready in a day or so. Thanks to @grantfirl for his help here.

JeffBeck-NOAA commented 2 years ago

@gthompsnWRF, @SamuelTrahanNOAA, please see the latest changes to this PR, which I believe will resolve your concerns. The code is now working (ran the RT test; baselines change) and calculates the maximum random perturbation value based on namelist settings brought in from stochastic_physics.

JeffBeck-NOAA commented 2 years ago

Thanks, @SamuelTrahanNOAA. I missed this file in my latest commit. It's now fixed.

NCAR / ccpp-physics