GridTools / gt4py

Python library for generating high-performance implementations of stencil kernels for weather and climate modeling from a domain-specific language (DSL).
https://GridTools.github.io/gt4py
BSD 3-Clause "New" or "Revised" License
112 stars 49 forks source link

Vertical off-center writes convenience #131

Open rheacangeo opened 4 years ago

rheacangeo commented 4 years ago

Some column based atmospheric science algorithms are intuitive to scientists to write in such a way that adjacent vertical layers are modified in conjunction. For example, if you take mass from the layer you are at and add it to the one below you, it is intuitive to write:

with computation(BACKWARD), interval(1, None):
        if mass[0, 0, 0] > 0:
             delta = x * y
             mass[0, 0, 0]  = mass[0, 0, 0]  - delta
             mass[0, 0, -1] = mass[0, 0, -1] + delta 

Without off-center writes, we need to write it something like

with computation(BACKWARD):
       with interval(-1, None):
               delta = x * y
               if mass > 0:
                     mass  = mass  - delta
        with interval(1, -1):
               delta = x * y
               if mass[0, 0, 1] > 0:
                     mass = mass + delta[0, 0, 1]
               if mass > 0:
                     mass  = mass  - delta
        with interval(0, 1):
               if mass[0, 0, 1] > 0:
                     mass = mass + delta[0, 0, 1]

This is not terrible, and makes sense with this small example, but it quickly can get confusing with more complex patterns. The special handling at the top and bottom can be easy to forget to do. Here is a sample we ported:

Python loop version

for j in range(js, je ):
    for k in range(1, nz - 1):
        for i in range(is_, ie):
            if qv[i, j, k] < 0 and qv[i, j, k - 1] > 0.0:
                dq = min(
                    -qv[i, j, k] * dp[i, j, k], qv[i, j, k - 1] * dp[i, j, k - 1]
                )
                qv[i, j, k - 1] -= dq / dp[i, j, k - 1]
                qv[i, j, k] += dq / dp[i, j, k]
            if qv[i, j, k] < 0.0:
                qv[i, j, k + 1] += qv[i, j, k] * dp[i, j, k] / dp[i, j, k + 1]
                qv[i, j, k] = 0.0

Stencil version avoiding offcenter writes-introduce fields upper_fix and lower_fix:

@utils.stencil()
def fix_water_vapor_down(qv: Field, dp: Field, upper_fix: Field, lower_fix: Field):
    with computation(PARALLEL):
        with interval(1, 2):
            if qv[0, 0, -1] < 0:
                qv = qv + qv[0, 0, -1] * dp[0, 0, -1] / dp
        with interval(0, 1):
            qv = qv if qv >= 0 else 0
    with computation(FORWARD), interval(1, -1):
        dq = qv[0, 0, -1] * dp[0, 0, -1]
        if lower_fix[0, 0, -1] != 0:
            qv = qv + lower_fix[0, 0, -1] / dp
        if (qv < 0) and (qv[0, 0, -1] > 0):
            dq = dq if dq < -qv * dp else -qv * dp
            upper_fix = dq
            qv = qv + dq / dp
        if qv < 0:
            lower_fix = qv * dp
            qv = 0
    with computation(PARALLEL), interval(0, -2):
        if upper_fix[0, 0, 1] != 0:
            qv = qv - upper_fix[0, 0, 1] / dp
    with computation(PARALLEL), interval(-1, None):
        if lower_fix[0, 0, -1] > 0:
            qv = qv + lower_fix / dp
        upper_fix = qv

In the stencil version, it's hard to tell what's going on. The examples can get more complex. I think that unrestricted off-center writes might create a lot of problems, but this adjacent vertical adjustments pattern comes up a few times in the FV3 dycore and physics. I could see something like off-center only being allowed in the vertical direction, and enforce the computation cannot be PARALLEL if doing an off-center write being some restrictions that'd need to be made.

jdahm commented 2 years ago

We discussed this in a team meeting and decided that while this would be quite useful for certain vertical solve patterns, there are other open issues and feature requests that will have much higher impact for the work required (for example, directional offsets).

So, I suggest we do not include this in the effort for the v1.0 release, but also not close the issue.

oelbert commented 1 year ago

Something along these lines would be extremely helpful for the microphysics, which uses a double-k-loop to handle sedimentation melting, removing ice/snow/graupel from one k-level and adding it as rain to the appropriate lower levels while updating the heat capacities and temperatures along the way:

for k in range(ke, ks-1,-1):
    if v_terminal[k] >= 1.e-10:
        continue
    if qice[k] > constants.QCMIN:
        for m in range(k+1, ke+1):
            if zt[k+1] >= ze[m]:
                break
            if (zt[k] < ze[m+1]) and (temperature[m] > constants.TICE):
                cvm[k] = moist_heat_capacity(qvapor[k], qliquid[k], qrain[k], qice[k], qsnow[k], qgraupel[k])
                cvm[m] = moist_heat_capacity(qvapor[m], qliquid[m], qrain[m], qice[m], qsnow[m], qgraupel[m])
                dtime = min(timestep, (ze[m] - ze[m+1])/v_terminal[k])
                dtime = min(1., dtime/tau_mlt)
                sink = min(qice[k] * delp[k] / delp[m], dtime * (temperature[m] - constants.TICE) / icpk[m])
                qice[k] -= sink * delp[m] / delp[k]
                if zt[k] < zs:
                    r1 += sink*delp[m]
                else:
                    qrain[m] += sink
                temperature[k] = (temperature[k] * cvm[k] - li00 * sink * delp[m] / delp[k]) / moist_heat_capacity(qvapor[k], qliquid[k], qrain[k], qice[k], qsnow[k], qgraupel[k])
                temperature[m] = (temperature[m] * cvm[m]) / moist_heat_capacity(qvapor[m], qliquid[m], qrain[m], qice[m], qsnow[m], qgraupel[m])
            if qice[k] < QCMIN:
                break
gronerl commented 1 year ago

Should the last update be to temperature[m] ? Like this it looks like the second last update will be overwritten. Or is another index wrong?

oelbert commented 1 year ago

No, that's correct it should be temperature[m], fixed now

gronerl commented 1 year ago

I haven't found a more proper way to do this, but this thing from the darkest corners of my workaround brain might just work, if this is just an isolated instance that needs to be in gt4py at all cost:


from gt4py.cartesian.gtscript import Field, IJ

Nk = 80

fake_2d_field = Field[IJ, (np.float64, (Nk,))]
actual_2d_int_field = Field[IJ, np.int32]

k_index_field = gt4py.storage.full(shape=(Ni, Nj), fill_value=Nk,dtype=np.int32, dimensions=["I","J"], backend=...)
m_index_field = gt4py.storage.zeros(shape=(Ni, Nj), dtype=np.int32, dimensions=["I","J"], backend=...)

qice = gt4py.storage.from_array(data,shape=(Ni, Nj, Nk), dtype=np.int32, dimensions=["I","J", "0"], backend=...)

def nested_loop_computation(v_terminal: fake_2d_field,
                            qice: fake_2d_field,
                            zt: fake_2d_field,
                            ze: fake_2d_field,
                            k_index_field: actual_2d_int_field,
                            m_index_field: actual_2d_int_field,
                            # etc
                            ):
    with computation(PARALLEL), interval(0, 1): # order doesn't matter since it's only one layer
        while k_index_field > 0:
            m_index_field = k_index_field
            while m_index_field < Nk:
                # do the computation, where accesses to fields are like this:
                # qice[0,0][k_index_field] -= sink * delp[0,0][m_index_field] / delp[k_index_field]
                m_index_field += 1
            k_index_field -= 1

So the idea is to not use the vertical capability of stencils at all but rather emulate loops with whiles in data dimensions.

Could something like this work?

gronerl commented 1 year ago

If in the end, DaCe orchestration is the goal, another solution would be to write a NumPy code and directly use @dace.program which will integrate nicely into the DaCe orchestrated context.

havogt commented 1 year ago

@oelbert

oelbert commented 1 year ago

Currently this is just a Python function to be dace-optimized, though I do have a standalone test for it. I've been serializing data from Fortran runs and checking the coverage in Python. It works, but it does feel awkward to have the model be in GT4Py except for this one piece of code that's not. I think Linus' workaround could solve putting this into stencil, though.

As far as splitting the heat capacity and temperature updates from the sedimentation melting, that would be ideal except for the sink = min(qice[k] * delp[k] / delp[m], dtime * (temperature[m] - constants.TICE) / icpk[m]) line, where the amount of ice taken from level k depends in part on the temperature at level m, which updates through the loop.

r1 is a 2D field that tracks the rain that's precipitated to the ground through the sedimentation scheme.