CliMA / ClimaLand.jl

Clima's Land Model
Apache License 2.0
38 stars 10 forks source link

O2.3.11 Add land/sea mask to ClimaCore domain #488

Open kmdeck opened 9 months ago

kmdeck commented 9 months ago

Purpose

For global runs, we currently set up a spherical shell domain (extruded FD) and create fields that are defined everywhere on the domain. The issues arise in three areas:

A related issue is that ultimately we would like to support lateral flow in the soil. This requires boundary conditions in the horizontal.

The benefits to what we are doing now is that (I think) regridding in the coupler is easy because land and atmos have the exact same space at the surface

Priorities

Proposed Work Summer 2024

ClimaLand:

ClimaCore:

Proposed Work 2025

Notes from a while ago - need to revisit When we have lateral flow we want to make use of the current topology/connectivity of points and how this is set up in ClimaCore. This is because it is easier to use a mask in boundary condition setting in the horizontal than to create a totally unstructured mesh.

I checked with Andre and he confirmed that this is what the ocean model does. there are prognostic variables over land but the tendencies are never evaluated. Only indices which are in the mask = 1 region are loaded onto the GPU.

on the ClimaLand side: This solution means that we need to use the mask on the land side because we will still have fields defined over the ocean.

We will need to:

  1. ingest the land/sea mask and store somewhere where it is accessible to the tendency
  2. initial state vector = 0 everywhere or we can set to NaN everywhere (?).
  3. Set initial conditions using the mask (NaN over ocean)
  4. Parameter data should be masked already (NaN over the ocean). When regridding, we'll get back Nans in the ocean for the field. Interpolating to model grid needs to be mask aware
  5. Write a macro which can be used in any broadcasted function when mask = 1, like:
    
    #=
    This assumes that `mask` is in local scope
    =#
    macro masked(expr)
    :(@. ifelse(mask == 0, 0, expr))
    end

function tendency_func!(Y, p, t) (; mask) = p (; x, z) = p # fields (; ϕ, ψ) = Y # fields

@. ϕ = some_tendency(ψ, x, z) # current

@. ϕ = ifelse(mask == 0, 0, some_tendency(ψ, x, z)) # what we want

@masked ϕ = some_tendency(ψ, x, z) # cleaner way to implement

end

6. Repeat for all broadcasted expressions in the tendency, cache, etc.
7. use the mask in plotting (maybe not if NaN in ocean)

- is memory an issue since we are stored fields everywhere, when land is only 30% of the Earth's surface? We should be computed limited on GPU and not memory limited (this is true for the ocean model).

@sriharshakandala @juliasloan25 @Sbozzolo 
```[tasklist]
### Tasks
- [ ] Use ETOPO 2022 global relief model data to generate element-level `land-sea` mask
- [ ] Add `element mask` to ClimaCore spaces
- [ ] Block FD/SE/tendency operations for masked elements in ClimaCore
kmdeck commented 9 months ago

@charleskawczynski no immediate rush to answer, but this is something we'd like to figure out in the next month or two

kmdeck commented 8 months ago

@juliasloan25 has found that this also affects any run (including coupled runs) using the Bucket model, because the land albedo is not defined in the Artic.

charleskawczynski commented 8 months ago

My first thought was to bite the bullet and develop a special ClimaCore MaskedField, but the more that I think of it, the more I’m thinking that it’s easier to handle tendency case-by-case. For example:

@. field_tendency = ifelse(mask==0, field_tendency, foo(Y.c))

We might be able to write a macro to write this more compactly, but I think this pretty effectively reuses our existing infrastructure. Also, this should still work with foo(Y.c)) as a field or a function/expression. So, interpolations etc should still work. Using broadcast expressions is kind of important in terms of letting ClimaCore handle metric terms properly, so I think it’s important to not overlook this key feature.

I’m happy to chat more about this if I’ve missed something or if you think there may be more to the story.

kmdeck commented 8 months ago

Hi Charlie,

thanks for thinking about this! My concern with masking the tendency is that then we also have to mask the initial conditions or set IC over the ocean (ditto with all of our spatial parameter fields), and possibly use the mask in plotting (Im not sure the value of zero for parameters and the state will produce non-NaN tendencies, so we may not be able to set ocean-area values to zero and have it just work). it ends up getting kind of hacky.

we can discuss more! I can also show you our code and we can look at how complex it would be to implement the above^^

Our tendency functions have field vector arguments, so it didnt seem like they would broadcast as we want.

charleskawczynski commented 8 months ago

we can discuss more! I can also show you our code and we can look at how complex it would be to implement the above^^

I think that would be really helpful

kmdeck commented 8 months ago

@simonbyrne, Tapio also suggested asking your opinion on this. He had the impression from you that restricting the grid to only include continents, while maintaining information about connectivity (to support tendencies in the horizontal, and, eventually, non-periodic boundary conditions), was not that difficult.

If that is the case, it seems like the load-balancing problem would not arise, nor would the land model need to worry about about only evaluating tendencies over land, because our fields would not be defined over the ocean.

thank you!

simone-silvestri commented 8 months ago

Hi @kmdeck.

In Oceananigans we have a structured mesh, so we maintain the structure but launch an unstructured kernel, i.e. we (1) map out the active cells in a linear map with map[t] = (i, j, k), (2) launch a kernel with a number of threads equal to the total number of active cells and finally (3) map the unique thread index to the three-dimensional grid index inside the function.

In distributed computations, to load balance it is enough to calculate the total number of active cells in the global grid and divide them by processor. We do keep the memory associated with immersed cells though. In your case, it is probably better to go fully unstructured since you are using climacore that gives you this ability

kmdeck commented 8 months ago

@dennisYatunin, if we had NaNs as our field values over the ocean, and then tried to step Y.soil.variable implicitly, would we always use the max number of iterations over the ocean, since NaN < tolerance is always false?

dennisYatunin commented 8 months ago

Depends on how you have things set up. If you're using the convergence checker specified here, then yes. You can also pass a custom norm to the convergence checker, though. The default is LinearAlgebra.norm, but you can define an alternative version that masks every field before reducing it with the standard L2-norm.

You can also just hardcode the number of Newton iterations, and avoid using a convergence checker altogether. That's what we do in ClimaAtmos.

Sbozzolo commented 2 months ago

It would be good if we could also skip the evaluation of external forcings (ie, read from file) on masked values too.

This is not as critical because it probably won't affect performance that much and it not causing issues NaNs.