Spatially varying parameter

kmdeck commented 1 year ago

Is your feature request related to a problem? Please describe.

The purpose of this issue is to introduce code that lets us read in datasets of spatially varying parameters that do not vary in time, and use them throughout the simulation. Time-dependent spatially varying parameters are being addressed by @Sbozzolo; link to PR/Issue:

We currently have regridding tools for reading 2D data from a file and storing it in the cache at the beginning of the simulation (the Bucket bare ground albedo). We also support site-level runs where the parameters are simply scalars.

Requirements

A unified interface for handling site-level and global runs: surface parameters as scalars (site) or 2D fields (global), and depth-resolved parameters as 1D fields (site) or 3D fields (global).

We would follow the same approach as TimeVaryingInput: an abstract type AbstractSpaceVaryingInput, a constructor SpaceVaryingInput with the same interface regardless of domain configuration, and concrete types SpaceVaryingInput0D (scalar), SpaceVaryingInput2D, etc. We would implement these first and follow on with the 1D, 3D, and analytic cases as needed. These types implicitly specify several things:

A method of evaluate!, defined by each concrete type, which updates the values of the parameters (where these values are stored and where this update is called is discussed below). This would happen only once, prior to the start of the simulation. NOTE: we may not need to call this evaluate!, i.e. we may not need to extend the function we already have for TimeVaryingInput. Let's discuss whether it is cleaner to use a different name and function.
A DataHandler when needed (1D, 2D, 3D). In this case, this would be a FileReader which reads in the data and handles regridding. Details of the regridding would be stored in the FileReader object, and the SpaceVaryingInput would not need to know about them. In temporally varying cases, the DataHandler would use the same FileReader structs and methods, but would also contain an object that specifies how and when to read in the data during the simulation. In the temporally constant case we only need the FileReader, because all the data is read before the simulation starts and the values never need to be updated in time.
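
A minimal sketch of the type hierarchy described above, to make the discussion concrete. Everything here is an assumption for illustration: MockFileReader and read_regridded are hypothetical stand-ins for the real FileReader, the field names are invented, and the destination is a plain array rather than a field. It only shows how one constructor name can dispatch to a 0D or a 2D concrete type, and how evaluate! fills the destination once.

```julia
# Hypothetical sketch; none of these definitions are taken from ClimaLand or ClimaUtilities.
abstract type AbstractSpaceVaryingInput end

# Site-level case: the parameter is a single scalar.
struct SpaceVaryingInput0D{FT <: Real} <: AbstractSpaceVaryingInput
    value::FT
end

# Global surface case: the data come from a reader object.
struct SpaceVaryingInput2D{R} <: AbstractSpaceVaryingInput
    file_reader::R
end

# Stand-in for the real FileReader: it simply holds an already regridded matrix.
struct MockFileReader{A <: AbstractMatrix}
    data::A
end
read_regridded(reader::MockFileReader) = reader.data

# One constructor name, dispatching on the kind of input it is given.
SpaceVaryingInput(value::Real) = SpaceVaryingInput0D(value)
SpaceVaryingInput(reader::MockFileReader) = SpaceVaryingInput2D(reader)

# Fill `dest` in place; intended to be called once, before the simulation starts.
function evaluate!(dest::AbstractArray, input::SpaceVaryingInput0D)
    fill!(dest, input.value)
    return nothing
end
function evaluate!(dest::AbstractArray, input::SpaceVaryingInput2D)
    copyto!(dest, read_regridded(input.file_reader))
    return nothing
end

# Site-level run: one value.
albedo_site = zeros(1)
evaluate!(albedo_site, SpaceVaryingInput(0.2))

# Global run: a 2D map supplied by the (mock) reader.
albedo_global = zeros(2, 3)
evaluate!(albedo_global, SpaceVaryingInput(MockFileReader(fill(0.3, 2, 3))))
```

The calling code is identical in both cases; only the argument passed to SpaceVaryingInput changes.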

I think the FileReader will ultimately only contain the path to the raw data, and it will read the data in and regrid it when asked to by evaluate!. Does this mean that in the temporal case regridding happens on the fly? The alternative is what we have now: regridding up front and storing the results in separate files. If we do this, the FileReader will contain the paths to both the raw and the regridded data. I think this is what we have implemented currently.

Decision on where to store the spatially varying parameters.

There are two options: (1) store them in the cache and set them with set_initial_cache, which already exists, or (2) store them in the parameter struct for the model and set them in the constructor.

The upside of the former is that we already have a set_initial_cache function we can work with. We could just add evaluate! calls for all the parameters of that model. It might be nice to define a default which does this. The challenge is that we would then need to add parameters to the cache based on model type (and model parameterization type) in a flexible way, which isn't hard to do (we do this already for other aspects of the model, like prognostic variables), but requires more code changes. The cache would then hold a mix of spatially and temporally varying quantities, and globally constant params (e.g. g) would be stored elsewhere.

Alternatively, we can store the parameters in the Parameter struct itself for the model, along with the earth_param_set (global constant params and fundamental constants). This is closer to what we are doing now and may be conceptually cleaner (parameters are stored in the parameter struct, rather than partly in the cache and partly somewhere else). It also means that what is in the cache are things that get updated in time, with the exception of the dss_buffer :P and that's kind of nice. And, since the models already specify their parameters in the Parameter struct, we don't need to define a new way of adding these fields to the cache.

I vote option 2.
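
As a rough illustration of option 2, here is a sketch of a parameter struct whose spatially varying field is either a scalar (site run) or an array (global run). The struct name, field names, and values are hypothetical, and a plain array and a NamedTuple stand in for a real field and for earth_param_set. The point is only that broadcasting lets the same model code work for both cases.

```julia
# Hypothetical parameter struct; names and values are illustrative only.
struct CanopyParameters{FT <: AbstractFloat, F, PSE}
    # Spatially varying parameter: a scalar for site runs, an array for global runs.
    leaf_albedo::F
    # Spatially constant parameter.
    emissivity::FT
    # Stand-in for earth_param_set (global constants such as g).
    earth_param_set::PSE
end

earth = (; grav = 9.81)

# Site-level run: the parameter is a scalar.
site_params = CanopyParameters(0.2, 0.97, earth)

# Global run: the same field holds a 2D array, already read and regridded.
global_params = CanopyParameters(fill(0.2, 4, 8), 0.97, earth)

# Model code uses broadcasting, so it does not care which case it is given.
reflected_shortwave(sw_down, p::CanopyParameters) = p.leaf_albedo .* sw_down

site_flux = reflected_shortwave(100.0, site_params)
global_flux = reflected_shortwave(fill(100.0, 4, 8), global_params)
```

With this layout the cache does not need any new mechanism for adding parameter fields; the constructor is the only place that changes between site and global runs.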

Proposal

Open a PR adding the SpaceVaryingInput infrastructure. Use the existing FileReader, which reads and regrids (across the board, even for the temporally varying case). We can change that later, if we want, for all inputs that vary spatially. As a proof of concept, use it for the canopy model (0D case). Store the values in the Parameter struct of the model and not in the cache.
Extend to soil model (0D case)
Extend to 2D case for soil and canopy
Make bucket model conform to this
Update FileReader as needed for both Space and Time varying input
Extend to 1D and 3D as needed

Sbozzolo commented 9 months ago

> I think the FileReader will ultimately only contain the path to the raw data, and then it will read it in, regrid it, when asked to by evaluate!. Does this mean that in the temporal case that regridding happens on the fly? The alternate is what we have now: regridding up front and storing those in separate files. If we do this, the FileReader will contain the path to the raw and regridded data. I think this is what we have implemented currently.

FileReader will probably be more complex than just a path to the raw data. It will probably store some regridded data in memory, along with a Regridder object that knows how to do the regridding. If possible, we should not go through additional I/O to do regridding, especially on GPU.
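
One way to picture a reader that carries its own regridder and never writes intermediate files is sketched below. This is an assumption-laden toy: NearestNeighborRegridder, InMemoryFileReader, regrid, and read_regridded are hypothetical names, the grid is a 1D coordinate axis, and the raw data are held directly instead of being read from a path.

```julia
# Hypothetical sketch: nearest-neighbour regridding on a 1D axis, done entirely in memory.
struct NearestNeighborRegridder{V <: AbstractVector}
    source_coords::V
    target_coords::V
end

# For each target coordinate, pick the value at the closest source coordinate.
function regrid(r::NearestNeighborRegridder, data::AbstractVector)
    return map(r.target_coords) do x
        data[argmin(abs.(r.source_coords .- x))]
    end
end

# The reader holds the raw data (a stand-in for what would be read from file)
# and the object that knows how to regrid it.
struct InMemoryFileReader{V, R}
    raw::V
    regridder::R
end

# No intermediate files: regridding happens on the in-memory data.
read_regridded(fr::InMemoryFileReader) = regrid(fr.regridder, fr.raw)

regridder = NearestNeighborRegridder([0.0, 10.0, 20.0], [1.0, 9.0, 19.0])
reader = InMemoryFileReader([1.0, 2.0, 3.0], regridder)
regridded = read_regridded(reader)
```

Whether the reader should also keep the regridded result, or hand it off to be stored elsewhere, is left open here.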

juliasloan25 commented 9 months ago

This sounds good to me!

I agree that storing spatially-varying parameters in the parameter struct seems cleaner. It would be nice to have all the parameters in one place.

I was thinking that it might be clearer to have a different function than evaluate! for the spatially-varying case, since we won't be evaluating/updating over time. Maybe something like set! would emphasize that we're setting values that won't change throughout the simulation.

kmdeck commented 9 months ago

> This sounds good to me!
>
> I agree that storing spatially-varying parameters in the parameter struct seems cleaner. It would be nice to have all the parameters in one place.
>
> I was thinking that it might be more clear to have a different function than evaluate! for the spatially-varying case, since we won't be evaluating/updating over time. Maybe something like set! would emphasize that we're setting values that won't change throughout the simulation.

I like set!! I will try that in the proof of concept

kmdeck commented 9 months ago

> > I think the FileReader will ultimately only contain the path to the raw data, and then it will read it in, regrid it, when asked to by evaluate!. Does this mean that in the temporal case that regridding happens on the fly? The alternate is what we have now: regridding up front and storing those in separate files. If we do this, the FileReader will contain the path to the raw and regridded data. I think this is what we have implemented currently.
>
> FileReader will probably be more complex than just path to raw data. It will probably store in memory some regridded data, and yet an object Regridder that knows how to do regridding. If possible, we should no go through additional I/O to do regridding, especially on GPU.

I'm not sure it should hold the regridded data, because that is the thing we ultimately need, and it will live in the parameter struct as the parameter values. When I have something more concrete we can discuss!

juliasloan25 commented 7 months ago

Implemented in ClimaUtilities: https://github.com/CliMA/ClimaUtilities.jl/pull/19

CliMA / ClimaLand.jl

Spatially varying parameter #126