Deltares / imod-python

🐍🧰 Make massive MODFLOW models
https://deltares.github.io/imod-python/
MIT License
17 stars 3 forks source link

make regrid method a package variable rather than a class variable #960

Closed luitjansl closed 4 months ago

luitjansl commented 5 months ago

Currently regridding a simulation using non-default methods is a pain. A non-default regridding method can only be specified on the package.regrid_like function. So if the user wants to regrid a simulation, and wants to use a nondefault method for one package, then this package must be regridded separately and then must be added to the regridded simulation.

It would be more convenient if the regridding method would be an init parameter for each package- then the user has the freedom to choose any method for any package, and regridding the model or simulation would only require calling _regrid_like on the model or simulation.

Huite commented 5 months ago

I would suggest putting this as arguments in the regrid call, rather than the init.

(If people are really serious about it and really always prefer their own methods, they could even subclass a package and overload the class variables.)

luitjansl commented 5 months ago

Each package can have multiple arrays, and each of these arrays can have its own regrid method. Therefore setting these methods as an argument in the regrid_like method of the simulation would involve passing a large list of regridding method, ordered per model, then per package and then per array. Assembling a data structure with all that information would be a bit of a burden on the user. We could also use setters: then each package would have a setter where we pass the regrid method for the arrays of that package.

Huite commented 5 months ago

It's currently stored as a dictionary in the class:

    _regrid_method = {
        "stage": (RegridderType.OVERLAP, "mean"),
        "conductance": (RegridderType.RELATIVEOVERLAP, "conductance"),
        "bottom_elevation": (RegridderType.OVERLAP, "mean"),
        "concentration": (RegridderType.OVERLAP, "mean"),
    }

A dict seems manageable for the user, except it's really annoying to have to set all the values if you're only interested in e.g. setting the stage to use the maximum.

What I rather dislike about it being a dynamic instance variable is that it's a form of state that isn't represented by the dataset. Having all relevant state in the dataset is nice, since it's communicated clearly to the user (no work on repr methods required) and you can directly dump to netCDF / zarr / etc.

Anyway, a different suggestion: instead of a class dict, we could also supply a named tuple or dataclass per package with default arguments:

class RiverRegridMethod(NamedTuple):
     stage: tuple=(RegridderType.OVERLAP, "mean")
     conductance: tuple=(RegridderType.RELATIVEOVERLAP, "conductance")
     bottom_elevation:tuple=(RegridderType.OVERLAP, "mean")
     concentration: tuple=(RegridderType.OVERLAP, "mean")

Then, if you want change regrid methods for a package:

method = RiverRegridMethod(stage=(RegridderType.OVERLAP, "maximum"))
new = river.regrid_like(..., method=method)

With regrid_like:

def regrid_like(self, target_grid, method=None):
     if method is None:
          method = RiverRegridMethod()
    ...

Pydantic with a dataclass is possible too, maybe better since it can validate the user provided regridding method input straightaway.

JoerivanEngelen commented 4 months ago

I like the idea of using separate named tuples or dataclasses to store regridding methods for packages. We could even store these in a separate module, to have all the default regridding settings in one place. This would resolve the issue as described by the title.

Regarding that it is painful to regrid simulations; if we follow @Huite's suggestion, a user has to follow the following steps to regrid a simulation with a custom method for a specific package:

rch_coarse = mf6sim_coarse["gwf"].pop("rch")
mf6sim = mf6sim_coarse.regrid_like(...)
rch_regrid_method = RechargeRegridMethod(rate=(RegridderType.OVERLAP, "maximum"))
rch = rch_coarse.regrid_like(..., method=rch_regrid_method )
mf6sim["gwf"]["rch"] = rch

I think this makes for an acceptable API. Moreover, the popping of the package is not even necessary, just to avoid the overhead of regridding a package twice.

luitjansl commented 4 months ago

Like I argued when we talked about validation: I would like a text file/excel sheet that describes the mf6 data model: the packages that exist, and the arrays that they potentially have. Then the default regrid methods (and the validation criteria and as much other metadata as we can), would be listed in that sheet per package and per array. (For packages that don't have much distinguishing code, we could use a "standard package "class that would allow us to remove the recharge/diffusion/ghb/... packages from the codebase)

Regarding the steps for regridding a simulation with a custom method for a specific package: there is one more step needed. It would be:

JoerivanEngelen commented 4 months ago

You are right about the masking, I missed that one.

A table (excel/textfile) which lists all default settings for packages could be something we work towards to enhance transparency, but I don't see the real need now for it yet. Some of the challenges/questions with it I foresee/have are:

This may come across as that I'm strongly opposed of this idea, which I'm not. I think this could be beneficial to users and aid transparency. However, since we are with quite a small team, I defer from working on this until we have achieved the required functionality, which is still quite a big task.