ben1post / phydra

Phydra is a python package for flexible and reproducible marine ecosystem modelling, using the Xarray-simlab-ODE Framework
https://xarray-simlab-ode.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Comments on Chemostat and ASTroCAT prototypes #1

Closed benbovy closed 3 years ago

benbovy commented 4 years ago

Some answers on questions in https://github.com/ben1post/phydra/blob/master/prototypes/01_Chemostat_MODEL.ipynb and https://github.com/ben1post/phydra/blob/master/prototypes/02_ASTroCAT_MODEL.ipynb

(note: I had only a brief look and I need to look at it more in depth)

add multiple instances of each component

Instead of having separate process classes for each instance, or lists of lists, you could add those instances as a new dimension. It is compatible with the xarray data model, and I think it makes for cleaner code, as you would deal with "flat" numpy arrays instead of nested lists or many process classes. It would look something like:

import numpy as np
import xsimlab as xs

@xs.process
class Phytoplankton:
    label_prefix = xs.variable(default='P')
    np = xs.variable(description='number of instances')

    # note `dims`, which allows either a unique instance or multiple instances
    label = xs.variable(dims=[(), 'p_number'], groups='label', intent='out')
    init = xs.variable(dims=[(), 'p_number'], intent='in', groups='init')
    size = xs.variable(dims=[(), 'p_number'], intent='in', groups='size')

    # this component property is the same for all instances
    param1 = xs.variable()

    # this component property might have different values for instances
    param2 = xs.variable(dims=[(), 'p_number'])

    def initialize(self):
        # `self.np` is the number of instances; the numpy module is still
        # reachable as plain `np` inside this method
        if self.np:
            self.label = np.array([f"{self.label_prefix}-{i}" for i in range(self.np)])
        else:
            self.label = self.label_prefix

The downside is that if you want to set a different number of instances for each component, then in each component class you'll need to re-declare all the variables accepting the x_number dimension, with different labels for each component class, e.g., p_number, z_number, etc. It is not possible to set dimension labels for variables programmatically from other variable values. But I don't think that's a big deal, only a few redundant lines of code.

If you want to easily plot the model outputs for all instances of all components, there are plenty of convenient functions in xarray that can be used to re-arrange the data as a post-processing task. Alternatively, you can create another process class that flattens the data, like I've done in my gist https://gist.github.com/benbovy/f1898ee97d62c6df46976bc93002a14b with the class AllComponents (which could easily be updated to support multiple instances per component).
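For example, a minimal post-processing sketch along these lines (the dataset, variable and dimension names below are made up for illustration):

import xarray as xr

# assume `out_ds` is a model output dataset with one output variable per
# component, e.g. 'P__value' with dims ('time', 'p_number') and
# 'Z__value' with dims ('time', 'z_number')
flat = xr.concat(
    [out_ds['P__value'].rename({'p_number': 'comp'}),
     out_ds['Z__value'].rename({'z_number': 'comp'})],
    dim='comp',
)
flat.plot.line(x='time')  # one line per component instance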

(not sure if feasible) but it would be amazing if the visualization of the model represented ecological components with their links & processes

I suggest implementing all component processes (i.e., uptake, etc.) in separate process classes instead of internal methods in the component classes, like I've done in my gist. Not only is this much more flexible if you want to drop/replace specific processes, but the model graph visualization will also show the components and the processes (although all of them appear as graph nodes, without differentiating component classes from process classes). I find the graph shown in this gist pretty clear. A minimal sketch of such a separate flux process follows.
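For illustration, a hedged sketch of a flux kept in its own process class and routed via a group variable; the Nutrient component and all names here are assumptions for the example, not code from the gist:

import xsimlab as xs

@xs.process
class Nutrient:
    """Assumed component tracking a nutrient state variable."""
    state = xs.variable(intent='out')
    flux_terms = xs.group('n_flux')  # collects all fluxes acting on the nutrient

    def initialize(self):
        self.state = 1.0

    def run_step(self):
        self._delta = sum(self.flux_terms)

    def finalize_step(self):
        self.state += self._delta

@xs.process
class NutrientUptake:
    """Uptake as a separate, droppable/replaceable process."""
    rate = xs.variable(description='uptake rate constant')
    n = xs.foreign(Nutrient, 'state')
    uptake = xs.variable(intent='out', groups='n_flux')

    def run_step(self):
        self.uptake = -self.rate * self.n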

how could this structure allow for an easy implementation of dimensionality? e.g. so that there are multiple "chemostat" components interacting on a grid.

Could you provide a more detailed example?

what is the best way to structure the xarray to contain all aspects of the model? it should store all relevant inputs, outputs and parameters for later analysis in an easily accessible manner.

Not sure to understand this one. xarray-simlab already stores all this information in output datasets.

lmfit integration

If you want to use lmfit for model calibration, no problem: writing a residual function (like shown in lmfit's docs) that simply wraps a model run might do the job, e.g.,

import xsimlab as xs
from lmfit import minimize, Parameters

def residual(params, model, data, in_ds):
    # map lmfit parameter values onto model input variables
    # (parameter names must match the 'process__var' input names)
    input_vars = dict(params.valuesdict())

    with model:
        out_ds = in_ds.xsimlab.update_vars(input_vars=input_vars).xsimlab.run()

    return data - out_ds.my_output_var

my_model = xs.Model(...)
in_ds = xs.create_setup(model=my_model, ...)

params = Parameters()
params.add(...)

out = minimize(residual, params, args=(my_model, data, in_ds))
ben1post commented 4 years ago

Hi Benoît, thank you very much for your help!

Sorry for the ambiguous questions; your reply definitely sent me in the right direction. I have written up a new chemostat model draft that fixes a few of the problems the previous one had, and I am pretty sure that this could be the basic structure of all models in the package.

You can have a look here: New chemostat model draft

I have thought quite a bit about added functionality that xsimlab does not yet provide, but that I would like the phydra package to offer at some point:

1. Solving the model via solve_ivp() or odeint(), i.e. without an explicit time loop (see the sketch after this list)

2. Provide some diagnostics or visualizations that show all fluxes, and the components they act on, in a clear overview (without ModelSetup and connections for inheritance/variables, just the plain fluxes connecting state variables with their direction, to display and potentially debug the system of ODEs/PDEs that describes the model, ideally also with units)
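For item 1, a minimal sketch of what solving without an explicit time loop could look like, using scipy directly on an illustrative chemostat-style ODE system (all names, parameters and equations here are made up):

import numpy as np
from scipy.integrate import solve_ivp

def dydt(t, y, d=0.1, mu_max=1.0, ks=0.5):
    n, p = y                                # nutrient, phytoplankton
    uptake = mu_max * n / (ks + n) * p      # Monod-type uptake
    return [d * (1.0 - n) - uptake,         # dN/dt: inflow minus uptake
            uptake - d * p]                 # dP/dt: growth minus dilution

sol = solve_ivp(dydt, (0, 100), [1.0, 0.1], t_eval=np.linspace(0, 100, 201))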

I think 1. is already an issue you opened on the xsimlab repo, whilst 2. might just be a functionality of my package. Perhaps both can be built into xsimlab... I have a feeling that they are technically related (i.e. both deal with ODEs).

My rough idea for solving 1.+2. within phydra would be to include a context object, similar to what you did in the fastscape package (to use the Fortran code, I presume). But this seems a bit hacky, and it might make more sense to work completely within xarray-simlab.

The key "problem" underlying this is that I am only dealing with differential equations, and the self.run_step() in @xs.process is of course not specifically optimized for this case. Perhaps a self.differential() method could be created instead?

I think this would be possible similarly to how I define my "Fluxes" in this example. The way I understand it, the method could easily be backwards compatible with self.run_step() and explicit time steps. A possible self.differential() method would need to

I have also not yet had a good idea for defining a "Flux" parent class, because, like the dimensions of each variable, each flux needs a more specific assignment of "groups" in the child class.

But even without any of these wish-list features, the functionality of the example chemostat model is enough to warrant finishing a first version and publishing that. I'd be happy to hear any suggestion you might have on improving it!

Finally, to spin further on the dimensionality question, but more specifically here: currently, I define the dimensionality of each component in the child class of Component as a numpy array of labels. Is that good practice with xsimlab? I am particularly thinking about adding more dimensions, like dims=('N', 'x', 'y'). Am I on the right track with adding a "Grid" process that defines x and y as an xs.index, and adding dimensions to each component like dims=[('N'), ('N', 'x'), ('N', 'x', 'y')]?
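For reference, a hedged sketch of what such a "Grid" process could look like (process and variable names are assumptions, not from the prototype):

import numpy as np
import xsimlab as xs

@xs.process
class GridXY:
    """Defines x and y as index coordinates of the model."""
    nx = xs.variable(description='number of grid points in x')
    ny = xs.variable(description='number of grid points in y')

    x = xs.index(dims='x')
    y = xs.index(dims='y')

    def initialize(self):
        self.x = np.arange(self.nx)
        self.y = np.arange(self.ny)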

ben1post commented 4 years ago

@benbovy I have worked the structure out a bit more, and written a 2-dimensional model prototype.

Now that I have a better practical understanding of the xsimlab structure, I see why the questions I wrote in the last comment are quite vague and rather complicated to solve. I will write the first draft of the package as is, and only then start thinking about implementing some of the additional functionality I mentioned.

I'd be very happy to hear your comments on the 2D model implementation. Perhaps you can see an easier way to do certain things with the way I used xsimlab here. It follows the structure of your fastscape models. The biggest difference in application from marine ecosystem models to landscape models is (from my impression) that I need an ecosystem (of multiple components, each with flexible dimensionality) embedded at each grid point, instead of "just" an elevation value.

Something I am not entirely clear on yet is how to achieve flexible dimensionality. It is somewhat achieved in the 2D prototype by passing the higher dimensions (x, y) from GridXY to other processes and initializing the components and state variables as numpy arrays of the correct dimensionality. I think I still need the hard-coded choices for dims=[('Comp'), ('Env', 'Comp'), ('x', 'Env', 'Comp'), ('x', 'y', 'Env', 'Comp')], since I can't pass a variable to the dims argument of an xs.variable in an xs.process... right? (e.g. like xs.variable(dims=self.model_dims, intent='out'))

Additionally, it would be nice to be able to pass the dims label (i.e. the xs.index() name in a specific process) as the process name chosen when creating the model from processes, using xs.Model({'P1': Phytoplankton, 'P2': Phytoplankton, ...}), or as an input variable to the process at xs.create_setup(), so that I can instantiate multiple Phytoplankton processes that don't overlap in their dimensions. But perhaps I am misunderstanding how xsimlab handles dimensionality and it is simply not possible or feasible.

I will now start moving my prototype code from the notebooks to a processes folder, and use these to build the 3 zero-dimensional models that I plan to present in the publication. Is there a specific philosophy you followed in structuring your fastscape processes? I will try to adapt my structure from there.

benbovy commented 4 years ago

I see that you have used an approach that is opposite to the one I used in my gist, i.e., your component classes declare their state and fluxes and are at the end of the process dependency chain. Perhaps this is a more "readable" solution (I don't have enough knowledge in this domain to know). One drawback, however, is that you lose some flexibility, e.g., if you want to add or remove a component (Nutrient, Phytoplankton) you need to re-write some process classes like ChemostatGridXYSetup; you can't just do my_model.drop_processes(...) or my_model.update_processes(...). But maybe you don't need that flexibility?

how to achieve flexible dimensionality

I like the analogy between model construction vs. model runtime in xarray-simlab and compile time vs. runtime in compiled languages.

Model construction is like compilation: it performs some operations and returns a static Model object. So, by design, it's not possible to change the model structure during runtime (i.e., from model input values).

Variable dimensions are part of the model structure; they are static and hard-coded. You could still use some constants to define them in order to have more DRY code (see the sketch below), but you can't define them from model input values.
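A small sketch of the constants idea (assuming a module-level constant shared by several process classes):

import xsimlab as xs

# dims declared once and reused, instead of repeating the literal in each class
P_DIMS = [(), 'p_number']

@xs.process
class Growth:
    rate = xs.variable(dims=P_DIMS)

@xs.process
class Mortality:
    rate = xs.variable(dims=P_DIMS)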

One way to achieve flexible process classes is using class factories, e.g., something like:

import attr
import xsimlab as xs

@xs.process
class BasePhyto:
    ...

def make_phyto_cls(new_cls_name, arg1, arg2):
    attrib_dict = dict(attr.fields_dict(BasePhyto))

    attrib_dict.update(...)   # some logic to re-define some variables

    new_cls = attr.make_class(
        new_cls_name,
        attrib_dict,
        bases=(BasePhyto,),
        init=False,
        repr=False
    )

    return xs.process(new_cls)

my_model = xs.Model({
    'p1': make_phyto_cls('Phyto1', arg1, arg2),
    'p2': make_phyto_cls('Phyto2', arg1, arg2)
})

This is used internally in xarray-simlab _ProcessBuilder.

Alternatively, you could add a dimension for p1 and p2. I think the latter solution is better if the only thing that changes between the Phyto1 and Phyto2 processes is a dimension label, and p1 and p2 have the same size.

ben1post commented 4 years ago

Thanks so much for the explanations! Class factories might just be what I was looking for; I'm currently testing some implementations. I have not tried it so far, but I will probably need to add a few wrapper functions to xsimlab for my specific application.

Regarding the way that states are handled in the 2D chemostat prototype, this was inspired in particular by the advection example model presented in the xsimlab docs.

I'll try to explain my reasoning below. It is not perfectly worked out yet, but I hope it explains a bit more of what I am trying to achieve. In the advection example model, there are a few basic processes:

  1. A grid (in fastscape with x and y dimensions, in the advection example it is the process UniformGrid1D with dims='x'), which forms the spatial structure of the model.

  2. A certain value is placed along this grid, inheriting these dimensions. This would for example be the ProfileU process:

    @xs.process
    class ProfileU:
        """Compute the evolution of the profile of quantity `u`."""

        u_vars = xs.group("u_vars")
        u = xs.variable(
            dims="x", intent="inout", description="quantity u", attrs={"units": "m"}
        )

        def run_step(self):
            self._delta_u = sum((v for v in self.u_vars))

        def finalize_step(self):
            self.u += self._delta_u

    This process collects all other processes acting on the quantity u along the dimension x and keeps track of its state.

  3. intent='inout' requires the state of u to be set as model input at runtime (which would be complicated when dealing with higher dimensions) or from another "initialization process", as is done in the example by InitUGauss.

So like the advection model, the marine ecosystem models I want to build can be set up on a grid (i.e. the dims=(x, y) in my 2D chemostat prototype). But instead of "just" a profile of u, at each grid point I have an Environment embedded there, which is a 0D container for all other model components. The grid can be initialized with higher dimensions or without dimensions, so that this Environment is simply a 0D ecosystem model (like the first chemostat prototype, for example).

Each Environment (placed at each grid point in a grid of flexible dims [(Env), (x, Env), (x, y, Env), (x, y, z, Env)]) contains a variable number of interacting components. The exchange between components is computed via Fluxes in phydra; additionally, there are GridFluxes that compute exchanges between grid points, like mixing terms in marine ecosystem models. These can only be applied to a model that is initialized on a higher-dimensional grid.

Now I think the way the grid is built is quite straightforward; what I am not certain of yet is how to structure the Environments and their components. What I did in the 2D chemostat prototype is pass down the grid dimensions and np.tile() the initialized component states to the correct dimensionality supplied from the ChemostatGridXYSetup process (see the sketch below). Every component tracks its own state in full model dimensions and collects fluxes from processes at any level. Fluxes and GridFluxes always have to handle full model dimensions and have hard-wired xs.group() connections to components, and therefore must be modified for specific model applications.
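A minimal sketch of that tiling step (grid size and values are made up):

import numpy as np

nx, ny = 4, 5                            # hypothetical grid size
init_comp = np.array([0.1, 0.2, 0.3])    # 0D ecosystem: one value per component
state = np.tile(init_comp, (nx, ny, 1))  # replicated to dims ('x', 'y', 'Comp')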

Ideally it would be possible to define separate fluxes for each dimensional level: CompFluxes for the components of an Environment (0-1D), ForcingFluxes on the 1D level of the Environment (which is essentially a list of components) and GridFluxes in the 1-3 dimensions of the grid. It should be flexible enough that at model setup it would suffice to supply the parameters and variables for a 0D implementation and then give additional x and y dimensions, and the model would automatically be replicated across dimensions and set up as a large numpy array in the backend.

All my current ideas require some level of custom labeling/routing of fluxes, so that processes need to be rewritten for different setups, instead of having a modular toolbox where models can be assembled from simple components, only with different parameters (which might also just be too idealistic a goal). I need to work on this a bit more, but there is one more specific question that I had:

ben1post commented 4 years ago

Thank you for pointing me in the direction of class factories @benbovy. I was just working on implementing a minimal example, but attrs gives me an error with the code example you shared. The following part:

new_cls = attr.make_class(
    new_cls_name,
    attrib_dict,
    bases=(BasePhyto,),
    init=False,
    repr=False
)

returns this error:

~/opt/anaconda3/envs/xsimlab3/lib/python3.8/site-packages/attr/_make.py in from_counting_attr(cls, name, ca, type)
   1834         return cls(
   1835             name=name,
-> 1836             validator=ca._validator,
   1837             default=ca._default,
   1838             type=type,

AttributeError: 'Attribute' object has no attribute '_validator'

I have never used attrs and don't really understand why it is asking for _validator, if validator = None in the Attribute objects in the attrib_dict dictionary. I put some code to test it at the bottom of this notebook, which I also linked above with the other question. Do you perhaps know if this can be easily resolved?

EDIT: I could partially solve this error by passing the list of keys instead of the full attrib_dict:

import attr
import xsimlab as xs

@xs.process
class Test:
    TestVar = xs.variable(default=1)

attrib_key_list = list(attr.fields_dict(Test).keys())

new_cls = attr.make_class(
    'Test2',
    attrib_key_list,
    bases=(Test,),   # subclass the decorated Test class
    init=False,
    repr=False
)

new_cls

out: __main__.Test2

Unfortunately, this seems only to rename the class, not change the variables.

ben1post commented 4 years ago

I could solve the previous problem simply by using a function that creates a new subclass of a Component parent class, like this:

import xsimlab as xs

def createMultiComp(base_process, comp_label, comp_dim):
    """This function allows adding a specific label and dimension to create
    multiple interacting instances of a component."""
    @xs.process
    class AddIndexCompLabel(base_process):
        label = xs.variable(intent='out')
        dim = xs.variable(intent='out')
        index = xs.index(dims=comp_label)

        output = xs.variable(intent='out', dims=(comp_label, 'time'))

        def initialize(self):
            self.label = comp_label
            self.dim = comp_dim
            self.index = [f"{comp_label}-{i}" for i in range(comp_dim)]
            super(AddIndexCompLabel, self).initialize()

    return AddIndexCompLabel

I am not sure if that is the best way to do it, but it does the job. This function is stored in a new utility module of phydra.

EDIT: I just found that there are issues with using index = xs.index(dims=comp_label); somehow the actual variable name index interferes with the dims argument to xs.index() and creates errors when the value of self.dim (i.e. the number of instances) differs between components. This seems strange to me... is it meant to be that way? Ideally, for my application, the string passed to the dims argument should be the actual name of the index in the rest of the model.

The last few weeks I have been working a lot more on the core code, and one important thing I realized is that I need some kind of solver backend that handles solving these models with an implicit time step (since I am only dealing with ODEs & discretized PDEs). The solution I found is the python GEKKO package, which provides an interface to the APMonitor modeling language. It provides a lot of solving tools for ODE systems; I am specifically using the sequential dynamic simulation mode (IMODE=7).

There are certainly more elegant ways to interface it with xarray-simlab. What I have done for now is create defaultdicts in a Context process that are passed around and added to by processes, and that contain the specific model components as gekko objects (i.e. state variables, intermediates & equations). Model evaluation happens within the first time step of clock=[0, 1], and output is returned from the state variables in the context defaultdict to xs.variables with the correct dimensions in the finalize_step() of each Component process. A first prototype is shown in this notebook.

Gekko compiles the model code before solving, which should also mean faster solve times (I haven't run any meaningful tests yet). Unfortunately this loses some of the control & benefits of xarray-simlab's discrete time-step implementation, but I feel it makes more sense for these types of models to utilize a different and more appropriate solving method (a minimal GEKKO sketch is shown below). Any input or help would be very much appreciated. I need to finish a first version of phydra within the next few days, so I have to move forward with what I have.
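For context, a minimal GEKKO sketch of the sequential dynamic simulation mode (IMODE=7) for a single decaying state variable; the equation and values are illustrative only:

import numpy as np
from gekko import GEKKO

m = GEKKO(remote=False)            # solve locally
m.time = np.linspace(0, 10, 101)   # simulation time grid
y = m.Var(value=1.0)               # state variable with initial value
m.Equation(y.dt() == -0.5 * y)     # simple decay ODE
m.options.IMODE = 7                # sequential dynamic simulation
m.solve(disp=False)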

benbovy commented 4 years ago

A few comments regarding process class factories:

Looking at your comment, you can't use a dictionary of attr.Attribute objects in attr.make_class. Instead, you need to provide a dictionary of attr.attrib() objects. The latter function returns a _CountingAttr intermediate object (attr internals), which is converted to an attr.Attribute only when the attr class is built.

In your last comment, the class closure seems to work, but it's not very nice: see this gist, in which I wrote an alternative solution. That solution does not use attr.make_class, since it is actually not supported in xarray-simlab (I opened a new issue: https://github.com/benbovy/xarray-simlab/issues/139). Maybe it would make sense to add something like xs.make_process() to xarray-simlab, for convenience.

ben1post commented 4 years ago

EDIT: I assume the question in this message is irrelevant, because there is an open issue about this exact functionality on the xsimlab repo.

@benbovy, thanks a lot for the explanation! Initializing components with the make_Component() function you showed is much cleaner and fixes the problem I was having with differing dimensionality. I have already implemented it.

Another problem I just came across relates to the simulation stages that are built into xsimlab by default. Is there any way to modify them for my specific implementation? The specific problem I have run into is that the three stages:

  1. Initialize()
  2. run_step()
  3. finalize_step()

do not allow me to define some additional fluxes, because run_step (2.) is already the stage at which the full model equations are assembled and the gekko context solves the model (the clock is [0, 1], and evaluation happens automatically in the right order with the current process structure), and finalize_step (3.) is the stage where the gekko output is stored back into xsimlab variables.

I would need an additional stage between (1.) and (2.), or a way to force a certain process to be evaluated at the beginning of run_step (2.), before the model equations are assembled within the Component process. The natural process sorting does not work with my current structure.

More specifically, what I am trying to do: I need to initialize a defaultdict in Process A that I can dynamically add terms to from other processes (e.g. Process B). Process B inherits the defaultdict via xs.foreign() from Process A. Now I initialize Process A, then Process B, and then I need to "initialize" Process A again with all items in the defaultdict. When I add that code to run_step(), it is evaluated after the equations are constructed. A minimal sketch of the sharing pattern is shown below.
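A minimal sketch of that sharing pattern (assuming xsimlab's xs.any_object() for the shared object; the process names are hypothetical):

from collections import defaultdict
import xsimlab as xs

@xs.process
class ProcessA:
    """Owns a shared mapping of label -> list of flux terms."""
    fluxes = xs.any_object(description='shared defaultdict of flux terms')

    def initialize(self):
        self.fluxes = defaultdict(list)

@xs.process
class ProcessB:
    """Adds terms to the shared mapping during its own initialize stage."""
    fluxes = xs.foreign(ProcessA, 'fluxes')

    def initialize(self):
        self.fluxes['P'].append('some flux term')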

Is there some way to easily add to or modify SimulationStage in process.py? Or is there another way this could be achieved?

ben1post commented 4 years ago

@benbovy This notebook highlights an error I receive by using the make_Component function the way you presented it, when dealing with variable dimensions of Components within the model.

The error message points quite obviously to where the problem lies: vname is the actual index variable identifier when index vars are written to the underlying zarr files... using index = xs.index() as a common variable where only the dims= argument is exchanged therefore mixes them all up. I need to change the actual variable name, or change how xsimlab handles it. I will try to fix this within the make_Component function. This is a very important functionality for me, so any help would be much appreciated!

Concerning my last message, I realized there is already an open issue on the xsimlab repo. For now, I am working with the limited simulation stages.

EDIT: found a working solution (at least as a prototype) that also allows flexible initialization of an index var, with a specific variable name supplied:

import numpy as np
import xsimlab as xs

class A:
    """Base class that is modified below."""
    dim = xs.variable(intent='in')

    def initialize_post_dimsetup(self):
        print('INIT of base class')

def make_process(base_cls, cls_name, dimvar):
    """Function that adds a new index var and a new initialize method."""
    new_dim = xs.index(dims=dimvar)
    base_dict = dict(base_cls.__dict__)
    base_dict[dimvar] = new_dim

    def initialize_new(self):
        dim = getattr(self, 'dim')
        setattr(self, dimvar, np.arange(dim))
        # the copied class carries `initialize_post_dimsetup` itself, so it
        # can be called directly (a super() call would skip it, since the
        # copy is built from `base_cls.__bases__`)
        self.initialize_post_dimsetup()

    new_cls = type(cls_name, base_cls.__bases__, base_dict)
    setattr(new_cls, 'initialize', initialize_new)
    return xs.process(new_cls)

B = make_process(A, 'B', 'y')

I am not sure if this use of classes & attributes is good practice.

benbovy commented 4 years ago

The error message points quite obviously to where the problem lies: vname is the actual index variable identifier when index vars are written to the underlying zarr files... using index = xs.index() as a common variable

That's right, xs.index is a special case where only the name of the variable is used for storage in zarr datasets and output in xarray Datasets (the process name is not used), hence the conflict.

I am not sure if this use of classes & attributes is good practice.

While it seems to work, it looks very hacky. I do think there are cleaner solutions. For example, you could avoid using xs.index here (just use a regular xs.variable(intent='out')), and in another process class, or manually from your output datasets, create a MultiIndex to stack all your components into a flat dimension (a rough sketch follows).
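A rough sketch of the MultiIndex idea as a post-processing step (the dataset and all names are hypothetical):

import xarray as xr

# assume out_ds holds per-component outputs 'P__value' (instance dim 'p',
# size 3) and 'Z__value' (instance dim 'z', size 2)
flat = xr.concat(
    [out_ds['P__value'].rename({'p': 'components'}),
     out_ds['Z__value'].rename({'z': 'components'})],
    dim='components',
)

# label the flat dimension with (component, instance) pairs, then build
# a MultiIndex from the two coordinates
flat = flat.assign_coords(
    comp=('components', ['P'] * 3 + ['Z'] * 2),
    instance=('components', [0, 1, 2, 0, 1]),
).set_index(components=['comp', 'instance'])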

ben1post commented 4 years ago

Thank you for the feedback! From a user perspective, I would expect the zarr storage to instead use the label supplied to the dims keyword, but I assume from your wording that this behaviour of xs.index is by design.

my current solution for flexible dimensions:

I might be misunderstanding xarray-simlab's design principles, but I am looking for a solution that flexibly creates linkages and dimensions between variables in processes after model setup, and that provides a straightforward user interface. I am reluctant to base this purely on inheritance, because writing a custom subclass of a process for each state variable in an ecosystem model seems rather cumbersome for an end user who might not be familiar with object-oriented python programming, particularly when dealing with complex food webs and multiple functional groups of state variables.

Combined with the caveat that xs.index needs a unique variable name, I have solved it like this for now, with a classmethod that returns a modified copy of the class (full code of a working model is in this notebook):

class FunctionalGroup:
    """
    Undecorated class to create an xs.process functional group
    containing multiple state variables that share fluxes.
    """
    label = xs.variable(intent='out', description='the label supplied at model initialisation')
    value = xs.variable(intent='out', dims=('not_initialized', 'time'),
                        description='stores output in dimensions supplied at model init')

    num = xs.variable(intent='in', description='number of state variables within group')
    initVal = xs.variable(intent='inout', description='initial value of component')

    m = xs.foreign(GekkoCore, 'm')  # this contains the GEKKO model instance (for assembly and solving)

    def initialize(self):
        self.label = self.__xsimlab_name__
        print(f"functional group {self.label} is initialized with size {self.num}")

        # this creates a numbered index with the label:
        if self.num == 1:
            index_list = [f"{self.label}"]
        else:
            index_list = [f"{self.label}-{i}" for i in range(self.num)]
        setattr(self, self.label, index_list)

        # self.m.SV creates a state variable in the GEKKO model
        # that is stored in a custom defaultdict shared by all processes
        self.m.phydra_SVs[self.label] = np.array([
            self.m.SV(name=f"{self.label}_{i}", value=self.initVal) for i in range(self.num)])
        self.value = [sv.value for sv in self.m.phydra_SVs[self.label]]

    def run_step(self):
        print(f"assembling equations for {self.label}")

        # this is the special step needed for the gekko backend, where the
        # equations are assembled from all fluxes added in the initialize stage
        gk_array = self.m.phydra_SVs[self.label]
        fluxes = self.m.phydra_fluxes[self.label]
        for i in range(self.num):
            self.m.Equation(gk_array[i].dt() == sum([flux[i] for flux in fluxes]))

    @classmethod
    def setup(cls, dim_label):
        """Create a copy of the process class with a user-specified name and dimension label."""
        # a copy of the class is needed so that the classmethod does not modify the base class
        new_cls = type(cls.__name__ + dim_label, cls.__bases__, dict(cls.__dict__))

        # add a new index with the variable name of the label (to avoid zarr storage conflicts)
        new_dim = xs.index(dims=dim_label, groups='SV_index')
        setattr(new_cls, dim_label, new_dim)

        # modify dimensions
        new_cls.value.metadata['dims'] = ((dim_label, 'time'),)

        # return the decorated xsimlab process
        return xs.process(new_cls)

This would then be called at model setup, like so:

MODEL = xs.Model({'core': GekkoCore,
                  'solver': GekkoSequentialSolve,
                  'time': Time,

                  'N': StateVariable,
                  'P': FunctionalGroup.setup('P'),  # same label needs to be supplied for now (todo)
                  ...
                  })

Instead of hacking the flexibility in like this, I could write a sub-process for each model component, but this seems to introduce unnecessary complexity to the library and requires a user to write a new process for every new component with a unique dimension and label. For now, I will probably have to stick to my hacky code of adding a custom index via a classmethod class factory. A more elegant solution would initialize dimensions based on the process's self.__xsimlab_name__, which would have to happen at the xs.create_setup() step, but I don't think this is possible using immutable attrs Attributes.

The only way I could think of to solve this using xarray-simlab's built-in functionality would involve initializing all state variables separately (perhaps with a wrapper process factory called at model creation and setup) and aggregating these subprocesses into a functional group via xs.group variables. But this would still require dynamic or custom group labels and (index) dimensions to initialize multiple instances of the same model component.

In reference to GitHub issue 52 on the xarray-simlab repo:

As shown in the code snippet above, my current code structure requires using the run_step simulation stage to assemble the model equations, which are then solved at the finalize_step simulation stage. It works due to the clock setup, with the actual solve time supplied from the Time process:

xs.create_setup(...,
                clocks={'clock': [0, 1]},
                input_vars={'time__days': ('time', np.arange(0, 10, .1)), ...})

This is why I commented on the issue on the xarray-simlab repo.

It would be very useful if there were a way to use something like the following:

@xs.process
class Component:
    # x = xs.variable()

    @xs.initialize(level=0)
    def initialize_state_variables(self):
        # initializing state variables
        ...

    @xs.initialize(level=1)
    def initialize_equations(self):
        # initializing equations
        ...
in order to assemble the model equations from all components and fluxes previously initialized. Phydra could then have the option to use the run_step stage to solve the model with a step-wise call to GEKKO (self.m.options.IMODE = 4).

This could probably be solved by adding another process per SV (or functional group) that assembles the equations, but this again seems to unnecessarily complicate the user interface. I am not sure if my intuition here is misguided or if this type of flexibility is better solved in another way.

ben1post commented 3 years ago

@benbovy I have completely redesigned the framework over the last months, and am now using a model core class stored as an xs.object() that is shared between all model processes, to allow string-label-based routing between equations and the use of different types of solvers. Additionally, the package provides a user interface of a decorated class and attr.attrib-based variables with metadata (copied from xarray-simlab's design) that returns a fully functional xsimlab xs.process.

Now the package works well even with multi-dimensional ODE-based models, and I feel confident closing this issue and opening new & more specific issues in its stead. The code is far from clean and not yet documented, so now comes the hard work of polishing and making it ready for publication.

I don't know why you stopped responding to my emails, but I assume there is a good reason; if it is some technical issue (e.g. your email has changed), this is my last try to reach out to you. I definitely appreciate all the advice you gave me at the start of the project, and this work wouldn't have been possible without xarray-simlab as a starting point and framework. Thank you!

benbovy commented 3 years ago

Impressive work @ben1post! I replied to your last email.