Cantera / enhancements

Repository for proposed and ongoing enhancements to Cantera

Eliminate traps associated with accessing ThermoPhase objects in use by Reactors #201

Open · speth opened this issue 7 months ago

speth commented 7 months ago

Abstract

For a ThermoPhase in use by a Reactor, the state of the ThermoPhase does not necessarily correspond to the state of the Reactor at the current time of the ReactorNet. This can lead to users extracting incorrect state information during integration (see this UG post, for example). In addition, the current requirement that a single ThermoPhase can be shared across multiple Reactors increases the complexity of evaluating the Reactor's governing equations.

Motivation

After an integration step by ReactorNet, the state of the ThermoPhase object will generally be set to the latest time reached by the CVODES integrator, $t_{int}$. However, this time may be beyond the output time specified by the user (using the advance(t_user) method). If the user accesses the ThermoPhase object directly, they will get the state at $t_{int}$, in contrast to getting the state at $t_{user}$ by accessing it through the reactor.thermo property, in which case CVODES provides a result interpolated back to the correct time. For simple usage, like plotting the state as a function of time, getting the state at a slightly different time won't noticeably affect results. However, for cases like evaluating sensitivities using a finite difference approach, two reactor networks will generally not have reached the same value of $t_{int}$, introducing a large amount of noise. This is the phenomenon dubbed "Bjarne's Trap" by @rwest, and it has been a source of confusion for a long time.
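To make the trap concrete, here is a minimal sketch of the behavior described above using the current Python API (gri30.yaml, the IdealGasReactor configuration, and the output time are arbitrary choices for illustration):

import cantera as ct

gas = ct.Solution('gri30.yaml')
gas.TPX = 1500.0, ct.one_atm, 'CH4:1, O2:2, N2:7.52'
reactor = ct.IdealGasReactor(gas)
net = ct.ReactorNet([reactor])

net.advance(1e-3)  # t_user = 1 ms

# Accessing the shared Solution object directly yields the state at t_int, the
# last internal CVODES step, which generally lies beyond t_user
print(gas.T)

# Accessing it through the reactor.thermo property restores the state
# interpolated back to t_user
print(reactor.thermo.T)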

Because the current structure allows multiple reactors to share a ThermoPhase object, there is a significant amount of complexity in the logic for evaluating the governing equations for all reactors in the network.

The current evaluation logic amounts to the following (in Python pseudocode):

def eval(t, y, ydot):
    # ReactorNet::updateState
    for reactor in all_reactors:
        reactor.thermo.set_TPY(y[...])
        # save the current state information 
        reactor.state = reactor.thermo.state
        # compute auxiliary quantities based on state, for any quantities that may need
        # to be accessed from other reactors, e.g. mass fractions, pressure, or enthalpy
        reactor.local_vars = f(reactor.thermo)

    # ReactorNet::eval
    for reactor in all_reactors:
        reactor.thermo.restore_state(reactor.state)
        ydot[...] = f(reactor.thermo, reactor.kinetics, reactor.local_vars, reactor.neighbors.local_vars)

This complexity has resulted in difficulties for users who want to implement modified governing equations using the ExtensibleReactor framework. For example, see this problem posted on the UG, and this one too.

Possible Solutions

I think the solution to both of these problems is to stop having multiple reactors share ThermoPhase/Kinetics objects, and to have ReactorNet.step and ReactorNet.advance automatically synchronize the state of the reactors after each call (by default).

After this change, the evaluation logic would look more like:

def eval(t, y, ydot):
    # ReactorNet::updateState
    for reactor in all_reactors:
        # Reactor.updateState
        reactor.thermo.set_TPY(y[...])
        # Calculation of reactor-local variables is optional, for efficiency only

    # ReactorNet::eval
    for reactor in all_reactors:
        # Evaluation can directly access thermo from this reactor and neighboring reactors
        ydot[...] = f(reactor.thermo, reactor.kinetics, reactor.local_vars, reactor.neighbors.thermo)

I'd like to make sure that any changes to how the work of evaluating the reactor governing equations is divided up would provide relatively clean solutions to a few scenarios that have come up on the Users' Group. Namely:

ischoegl commented 7 months ago

@speth ... this certainly looks like something we should try to make more user-friendly.

Regarding use cases: the first issue (ODE integration) could likely be handled by automatically calling syncState (or something equivalent) at the end? Whenever accessing ReactorBase.thermo, we already have a restoreState in place, so that should be covered?

For the second case, we could also do something like

reactor.set_TPY(y[...])

instead of

reactor.thermo.set_TPY(y[...])

and handle things internally. The exact terminology/API is obviously debatable.
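For illustration only, a hypothetical sketch of what such a wrapper could look like (Reactor.set_TPY does not exist today; the class and internals below are assumptions standing in for the eventual C++ implementation, not Cantera's actual Reactor class):

import cantera as ct

class Reactor:
    # standalone illustration, not cantera's Reactor class
    def __init__(self, thermo):
        self.thermo = thermo   # reactor-owned thermo object
        self.local_vars = None

    def set_TPY(self, T, P, Y):
        # Update the thermodynamic state and any reactor-local caches in one
        # place, so user code never needs to touch self.thermo directly
        self.thermo.TPY = T, P, Y
        self.local_vars = (self.thermo.density, self.thermo.enthalpy_mass)

gas = ct.Solution('gri30.yaml')
r = Reactor(gas)
r.set_TPY(1200.0, ct.one_atm, 'CH4:1, O2:2, N2:7.52')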

At the same time, tackling what amounts to a copy constructor for Solution objects is not necessarily a bad option (would likely be a benefit in other areas also). It's something that is currently missing from the C++ core (although something similar exists for the Python pickle case).

speth commented 7 months ago

Regarding use cases: the first issue (ODE integration) could likely be handled by automatically calling syncState (or something equivalent) at the end?

At the end of what? At the end of step() or advance(), if multiple reactors share the same Solution, which reactor should that Solution's state correspond to?

Whenever accessing ReactorBase.thermo, we already have a restoreState in place, so that should be covered?

This is the main problem: While the thermo property in Python does this synchronization, there's nothing that prevents the user from inadvertently accessing the Solution object directly without using the thermo property.

Don't read too much into my pseudocode structure. I wrote it with some of the functions effectively "inlined" for what I was hoping would improve clarity, but perhaps didn't. I fully expect the manipulation of the ThermoPhase object to be encapsulated within a method of ReactorBase, which will probably still be updateState.

ischoegl commented 7 months ago

This is the main problem: While the thermo property in Python does this synchronization, there's nothing that prevents the user from inadvertently accessing the Solution object directly without using the thermo property.

This is indeed the crux of the problem. From where I stand, it is poor practice if a user has expectations that a shared object should hold a specific state (or attempts to use that implicit pathway to update reactor contents). It is unfortunate that the current design paradigm allows for this. It is getting 'abused' and people report 'unexpected behavior'. Overall, I don't think that this is a path we should encourage.

My concern is that disallowing shared objects in some cases will break existing code (example: it is fairly common to use the same Solution object to initialize several reactors).

As we're already breaking things, we may as well create a clean break and shift to a different paradigm where Solution objects are copied and cannot be shared (i.e., ReactorBase calls a copy constructor for a new Solution object, rather than attaching the previously created shared object). I would prefer this path, as it would not break code where one object is used to initialize multiple reactors (a legitimate use case), while putting an end to bad coding practice (expecting a magic link between Reactor and Solution objects that are not ReactorBase.thermo). This approach would then also apply to 1D objects and SolutionArray objects.
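For reference, the "legitimate use case" mentioned here, which copy-on-attach semantics would preserve, is the common pattern of initializing several reactors from a single Solution. A minimal sketch, assuming gri30.yaml:

import cantera as ct

gas = ct.Solution('gri30.yaml')
gas.TPX = 1000.0, ct.one_atm, 'H2:2, O2:1, N2:3.76'

# Today both reactors end up sharing the same Solution object; under the
# proposed paradigm each reactor would instead attach its own copy
r1 = ct.IdealGasReactor(gas)
r2 = ct.IdealGasReactor(gas)
net = ct.ReactorNet([r1, r2])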

speth commented 2 months ago

@ischoegl commented in https://github.com/Cantera/cantera/pull/1788#discussion_r1752775764:

Could you expand? My thinking was a strict hierarchy where the ReactorNet is responsible for Reactors, which themselves are responsible for Solutions. I am not sure that I see why we should give ReactorNet the ability to meddle with reactor contents?

The reason I suggested having ReactorNet manage cloning Solution objects is based on the concept that I initially suggested in this Enhancement: that we clone Solution objects only when there are multiple reactors sharing the same Solution. There are a couple of reasons for this:

  1. I expect that the majority of usage of the reactor network model does not make use of Solution objects shared between reactors, with many networks only consisting of a single reactor. In this case, cloning the Solution object introduces unnecessary overhead. While it's possible that overhead is small compared to actually integrating the system, I'd rather not have the Solution <-> AnyMap code become too performance critical.
  2. If we always clone the Solution objects, then we are requiring the Solution <-> AnyMap round trip to be fully implemented for every model that makes up a Solution, which I think is a substantial expansion of the set of functions that models must implement. While this is mostly (but not entirely) satisfied by the classes that are built into Cantera, I think it would be a burden to require this for all user-implemented models.
ischoegl commented 2 months ago

The reason I suggested having ReactorNet manage cloning Solution objects is based on the concept that I initially suggested in this Enhancement: that we clone Solution objects only when there are multiple reactors sharing the same Solution. [...]

I am not sure that I agree: from my perspective, a lot of the 'traps' this issue is documenting arise from allowing 'sharing' at all. As I have stated in an earlier comment above, I think that creating a Solution, linking it into a Reactor, and then building code on the expectation that modifying either the former directly, or the latter via the thermo property, 'just works' is fundamentally flawed. The need to use syncState etc. imho makes the overall implementation brittle by design.

  1. I expect that the majority of usage of the reactor network model does not make use of Solution objects shared between reactors, with many networks only consisting of a single reactor. In this case, cloning the Solution object introduces unnecessary overhead. While it's possible that overhead is small compared to actually integrating the system, I'd rather not have the Solution <-> AnyMap code become too performance critical.

Most of the instances where a Solution object is recycled are for Inlet, Outlet, Reservoir, etc., where no Kinetics is required. I believe that by creating a partial Solution (without Kinetics) the overhead can be somewhat reduced. Overall, I strongly believe that a small added overhead is a price well worth paying for 'cleaner' and thus more robust code.

  2. If we always clone the Solution objects, then we are requiring the Solution <-> AnyMap round trip to be fully implemented for every model that makes up a Solution, which I think is a substantial expansion of the set of functions that models must implement. While this is mostly (but not entirely) satisfied by the classes that are built into Cantera, I think it would be a burden to require this for all user-implemented models.

I think that we should consider a switch of paradigms here. I believe that we are 'oversharing' some of the objects we use. In the case of Solution, a unique_ptr may be preferable. Regarding copying objects, I think that - once we get rid of all the raw pointers - we should be able to start allowing for regular copy constructors (rather than the AnyMap serialization route). Regardless, I believe that having round-trip conversion for Solution implemented for all models isn't necessarily a bad goal to have.

Overall, there appear to be some fundamental issues that need further discussion. I'd suggest shelving Cantera/cantera#1788 for the time being. While this may prolong deprecation cycles, I don't want it to unnecessarily delay the release of 3.1.

ischoegl commented 2 months ago

As a friendly amendment to the evaluation logic posted at the top, I am proposing the following pseudo-code, which assumes the existence of universal Connector objects, per Cantera/cantera#1788:

def eval(t, y, ydot):
    # ReactorNet::updateState
    for reactor in all_reactors:
        # Reactor.updateState
        reactor.thermo.set_TPY(y[...])
        # Calculation of reactor-local variables is optional, for efficiency only

    # new handling of reactor interactions
    boundary_transfers = np.zeros_like(ydot)
    for connector in all_connectors:
        # update interactions across reactor boundaries
        connector.update(boundary_transfers[...])

    # ReactorNet::eval
    for reactor in all_reactors:
        # Evaluation can directly access thermo from this reactor only (!)
        ydot[...] = f(reactor.thermo, reactor.kinetics, reactor.local_vars) 

    ydot += boundary_transfers

This approach would ensure that cyclic pointer references are avoided by design. The underlying ReactorNet has shared_ptr to all Reactor and Connector objects. Connector objects hold shared_ptr to their neighboring reactors as well as their associated thermo objects to enable efficient calculation of fluxes, but Reactor objects receive net transfer (fluxes times area) from ReactorNet (if they even need them at all). Reactor objects can still hold weak_ptr to connectors for non-critical queries (i.e. whatever is not used during time integration).
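A Python sketch of that ownership structure (the class names and attributes below are illustrative placeholders for the proposed C++ design, not existing Cantera classes):

import weakref

class ReactorNode:
    def __init__(self, thermo):
        self.thermo = thermo
        self._connectors = []  # non-owning references, mirroring weak_ptr

    def add_connector(self, connector):
        self._connectors.append(weakref.ref(connector))

class Connector:
    def __init__(self, left, right):
        # owning-style references to the neighboring reactors (shared_ptr in
        # C++), so fluxes can be computed directly from their thermo objects
        self.left = left
        self.right = right
        left.add_connector(self)
        right.add_connector(self)

class ReactorNet:
    def __init__(self, reactors, connectors):
        # the network owns all nodes and connectors; since nodes never own
        # connectors, no cyclic strong references can form
        self.reactors = list(reactors)
        self.connectors = list(connectors)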

Beyond, I believe this would also be a workable approach to resolve #202. Considering #61, there is the interesting question on how to handle reactor surfaces ... while I had reclassified ReactorSurface as a node in Cantera/cantera#1788, there is an alternative where it could be recast as a Connector (which would share some attributes with an inert Wall) associated with a node. It becomes even more interesting for ReactorEdge as envisioned in #203 / Cantera/cantera#1697. PS: I believe that #202 would involve redistributing current ReactorSurface methods between the 'node' and a future 'connector' variant.

speth commented 2 months ago

Regarding copying objects, I think that - once we get rid of all the raw pointers - we should be able to start allowing for regular copy constructors (rather than the AnyMap serialization route). Regardless, I believe that having round-trip conversion for Solution implemented for all models isn't necessarily a bad goal to have.

Providing regular copy constructors would introduce a ton of additional boilerplate code, not to mention testing requirements. I'd really like to see how effective the AnyMap serialization route is first. I disagree that getting rid of all use of raw pointers should be a goal for Cantera or in C++ in general. The important thing is to avoid manual management of object lifetimes using new and delete directly, which we already do. Beyond that, both shared_ptr/unique_ptr and stack allocation of objects provide good ways of managing object lifetimes.

I agree that we should provide working serialization/round trip capabilities for all models that are provided as part of Cantera. What I'm concerned about is creating a technical requirement for that capability that applies not only to the built-in models but all user-provided models as well (including those implemented using ExtensibleReaction). If a user has to implement full serialization / deserialization just to write the thermodynamic model equivalent of "hello world", I think we're creating a pretty rough barrier to entry, and that's the opposite of my goal of making it easier for users to introduce their own models.

Most of the instances where a Solution object is recycled are for Inlet, Outlet, Reservoir, etc., where no Kinetics is required. I believe that by creating a partial Solution (without Kinetics) the overhead can be somewhat reduced. Overall, I strongly believe that a small added overhead is a price well worth paying for 'cleaner' and thus more robust code.

It's true that you could avoid cloning Kinetics objects for Reservoir objects, but you'd still have to do so for the single Reactor that is present in many reactor network simulations. Again, I think once we have an AnyMap-based clone function, we can do a little bit of performance testing, and then we'll have an idea of what the overhead looks like for typical reactor network simulations.
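As a rough first number from Python, one could compare the cost of constructing a Solution from YAML (a crude stand-in, and likely an upper bound, for an AnyMap-based clone) against the cost of integrating a small network; gri30.yaml and the conditions below are arbitrary assumptions:

import time
import cantera as ct

t0 = time.perf_counter()
gas = ct.Solution('gri30.yaml')
t_build = time.perf_counter() - t0

gas.TPX = 1200.0, ct.one_atm, 'CH4:1, O2:2, N2:7.52'
reactor = ct.IdealGasReactor(gas)
net = ct.ReactorNet([reactor])

t0 = time.perf_counter()
net.advance(1.0)
t_integrate = time.perf_counter() - t0

print(f"Solution construction: {t_build:.3f} s; integration: {t_integrate:.3f} s")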

This approach would ensure that cyclic pointer references are avoided by design. The underlying ReactorNet has shared_ptr to all Reactor and Connector objects. Connector objects hold shared_ptr to their neighboring reactors as well as their associated thermo objects to enable efficient calculation of fluxes, but Reactor objects receive net transfer (fluxes times area) from ReactorNet (if they even need them at all). Reactor objects can still hold weak_ptr to connectors for non-critical queries (i.e. whatever is not used during time integration).

I like the idea of moving the calculation of Connector "transfers" out of the specific reactors. Among other things, this approach has the advantage of only evaluating each connector once, unlike the current approach where functions like updateMassFlowRate are called by both the upstream and downstream reactor each time the governing equations are evaluated. The one question that comes to mind for me is how to extend this approach to allow for calculation of the cross-reactor Jacobian terms, which was started in Cantera/cantera#1634 but not quite finished.

Beyond, I believe this would also be a workable approach to resolve #202. Considering #61, there is the interesting question on how to handle reactor surfaces ... while I had reclassified ReactorSurface as a node in Cantera/cantera#1788, there is an alternative where it could be recast as a Connector (which would share some attributes with an inert Wall) associated with a node. It becomes even more interesting for ReactorEdge as envisioned in #203 / Cantera/cantera#1697. PS: I believe that #202 would involve redistributing current ReactorSurface methods between the 'node' and a future 'connector' variant.

Yes, I can see how this approach to the connectors makes a ReactorSurface into a type of Connector sitting between two Reactors (or a Reactor and a Reservoir representing an inert bulk material that the user may not actually have to define). This would require extending the system to consider the idea that a Connector can have governing equations that have to be solved as part of the system, but I think that's a generalization that we've needed for a while, to support for example things like walls with inertia. I'm not quite sure where that would leave ReactorEdge, though, as it is effectively a connector between three surfaces and three bulk phases where transfers among all of those phases are resolved simultaneously -- that doesn't seem to fit cleanly with the idea of a Connector only linking two ReactorNodes.

ischoegl commented 1 month ago

@speth ... thanks for your constructive comments. I believe that overall it should be relatively straightforward to work out a consensus for a path forward. I am totally :+1: with sticking to the clone approach. Regarding raw pointers, you are correct that we have already eliminated most owning raw pointers; I still like the idea of moving towards smart pointers where it makes sense. Stack-allocated memory is really the only place where there's no good way around raw pointers, although some of your tests from a couple of years ago indicated that there wasn't much of a speed penalty for cases where it would really matter (see https://github.com/Cantera/cantera/issues/1211#issuecomment-1065991448).