Split CellML parsing & model API code more thoroughly?

MichaelClerx commented 4 years ago

The current Model class handles part of the CellML parsing, via connect_variables() and interface objects in the MetaDummy class. Moving this into the parser would make the model API easier to use / understand, but also pave the way for CellML 2.0 support

See also e.g. #81

Can we do this ignoring the directionality on interfaces? (Makes future CellML 2 support easier.)

Thoughts?

jonc125 commented 4 years ago

My approach would be to iteratively improve the API as needed to implement protocol features, without trying to fix it too much up-front; I don't think we understand the requirements well enough yet. Particularly the directionality aspect has more impact on the dependency graph generation than on the parsing I think.

MichaelClerx commented 4 years ago

But you collapse the two variables into one, in CellML 2, so there's only a single entity and no scope for directionality

MichaelClerx commented 4 years ago

(moved below Jonathan's reply)

jonc125 commented 4 years ago

Which we do already where there is no unit conversion, but if there is, you need to add an equation. And which variable do you make the LHS? Currently we assume all equations are assignments. But in general in CellML (even in v1! Just not in most cardiac models :grin:) equations are just equality declarations, and in code generation information might flow either way; there might even be systems of simultaneous equations to solve.

MichaelClerx commented 4 years ago

Anyway this issue arose from

iteratively improv[ing] the API as needed to implement protocol features

because add_variable looks like this:

def add_variable(self, *, name, units, initial_value=None,
                 public_interface=None, private_interface=None, **kwargs):

and now I'm trying to understand if I need to do anything with those public and private interfaces if I want to add a variable or not

jonc125 commented 4 years ago

and now I'm trying to understand if I need to do anything with those public and private interfaces if I want to add a variable or not

I'd hope not after the initial parse...

tamuri commented 4 years ago

I'd hope not after the initial parse...

Yeah, those are only used for resolving the connections. If you're adding a variable after loading the model, you don't have to give anything (unless you're intending to run connect_variables again with some new connections involving the new variable)

skeating commented 4 years ago

From experience this (separating parsing and manipulation functions) is an almost impossible task. You either end up with a whole load of private functions (parsing) which you gradually make public as manipulation becomes necessary and then people get confused that somethings they can do and some they cannot ! Or you have sort of duplicated code with slightly different names but essential same functionality.

However, if the parsing function (like add_variables) requires a whole load of extra arguments it may be worth having a private version for parsing and a public version for API; which merely calls the private version with dummy unused variables so we dont ever need to try and 'explain' arguments to the user.

MichaelClerx commented 4 years ago

However, if the parsing function (like add_variables) requires a whole load of extra arguments it may be worth having a private version for parsing and a public version for API; which merely calls the private version with dummy unused variables so we dont ever need to try and 'explain' arguments to the user.

I like that plan!

MichaelClerx commented 4 years ago

Also, if we're planning to exchange the parsing back-end for libcellml at some point, we'd get a clean split between parsing (now external), and an API for model manipulation and obtaining equations...

MichaelClerx commented 4 years ago

Closing this for now. Still some areas where I'd like a cleaner separation, but let's leave it as a style thing :D

MichaelClerx commented 4 years ago

Sorry I'm reopening this again in light of e.g. #135

Should we not have a workflow that's something like this?

Parse a CellML model (but don't validate!)
Do any kind of manipulation (possibly making an invalid model valid, or vice versa)
Say "I'm finished manipulating, please give me a graph", at which point the result of the CellML parsing AND manipulation is all validated and some kind of graph - possibly a distinct object from the Model - is returned?

I get very nervous with having parser convenience functions, model manipulation, and cached post-manipulation equations all in the same structure. Would it not be easier to split these things so that if I give you a certain type of model object, you know exactly what you can expect from it? (i.e. if it's a useful tool for parsing, a thing for manipulating, or a fixed and validated set of equations?)

jonc125 commented 4 years ago

I guess this touches on the reusing the same Model object issues that @MauriceHendrix was encountering?

MauriceHendrix commented 4 years ago

Probably. I no longer try to reuse models as even undoing changes I made didn't seem to be enough I could have missed something of course. Also starting with removing stuff if it's there is a bit counter-intuitive.

MichaelClerx commented 4 years ago

Have done a bit of this in #215 (the UnitStore no longer knows about CellML unit definitions: The parser now handles conversion from CellML attributes to a pint unit string)

MichaelClerx commented 4 years ago

I've been running some of my CellML test files through cellmlmanip, and the error messages can be pretty obscure, I'm thinking that's in part due to our approach of

CellML file --> Cellmlmanip model --> validation
             |
      some early validation

where "cellmlmanip model" lacks a lot of the concepts that exist at the CellML level. And rightly so! But this makes some of the errors unavoidably weird. I'm thinking in the long-run the only way to really solve this would be to do

CellML file --> CellML model in memory --> syntactic validation --> Cellmlmanip model --> semantic validation
             |                                      |            |                                  |   
      some early validation                    Most errors       Compatibility errors            Very rare, only stuff like
                                                                 E.g. we don't like DAEs"        "this system will give numerical errors"

I'm not sure using libcellml would solve this either, as the approach there is

CellML 1.0 file --> CellML 2.0 file --> libcellml
                 |                   |
              Mystery            Hopefully
              errors          amazing errors

where "mystery errors" should be read as "errors that make sense if you're using an XSLT to create CellML 2 from 1 but are very surprising if you think you're running a simulation"

MichaelClerx commented 4 years ago

(And yes none of this matters for the current goal of getting things done, but user friendly error message is near the top of the list for the bigger goal of getting an entire community to adopt new tools to work more reproducibly, so I'm thinking ahead in the evenings...)

MauriceHendrix commented 4 years ago

I do agree user friendly errors are important for community adaptation and to some extend even for our own debugging. Which is why I've tried to get some easier to understand ones where missing metadata tags causes codegen errors.

Having said that, there is always a trade-off with how much work they are, plus introducing another library with it's own bugs and dependencies also has it's chellanges. Maybe there are some easy wins in terms of better messages for things that already fail, even if it only hints as a few possible causes for example.

MichaelClerx commented 4 years ago

(We're also using the RelaxNG validation before we do anything ourselves, which picks up a lot of issues and saves us work, but can produce some really unhelpful error messages)

ModellingWebLab / cellmlmanip

Split CellML parsing & model API code more thoroughly? #120