`mech_info` and/or ACC should store originating catalogue of mechanism.

thorstenhater commented 3 years ago

Example

  (decor
    (paint
      (region "soma_group")
      (mechanism "hh"
        ("n" 0.000000)
        ("h" 0.000000)
        ("el" -54.300000)
        ("gnabar" 0.120000)
        ("gkbar" 0.036000)
        ("gl" 0.000300)
        ("m" 0.000000))))

It is unclear from which mechanism catalogue this mechanism was obtained, as multiple catalogues can define the same name. Therefore, de-serialising this ACC snippet is ambiguous. Concrete examples exist in BBP/Allen, but the situation will get worse as soon as we have more user-defined catalogues.
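
The ambiguity can be illustrated with a minimal Python sketch that does not touch the Arbor API. The dictionaries stand in for catalogues; the catalogue names (`cat_a`, `cat_b`) and their contents are placeholders invented for illustration.

```python
# Toy stand-ins for mechanism catalogues: name -> description.
# Catalogue names and contents are placeholders, not Arbor data.
catalogues = {
    "cat_a": {"Ih": "implementation A", "pas": "leak current"},
    "cat_b": {"Ih": "implementation B"},
}

def candidates(name):
    """Return every catalogue that defines a mechanism called `name`."""
    return sorted(cat for cat, mechs in catalogues.items() if name in mechs)

# A bare name taken from an ACC snippet may match more than one catalogue.
print(candidates("Ih"))   # ['cat_a', 'cat_b']: ambiguous
print(candidates("pas"))  # ['cat_a']: unambiguous
```

With only the bare name stored, a reader of the file has no way to choose between the two candidates.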

Options to fix

1. ACC stores the catalogue(s) similar to this

(catalogue
  (merge ("bbp" "bbp") ("" "default")))

and

(paint
  (region "soma_group")
  (mechanism "bbp/NaP"))

2. mech_info and its serialisations store the originating catalogue.

Regardless, the question of how to deal with user-imported catalogues is a tricky one. The name might not even be defined within arbor-core, but loaded at runtime. Consequently one would have to deliver the catalogues along the ACC file, raising issues of binary compatibility etc.
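
As a rough sketch of how a reader of such a file might interpret a prefixed name, the Python below resolves `prefix/name` against a merge specification. It assumes that each pair in `(merge ...)` reads as (prefix, catalogue name) and that `/` separates prefix from mechanism name; both are inferred from the example above rather than from any ACC specification, and `resolve` is a hypothetical helper.

```python
# Merge specification mirroring (merge ("bbp" "bbp") ("" "default")):
# each pair is read as (prefix, catalogue name). This reading is an assumption.
MERGE = [("bbp", "bbp"), ("", "default")]

def resolve(qualified, merge=MERGE):
    """Split 'prefix/name' and return (catalogue, mechanism name)."""
    # rpartition yields an empty prefix when there is no '/' in the name.
    prefix, _, name = qualified.rpartition("/")
    for pfx, catalogue in merge:
        if pfx == prefix:
            return catalogue, name
    raise KeyError(f"no catalogue merged under prefix {prefix!r}")

print(resolve("bbp/NaP"))  # ('bbp', 'NaP')
print(resolve("hh"))       # ('default', 'hh')
```

Note that this only names the catalogue; it does not address how a runtime-loaded catalogue would be located or shipped alongside the ACC file.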

halfflat commented 3 years ago

Outside the Arbor-provided catalogues, we don't have any control over mechanism names or the names of the catalogues. For Arbor, we can reserve ourselves some prefixes or catalogue names, and use those, but they can't help more generally.

One partial solution is to use the mechanism fingerprint, which has never been fully developed, but which was designed to help solve this problem. More generally though, we should include metadata alongside the mechanism name to disambiguate, but the name of the catalogue whence it came is not sufficient, nor as far as I can see stable enough to provide much help.

It'll be important to decide what we want the mechanism name+metadata to determine:

1. Full reproducibility in Arbor? I think this is too much: it would demand all sorts of extra information, and this goal would be immediately compromised by the use of a different version of Arbor, or modcc, or hardware platform.
2. Information sufficient to confirm that the named mechanism comes from the same NMODL source? Fingerprint would do this, but given that our NMODL sources differ from those used in NEURON for the 'same' mechanism, this wouldn't help confirm the same mechanism is used across simulators.
3. The model that the mechanism represents? This would require a reference to something external to Arbor (and NEURON), and which is not tied to an NMODL description. Some sort of PID referring to a paper or ModelDB entry or something like this would do the job, but we would have to add this metadata to our NMODL model descriptions so that we can present it to users.

My initial feeling is that we should include fingerprint (which can be deliberately ignored at runtime upon user choice) plus room at least for a PID to an external resource that constitutes the 'ground truth' for the model.
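
A minimal Python sketch of what such a record and check could look like follows. Everything in it is an assumption made for illustration: the thread does not say how Arbor's fingerprint is computed, so a SHA-256 of the NMODL source text stands in for it, and `MechanismRef`, `fingerprint`, and `matches` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

def fingerprint(nmodl_source: str) -> str:
    """Hypothetical fingerprint: SHA-256 of the NMODL source text."""
    return hashlib.sha256(nmodl_source.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class MechanismRef:
    name: str
    fingerprint: Optional[str] = None  # expected implementation, if any
    pid: Optional[str] = None          # pointer to the external 'ground truth'

def matches(ref: MechanismRef, provided_source: str,
            ignore_fingerprint: bool = False) -> bool:
    """Check a requested mechanism against the source actually provided."""
    if ignore_fingerprint or ref.fingerprint is None:
        return True
    return ref.fingerprint == fingerprint(provided_source)

source = "NEURON { SUFFIX hh }"
ref = MechanismRef("hh", fingerprint=fingerprint(source))
print(matches(ref, source))                                   # True
print(matches(ref, source + " "))                             # False
print(matches(ref, source + " ", ignore_fingerprint=True))    # True
```

The `pid` field is carried along but never checked: it is documentation for humans, whereas the fingerprint is the part a program can verify.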

thorstenhater commented 3 years ago

Let's focus on the built-in catalogues for a first solution. The general solution is hard, as stated. Observations:

1. To make a simulation, a catalogue needs to be supplied which contains all mechanisms used in the simulation.
2. In that catalogue, each (prefix, name) needs to be unique.
3. Thus, we can use the method outlined as 1. above for builtin/bundled catalogues.
4. This is needed, as the built-in Allen and BBP catalogues already overload names.

For the general case, one could imagine using the fingerprint, or even serialising the NMODL source in the ACC file.
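
The uniqueness requirement on the composite catalogue can be sketched in Python as follows. This is a toy model, not Arbor code: catalogues are plain dictionaries, `compose` is a hypothetical helper, and the `/` separator is assumed from the `bbp/NaP` example earlier in the thread.

```python
def compose(parts):
    """Build a composite catalogue from (prefix, catalogue) pairs.

    `catalogue` is a dict of mechanism name -> implementation. Raises
    ValueError if two parts would yield the same prefixed name.
    """
    composite = {}
    for prefix, catalogue in parts:
        for name, impl in catalogue.items():
            key = f"{prefix}/{name}" if prefix else name
            if key in composite:
                raise ValueError(f"duplicate mechanism name: {key}")
            composite[key] = impl
    return composite

# Placeholder contents; only the overlapping name matters here.
default = {"hh": "default hh", "pas": "default pas"}
cat_a = {"Ih": "Ih variant A"}
cat_b = {"Ih": "Ih variant B"}

ok = compose([("", default), ("a", cat_a), ("b", cat_b)])
print(sorted(ok))  # ['a/Ih', 'b/Ih', 'hh', 'pas']
```

Merging `cat_a` and `cat_b` under the same prefix would raise, which is the well-formedness property the observations above rely on.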

thorstenhater commented 3 years ago

Although your comment that we do not have control over the names used in external catalogues is correct, we do know that the final catalogue used to make a cable-cell, which is composed of multiple builtin/imported catalogues, is well-formed in the sense of uniquely prefixed names. Then the question of course becomes: how do we uniquely identify (versioned!) catalogues? A suggestion based on our recent additions: We could use a tagged commit in a repo under arbor-contrib.

halfflat commented 3 years ago

It's still important to clarify what problem this is solving. For example, we make no guarantees that the same mechanism implementation is used between different versions of Arbor for its built-in catalogues. Conversely, if a mechanism specifies "hh", should it be tied to an "hh" in a particular catalogue, or should it match whatever is being supplied in the simulation code?

One thing we should make sure of though, is that we don't have mechanisms with the same name, in different Arbor-supplied catalogues, that model different behaviour.

thorstenhater commented 3 years ago

Ok, you are aiming for something stronger than I had in mind originally. Solving what you are discussing would require that we tie our versioning to the built-in catalogues, too. And the scheme sketched above would need to include the arbor-version (down to a specific commit, possibly).

halfflat commented 3 years ago

I just don't want to implement a half-solution. But the key question really is: what do we want to solve? Then we'll be able to determine what to do.

thorstenhater commented 3 years ago

The problem I stumbled upon while making another step on the GUI was that the serialisation is ambiguous. This is what the issue states, first and foremost. I assume there is little disagreement with that.

thorstenhater commented 3 years ago

I will make the radical proposal here to separate arbor-the-library from the builtin catalogues and make those into separate projects/repos, even the default catalogue. Then these can be linked to arbor-core as submodules and versioned separately. Identification of the catalogue becomes a (repo-id, commit-hash). Also it decouples arbor further from NMODL.

halfflat commented 3 years ago

ACC doesn't solve the serialization problem, and this was intentional. It doesn't contain discretization information, for example, or run-time configuration (dt, domain decomposition, etc.).

The GUI is going to have to store other information outside of ACC, and to solve this particular problem, that should include what catalogues are loaded in with whatever prefixes.

thorstenhater commented 3 years ago

You are saying that this ambiguity is by design and not in scope of ACC?

thorstenhater commented 3 years ago

My problem with that stance is twofold:

1. We cannot round-trip Cable-Cell -> ACC -> Cable-Cell and expect the same cable cell back.
2. The information in ACC is incomplete and ambiguous.

halfflat commented 3 years ago

ACC attempts to separate out the notion of a cell description. And so it is incomplete, because we need more than that to run a simulation. Discretization in particular stands out as an example. But you can indeed otherwise round-trip ACC / Cable cell — it's just not enough to guarantee same simulation behaviour, and we had been discussing alternatives for specifying discretization earlier, which we really should return to.

At the level of the cable cell description object, mechanisms are just names and (non-global) parameters. And this is what ACC currently captures.
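
To make that concrete, here is a small Python sketch of a mechanism description that holds exactly a name and its parameters, rendered in the style of the snippet at the top of this issue. `MechanismDesc` and `to_acc` are hypothetical and are not the Arbor ACC writer; the output format is only an imitation of the example.

```python
from dataclasses import dataclass, field

@dataclass
class MechanismDesc:
    name: str
    params: dict = field(default_factory=dict)

def to_acc(mech: MechanismDesc) -> str:
    """Render a mechanism in the style of the ACC snippet in this issue."""
    parts = [f'(mechanism "{mech.name}"']
    parts += [f'("{key}" {value:f})' for key, value in mech.params.items()]
    return " ".join(parts) + ")"

print(to_acc(MechanismDesc("hh", {"gl": 0.0003, "el": -54.3})))
# (mechanism "hh" ("gl" 0.000300) ("el" -54.300000))
```

Nothing in the description or its rendering refers to a catalogue, which is the point under discussion.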

thorstenhater commented 3 years ago

I would argue that it makes a huge difference whether I have a foo/pas which is a leak current and bar/pas which might do something completely different and that this should be part of the cable cell.

halfflat commented 3 years ago

If we care about their function, we've got to give them metadata that reflects that (see above!). A catalogue prefix will not solve that problem.

thorstenhater commented 3 years ago

Yes it will, if we also uniquely identify the catalogues and their composition w/ prefix (see above). If we just have the fingerprint, how does one find the catalogues where it lives?

halfflat commented 3 years ago

Because people can bring in catalogues under any name, or different catalogues can provide the same notional dynamics with different implementations. The catalogue name also has little semantic content: how does foo/pas vs bar/pas tell me that one might be a passive current while the other deals with chlorine ions?

Fingerprints tell us if the mechanism matches an expected implementation. PIDs to human readable mechanism descriptions tell us what the mechanism is supposed to do. The catalogue name does neither.

thorstenhater commented 3 years ago

Again, I do not disagree. However, I am not talking about the catalogue name (which has the same problem as mechanism names), but rather a unique ID (the ones in the example above were just examples, and they were built-in catalogues, where names might be treated specially), the prefix in the composite catalogue, and the mechanism name. This uniquely identifies each mechanism, correct? (Assuming a sufficiently precise catalogue ID).

To reiterate: for built-in catalogues, names might be enough under version constraints; for external catalogues, we need more precise information. See above for an example of how such an ID might be constructed.
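
A sketch of such an identifier in Python, using the (repo-id, commit-hash) idea proposed earlier in this thread. The class names, the repository strings, and the commit values are placeholders made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogueId:
    repo: str    # repository identifier (placeholder values below)
    commit: str  # commit hash or tag pinning the catalogue version

@dataclass(frozen=True)
class MechanismKey:
    catalogue: CatalogueId
    prefix: str  # prefix in the composite catalogue
    name: str    # mechanism name inside the catalogue

foo_pas = MechanismKey(CatalogueId("example/cat-foo", "abc123"), "foo", "pas")
bar_pas = MechanismKey(CatalogueId("example/cat-bar", "def456"), "bar", "pas")

# Frozen dataclasses compare by value and are hashable, so keys that
# share the name "pas" remain distinct entries in a set or dict.
print(foo_pas == bar_pas)       # False
print(len({foo_pas, bar_pas}))  # 2
```

Whether a key like this is precise enough depends entirely on how stable the catalogue ID is, which is the open question here.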

halfflat commented 3 years ago

Just to elaborate on my point above: with the current design of mechanisms, the mapping of names to functionality lives outside the model description as presented by cable cell descriptions in a recipe. Short of including a description of the dynamics in the model description itself, which we currently cannot do, the best we can offer is the ability to verify that we're simulating what we expect.

Consider three scenarios:

1. I want to use the "hh" mechanism as offered by NEURON and as offered equivalently by Arbor. I don't care which implementation is used. So I use "hh" in the cell description, and ensure that the catalogue is set up so that that name corresponds to that set of dynamics. A mate comes along and offers some super-optimized implementation of "hh"? Great, I'll load that one in instead. Adding a catalogue name here would interfere with this sort of use.
2. My cell is using a specific model of a particular sodium channel. This sodium channel has multiple mathematical models in the literature, but I want one specifically. Calling it "Nav1.2" doesn't cut it then, and I'll need a more specific name or metadata that specifies that model.
3. I'm trying to reproduce a particular behaviour across different versions of Arbor. Here, I care that the mechanism is described by the same implementation.

I don't think catalogue IDs per se are going to cut it: the unit of difference, the thing we care about, is the mechanism itself. On one hand, simple names describe some measure of intent, and allow a cell description to be used across different mechanism models or implementations of the same biophysics, without modifying that description. On the other, when we do care about implementation or the specific mathematical model, the catalogue, even with an ID, isn't really telling us anything: we would have to then look up the exact same validation information in the context of that catalogue, which is one step removed from the mechanism we care about.

For serialization in the GUI or other context, the problem is different: there has to be a way of specifying what the catalogue environment looks like. It's sort of the environment in which these names are interpreted. But that can be done without changing the names of mechanisms in ACC, or adding catalogue name information in ACC.
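
One way to picture this is a small description of the catalogue environment stored next to the ACC file, leaving the ACC content untouched. The Python sketch below uses JSON for that purpose; the layout, the field names, and both helper functions are hypothetical, since the thread does not specify any format.

```python
import json

def dump_environment(entries):
    """Serialise the catalogue environment as JSON.

    `entries` is a list of (prefix, catalogue identifier) pairs.
    """
    doc = {"catalogues": [{"prefix": p, "catalogue": c} for p, c in entries]}
    return json.dumps(doc, indent=2, sort_keys=True)

def load_environment(text):
    """Inverse of dump_environment: recover the (prefix, catalogue) pairs."""
    return [(e["prefix"], e["catalogue"]) for e in json.loads(text)["catalogues"]]

env = [("bbp", "bbp"), ("", "default")]
text = dump_environment(env)
print(load_environment(text) == env)  # True
```

Mechanism names in the ACC file stay as they are; only the context in which they are interpreted is recorded separately.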

thorstenhater commented 3 years ago

> Just to elaborate on my point above: with the current design of mechanisms, the mapping of names to functionality lives outside the model description as presented by cable cell descriptions in a recipe. Short of including a description of the dynamics in the model description itself, which we currently cannot do, the best we can offer is the ability to verify that we're simulating what we expect.
>
> Consider three scenarios:
> 1. I want to use the "hh" mechanism as offered by NEURON and as offered equivalently by Arbor. I don't care which implementation is used. So I use "hh" in the cell description, and ensure that the catalogue is set up so that that name corresponds to that set of dynamics. A mate comes along and offers some super-optimized implementation of "hh"? Great, I'll load that one in instead. Adding a catalogue name here would interfere with this sort of use.

It does (not?) matter to you whether "hh" refers to some kind of Hodgkin-Huxley dynamics mechanism? If you have an improvement over an old implementation, great: load the ACC, paint over the old one, and dump it again or run it from here.

> 2. My cell is using a specific model of a particular sodium channel. This sodium channel has multiple mathematical models in the literature, but I want one specifically. Calling it "Nav1.2" doesn't cut it then, and I'll need a more specific name or metadata that specifies that model.

Yes, but if you interact with this model in Arbor, you probably have an implementation as NMODL, and for sure some catalogue containing at least this model. Possibly you have even split said model into more than one NMODL file. So, having a catalogue with a unique ID and version (in which you are free to name this thing NaV) specifies exactly what you need. Put it on GH, cite the paper in the README, and everything is clear.

> 3. I'm trying to reproduce a particular behaviour across different versions of Arbor. Here, I care that the mechanism is described by the same implementation.

Yes. Also solved by the method above.

> I don't think catalogue IDs per se are going to cut it: the unit of difference, the thing we care about, is the mechanism itself. On one hand, simple names describe some measure of intent, and allow a cell description to be used across different mechanism models or implementations of the same biophysics, without modifying that description. On the other, when we do care about implementation or the specific mathematical model, the catalogue, even with an ID, isn't really telling us anything: we would have to then look up the exact same validation information in the context of that catalogue, which is one step removed from the mechanism we care about.

But the granularity of interaction is at the catalogue level. There is no way of loading a mechanism without a catalogue. In the glorious future, hh might be a thin veneer over an aggregate of three channels. Catalogue IDs should change with semantic versioning, so it would be up to the catalogue/mechanism author to decide whether to bump it or not.

> For serialization in the GUI or other context, the problem is different: there has to be a way of specifying what the catalogue environment looks like. It's sort of the environment in which these names are interpreted. But that can be done without changing the names of mechanisms in ACC, or adding catalogue name information in ACC.

True, this can be done, but I think it separates pieces of information belonging together. hh from different catalogues might be very different things and I am still convinced that this information is important to the cell, not the environment. It is not a 'matter of interpretation' whether hh has two or three channels, what these channels are called, etc.

halfflat commented 3 years ago

The catalogue is the wrong place to make these guarantees, because the catalogue is a bag of mechanism metadata and mechanism implementations. The catalogue interface allows new mechanisms to be added on the fly, and mechanisms with different global parameters or ions to be derived on the fly.
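
The following Python toy illustrates why a fixed identifier says little about a catalogue whose contents can change at run time. `ToyCatalogue` is a stand-in written for this example and is not the Arbor catalogue class; its method names and the parameter `e` are assumptions.

```python
class ToyCatalogue:
    """Toy stand-in for a mechanism catalogue; not the Arbor class."""

    def __init__(self):
        self._mechs = {}

    def add(self, name, global_params):
        """Add a new mechanism with its global parameters."""
        self._mechs[name] = dict(global_params)

    def derive(self, new_name, parent, overrides):
        """Register `new_name` as `parent` with some global parameters changed."""
        params = dict(self._mechs[parent])
        params.update(overrides)
        self._mechs[new_name] = params

    def names(self):
        return sorted(self._mechs)

cat = ToyCatalogue()
cat.add("pas", {"e": -70.0})
before = cat.names()
cat.derive("pas_shifted", "pas", {"e": -65.0})
print(before)       # ['pas']
print(cat.names())  # ['pas', 'pas_shifted']
```

The same catalogue object answers differently before and after the derivation, so any guarantee would have to attach to the individual mechanism rather than to the bag that holds it.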

Mechanism metadata, which is what this is all about ultimately, belongs in the provided mechanisms on one hand, and with the requested mechanisms on the other.

arbor-sim / arbor

`mech_info` and/or ACC should store originating catalogue of mechanism. #1482