Epic 4: Develop fitting spec & implement it using FC+PINTS

jonc125 commented 5 years ago

(Updated) to-do list

#74 Design a Python/PINTS-based "fitting spec" that will work for a sine wave fit, i.e. some kind of interface through which a Python script can:
- Specify required protocol outputs
- [ ] could be ontology term + units,
- [ ] but what about post-processed "columns", e.g. tau vs V
- [ ] #63 obtain data (CSV columns) loaded by FC for the specified outputs
- [ ] obtain a runnable "protocol" object, that provides the specified outputs
- [ ] specify required model parameters / adjustables
- Create boundaries or priors
- [ ] on the parameters
- [ ] on other model variables (e.g. rates)
- Tweak simulation properties
- [ ] solver tolerance
- [ ] random seed?
- [ ] define an ErrorMeasure or LogPDF in PINTS
- Run an optimisation or inference problem and store the results
- [ ] Create a Controller, passing in the required method
- [ ] Tweak the method, if required! E.g. `controller.sampler().set_special_setting(123)`
- [ ] Run and somehow get results that WL can interpret again
Create something to run the fits
- ??? See below
Update WL front-end for fitting
- [ ] Place to specify fits
- [ ] Place to run fits and see results
- [ ] Initially don't give anyone permission to run them, eventually add sandboxing to whatever it is that runs the fits (https://github.com/ModellingWebLab/weblab-fc/issues/185)
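
As a rough illustration of the steps in the first to-do item, here is a minimal sketch in plain Python with no FC or PINTS involved. Everything in it is an assumption made for illustration: `simulate` stands in for a runnable protocol object, the sine wave parameters and bounds are made-up values, and `random_search` is a deliberately crude stand-in for a real optimiser. In PINTS the corresponding pieces would presumably be an error measure such as `pints.SumOfSquaresError`, boundaries such as `pints.RectangularBoundaries`, and a controller such as `pints.OptimisationController`.

```python
import math
import random


def simulate(params, times):
    """Toy forward model (stand-in for an FC protocol run): a sine wave."""
    amplitude, frequency = params
    return [amplitude * math.sin(frequency * t) for t in times]


def sum_of_squares(params, times, data):
    """Error measure: sum of squared residuals between simulation and data."""
    return sum((m - d) ** 2 for m, d in zip(simulate(params, times), data))


def random_search(error, x0, lower, upper, n_iter, seed):
    """Draw points uniformly inside rectangular bounds and keep the best one.

    Deliberately crude; a real fit would use a proper optimiser or sampler.
    """
    rng = random.Random(seed)  # explicit seed makes the run repeatable
    best_x, best_f = list(x0), error(x0)
    for _ in range(n_iter):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        f = error(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


# Synthetic "data": the model evaluated at known (made-up) parameters
times = [0.1 * i for i in range(100)]
data = simulate([2.0, 1.5], times)

lower, upper = [0.5, 0.5], [3.0, 3.0]  # boundaries on the parameters
x0 = [1.0, 1.0]                        # starting point inside the bounds


def error(x):
    return sum_of_squares(x, times, data)


best_x, best_f = random_search(error, x0, lower, upper, n_iter=2000, seed=1)
print(best_x, best_f)
```

The point of the sketch is the shape of the script rather than the search method: data, model, error measure, boundaries, and seed are each specified separately, which is roughly the list of things a fitting script would have to obtain from FC or declare itself.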

Things that need to be captured by a fitting spec are:

Fitting method, priors, noise model
- Give (distribution+) bounds on e.g. "a rate parameter" "for" "Ikr", "b rate parameter" "for" "Ikr"
- The "for" "Ikr" bit will be done by filtering using the dependency tree, rather than annotating all parameters by which current they're for. So you annotate all "a rate parameter" variables, etc., and the Ikr variable, then use the extended dependencies for Ikr to filter all "a" variables to just the ones of interest.
- Need to check which bqbiol predicate to use for “a rate parameter”: is? isVersionOf? hasProperty? isProperty? Something else?
- Probably safer to have 4 terms: "forward rate a parameter", "forward rate b parameter", "backward rate a parameter", "backward rate b parameter"
Mapping from prediction output to dataset column - possibly automatic if prediction outputs are annotated? (i.e. add oxmeta annotations for protocol output specifications)
Boundaries / constraints
- E.g. say all "forward rate" "for" "Ikr" should be in [k_min, k_max]
  - So need “forward rate” and “backward rate” terms
Optional RNG seed?
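
The dependency-tree filtering described above could look something like the sketch below. The variable names, the dependency graph, and the annotation dictionary are all hypothetical; in practice this information would come from the annotated model rather than hand-written dictionaries.

```python
def extended_dependencies(graph, target):
    """Return every variable that `target` depends on, directly or indirectly."""
    seen = set()
    stack = [target]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:  # also guards against cycles in the graph
                seen.add(dep)
                stack.append(dep)
    return seen


# Hypothetical model: each variable maps to the variables it is computed from
graph = {
    "I_Kr": ["g_Kr", "ikr_open", "V"],
    "ikr_open": ["ikr_k1", "ikr_k2"],
    "ikr_k1": ["p1", "p2", "V"],
    "ikr_k2": ["p3", "p4", "V"],
    "I_Na": ["g_Na", "ina_m", "V"],
    "ina_m": ["ina_alpha"],
    "ina_alpha": ["q1", "q2", "V"],
}

# Hypothetical annotations: variable -> ontology term
annotations = {
    "p1": "forward rate a parameter",
    "p2": "forward rate b parameter",
    "p3": "backward rate a parameter",
    "p4": "backward rate b parameter",
    "q1": "forward rate a parameter",
    "q2": "forward rate b parameter",
}


def parameters_for(term, current):
    """Variables annotated with `term` that `current` depends on."""
    deps = extended_dependencies(graph, current)
    return sorted(v for v, t in annotations.items() if t == term and v in deps)


print(parameters_for("forward rate a parameter", "I_Kr"))
```

Here both `p1` and `q1` carry the "forward rate a parameter" term, but only `p1` is reachable from `I_Kr`, so the filter keeps `p1` alone without any per-current annotation on the parameters.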

See also #74.

MichaelClerx commented 4 years ago

~old to-do list was here~

MichaelClerx commented 4 years ago

Tests:

Kylie/Dom/Sanmitra #69
Chon #71
https://github.com/ModellingWebLab/project_issues/issues/47 (stretch goal)

jonc125 commented 4 years ago

To discuss: how much we want a spec 'language' for fitting (perhaps essentially just a config file, but one that supports comments, unlike JSON!), and how much we just allow PINTS code. Cf. sandboxing discussion in #61.

mirams commented 4 years ago

I think I am in favour of just pints code (I thought that was what we concluded in the last face-to-face meeting). Please feel free to edit this list of pros and cons:

Pros:

One less thing in the conceptual hierarchy to think about
One less thing to learn how to write*.
More flexible, I can imagine the spec quickly getting out of hand to deal with weird and wonderful corner cases and unusual distributions/likelihoods etc. for anything that isn't least squares/Gaussian assumptions.
We wouldn't need to spend time on a spec, parser, UI to do the spec writing, raising nice errors to the user if the spec doesn't parse etc.

Cons:

*Maybe for really simple cases (least squares fit) learning Pints is harder than learning a simple spec command like Aidan's prototype, but it's one of those cases where we probably want to make people understand what they are doing behind the scenes - so maybe no bad thing to force them to look at the pints code.
We'd need to keep careful track of the version of Pints that was used, stated quite explicitly at the top of the 'spec' (code), since the interface is still in flux. Whereas a 'spec parser' could be updated just once to keep track of the latest pints release API. So if we allow pints code we'd need to make sure we can get hold of old versions to run the fitting. (When we discussed this, I think you both said the numpy API was very stable, so it's not as much of an issue there.)
We'd need a 'local executable' for people to play with to write their pints code (but this would be useful anyway), then some way of ensuring versions of everything were compatible on the server.
Would we need some way of saying what annotated model parameters are available in python - some new WebLab wrapping Python interface to the data and the simulations for humans to use which hasn't existed before?
Sandboxing needed for Joe Bloggs to upload fitting specs.
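
One way the version tracking mentioned in the cons could work is a pin in the comment header of the fitting script, checked before anything is run. The sketch below is only an illustration: the `# requires: name==version` header convention is made up here, and the version numbers are placeholders rather than real releases.

```python
import re

# Hypothetical header convention: "# requires: <package>==<version>"
PIN_PATTERN = re.compile(
    r"^#\s*requires:\s*([A-Za-z0-9_.-]+)==([0-9][A-Za-z0-9.]*)\s*$"
)


def read_pins(script_text):
    """Collect pins from the comment block at the top of a script."""
    pins = {}
    for line in script_text.splitlines():
        if not line.startswith("#"):
            break  # pins must appear before any code
        match = PIN_PATTERN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def mismatched_pins(pins, installed):
    """Return {name: (wanted, found)} for each pin the environment does not meet."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


# Placeholder script and versions, for illustration only
script = """# requires: pints==0.0.1
# requires: numpy==1.0.0
import pints
"""

pins = read_pins(script)
print(mismatched_pins(pins, {"pints": "0.0.1"}))
```

On a real server the `installed` dictionary could be filled using `importlib.metadata.version`, and a mismatch would be a reason to refuse the run or to pick a matching environment.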

jonc125 commented 4 years ago

I've added some thoughts on what a fitting spec needs to capture in the issue description - please edit!

MichaelClerx commented 4 years ago

Let's have another chat about this via Skype then? I think I'm also in favour of fitting scripts. It gets complex really fast.

MichaelClerx commented 4 years ago

@jonc125 I've updated the top post above with a tentative to-do. Now wondering about the approach for running fits:

FC as a simulation engine?

WL calls "runner"
"runner" sees something is a fitting spec, passes it to fit runner
fit runner uses FC as a tool
Something exists, somewhere, that can
- read model annotations, seeing which variables are available
- read FC model interfaces, seeing which variables are required
- read fitting specs, seeing which variables (and unnamed protocol outputs?) are required
- read data sets, seeing which variables (and unnamed processed outputs?) are provided

FC as an everything engine

WL calls "runner"
"runner" calls FC
FC sees it has a "fitting protocol", does fitting?
Something exists, in FC, that can
- read model annotations, seeing which variables are available
- read FC model interfaces, seeing which variables are required
- read fitting specs, seeing which variables (and unnamed protocol outputs?) are required
- read data sets, seeing which variables (and unnamed processed outputs?) are provided

Something like this? Neither?
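
Whichever component ends up owning it, the "something that can read..." part boils down to comparing what each entity provides with what the others require. A minimal sketch, assuming each interface has already been extracted into a plain set of names (the function name and all variable names are hypothetical):

```python
def unmet_requirements(model_variables, protocol_inputs, protocol_outputs,
                       fit_parameters, fit_outputs, data_columns):
    """Return the non-empty sets of names one entity needs and another lacks."""
    checks = {
        "protocol inputs missing from model": protocol_inputs - model_variables,
        "fit parameters missing from model": fit_parameters - model_variables,
        "fit outputs missing from protocol": fit_outputs - protocol_outputs,
        "fit outputs missing from data": fit_outputs - data_columns,
    }
    return {label: names for label, names in checks.items() if names}


# Made-up interfaces: the fitting spec asks for a parameter the model lacks
problems = unmet_requirements(
    model_variables={"V", "I_Kr", "p1", "p2"},
    protocol_inputs={"V"},
    protocol_outputs={"I_Kr", "time"},
    fit_parameters={"p1", "p2", "p3"},
    fit_outputs={"I_Kr", "time"},
    data_columns={"I_Kr", "time"},
)
print(problems)
```

The check itself is the same in both layouts; the two options differ only in whether it lives inside FC or in a separate fit runner that calls FC.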

jonc125 commented 4 years ago

Either approach should work, depending on how much you want separate libraries. The fitting runner would need to know a fair bit about FC, so it might make more sense to combine them.

There will be some improvements to fc-runner in any case to use the new weblab-fc protocol parser as a library to extract the protocol interface (now needing to include outputs as well - so this bit would overlap with a fitting use case) and send it to the Web Lab front-end when new protocols are uploaded. Currently it uses a rather hacky bit of code partially parsing the protocol! It will also then need to use cellmlmanip instead of pycml to determine model/protocol compatibility - again overlapping with processing needed for fitting. So these common features would sit either in cellmlmanip or weblab-fc to be used by other components.

MichaelClerx commented 4 years ago

Or should we define some common language? Some kind of manifest file that everyone can read that tells you what an entity is, what it needs, etc.? Would open the door to future additions!

MichaelClerx commented 4 years ago

(where by "language" I mean one or two standardised fields in an XML or JSON file)
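
For concreteness, a manifest along these lines might be read as in the sketch below. The field names `kind`, `provides`, and `requires` and the variable names are invented for illustration; nothing here is an agreed format.

```python
import json

# Hypothetical standardised fields every entity's manifest would carry
REQUIRED_FIELDS = ("kind", "provides", "requires")


def load_manifest(text):
    """Parse a manifest and check that the standardised fields are present."""
    manifest = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in manifest]
    if missing:
        raise ValueError("manifest is missing fields: " + ", ".join(missing))
    return manifest


# Made-up manifest for a fitting spec
example = """
{
  "kind": "fitting-spec",
  "provides": [],
  "requires": ["membrane_voltage", "ikr_current"]
}
"""

manifest = load_manifest(example)
print(manifest["kind"], manifest["requires"])
```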

jonc125 commented 4 years ago

Well, we could, but for models & protocols this information is already defined internally to those documents (or associated RDF files for models in due course). So you'd still need some software to extract it from there to a new common format.

MichaelClerx commented 4 years ago

For models it's going to end up separately though, presumably in a COMBINE archive. So it'd make sense for every component (model, protocol, data, fitting spec) to be a COMBINE archive too?

jonc125 commented 4 years ago

They already are COMBINE archives. Doesn't change the fact that for e.g. protocols the canonical list of outputs is in the protocol file itself, and would need to be copied into another file in the archive.

MichaelClerx commented 4 years ago

Yeah. I suppose parsing just the "most interfacy" bits of the model interface section isn't hard though, so might be good to have this outside of FC. I really like the idea of having it modular (even if just in principle), so that we could theoretically support other FC implementations (e.g. other domains, other simulation types)
