jonc125 opened this issue 5 years ago:
Need probably many-to-many links between `ExperimentalData` and `Protocol`, and only once a fit is performed do you tie down exactly which were used? @MichaelClerx ?
Not sure what you mean here! Would an `ExperimentalData` be a single time series? (Or a single matrix, more abstractly?)

`ExperimentalData` is the wet lab analogue of `Prediction` (currently called `Experiment`), i.e. a complete dataset with potentially multiple matrices (CSV files). So it's everything you need to compare your prediction against to do a complete fit.
Ideally I'm assuming you'd have one-to-many correspondence between a protocol version and experimental data (since you can repeat the experiment to get new datasets...) but realistically people are going to develop their own protocols to improve the match to the wet lab procedure, or clone existing ones, etc. Hence many-to-many.
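For concreteness, the many-to-many link being proposed at this point might look something like the sketch below in Django terms. This is purely illustrative (the thread later settles on a simpler at-most-one link), and all names here are assumptions:

```python
from django.db import models


class ExperimentalData(models.Model):
    name = models.CharField(max_length=255)
    # Many-to-many as proposed here: a dataset may relate to several protocol
    # versions and vice versa; which ones were actually used would only be
    # pinned down once a fit is performed.
    protocols = models.ManyToManyField('ProtocolVersion', related_name='datasets')
```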
Related to this, we need to think about versioning of datasets. Do we want ExperimentalData to have versions in the same way Prediction does (i.e. re-run exactly the same protocol on exactly the same model - results might still differ due to new code version or stochasticity)? Would versions differ only in metadata, or is changing files allowed? Or should new files require a new ExperimentalData object? (With a many-to-many mapping this isn't really a problem, providing you name things sensibly and/or have good search!)
I think an update in the `Protocol` version that data is linked to should probably create a new version of the `ExperimentalData` too. To force re-runs etc. with the refined protocol description, to better match how the experiments were done.

But probably need to stop and think for a little while whether we want:

- one `FittingSpec` to work with one `ExperimentalData` (which may include many protocols);
- or whether we want one `FittingSpec` to work with multiple `ExperimentalData`s that each are linked to one `Protocol`.

I think I would lean to the second case, so that we can keep a simple one-to-one between `ExperimentalData` and `Protocol`, not least to make it easy to think about. Lots of `ExperimentalData` can share the same `Protocol`, but I think they would only name one.
> ExperimentalData is the wet lab analogue of Prediction (currently called Experiment), i.e. a complete dataset with potentially multiple matrices (CSV files). So it's everything you need to compare your prediction against to do a complete fit.
Would it be a better idea to have some kind of `FittingData` that's a view of a larger `ExperimentalData` set? For example, Kylie's data for a single cell contains multiple time series, and we use different ones in different fits. Even overlapping ones should be possible, so maybe we could have a `FittingData` just be a list of pointers to things inside our new combine data files?
At the workshop we talked about hierarchical data sets with annotations, and that's still something I'd like to implement in the combine format. (In fact, we have the prototype working, and have scheduled Wednesday to work on that further.) For example, Kylie's data set would have a whole bunch of meta data, but then there'd be 7 subdirectories that added protocol-specific meta data, and then 9 subdirectories in each that added cell-specific meta data (temperature and capacitance).
It would make sense to me to then have a FittingData or something that points to this massive bulk of data, instead of having lots of copies with their own meta data copies etc. etc.
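A rough sketch of what such a pointer-based view could look like; `DataPointer`, `FittingData`, and the example paths are hypothetical, just to make the idea concrete:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataPointer:
    """Points at one array inside a (possibly hierarchical) data archive."""
    dataset_id: str  # which ExperimentalData set, e.g. "kylie-cell-5"
    path: str        # which series within it, e.g. "sine-wave/current.csv"


@dataclass
class FittingData:
    """A view over a larger ExperimentalData set: pointers only, no copies."""
    pointers: List[DataPointer]


# Overlapping views into the same archive, as used by two different fits,
# share the underlying data (and its meta data) rather than copying it:
activation_fit = FittingData([DataPointer("kylie-cell-5", "activation/iv.csv")])
combined_fit = FittingData([
    DataPointer("kylie-cell-5", "activation/iv.csv"),
    DataPointer("kylie-cell-5", "sine-wave/current.csv"),
])
```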
> Related to this, we need to think about versioning of datasets. Do we want ExperimentalData to have versions in the same way Prediction does (i.e. re-run exactly the same protocol on exactly the same model - results might still differ due to new code version or stochasticity)? Would versions differ only in metadata, or is changing files allowed? Or should new files require a new ExperimentalData object? (With a many-to-many mapping this isn't really a problem, providing you name things sensibly and/or have good search!)
Would say every run of an experiment is a new data set! Changing files would be allowed though, if we discover e.g. the pre-processing was wrong?
Related to that, I would love to have links between data (a semantic web of data sets) so that we can say things like "Data B by Gary is a processed version of Data A". I'm not saying we should have the web lab do the processing, just the ability to document these links would be incredibly useful. In many cases we'll even have "Data B, made available by a modeller, is a processed version of Data A, which has a publication but no online data set"
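Documenting such links wouldn't need much machinery: a self-referential pointer plus a free-text note would already cover "Data B is a processed version of Data A". A minimal sketch, assuming Django-style models (names invented for illustration):

```python
from django.db import models


class Dataset(models.Model):
    name = models.CharField(max_length=255)
    # "Data B by Gary is a processed version of Data A": a link we document,
    # without the Web Lab doing the processing itself.
    derived_from = models.ForeignKey(
        'self', null=True, blank=True,
        on_delete=models.SET_NULL, related_name='derivatives',
    )
    # E.g. "leak-subtracted version of Data A, which has a publication
    # but no online data set".
    derivation_note = models.TextField(blank=True)
```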
> I think an update in the Protocol version that data is linked to should probably create a new version of the ExperimentalData too.

Not sure I follow this - where would we get this new experimental data?
> But probably need to stop and think for a little while whether we want:
>
> - one FittingSpec to work with one ExperimentalData (which may include many protocols);
> - or whether we want one FittingSpec to work with multiple ExperimentalDatas that each are linked to one Protocol.
>
> I think I would lean to the second case, so that we can keep a simple one-to-one between ExperimentalData and Protocol, not least to make it easy to think about. Lots of ExperimentalData can share the same Protocol, but I think they would only name one.
I think I'd also prefer the second! So 1 protocol = 1 time series: that's the way experimenters use the term, so I strongly believe we should stick to that (even if multi-dimensional arrays are cleaner).
The way I see it is that a FittingSpec refers to exactly one each of ModelVersion mv, ProtocolVersion pv, and ExperimentalData ed. Running pv on mv with a given parameter set produces a Prediction p (when done as part of a fit, this isn't stored in the WL of course, only the overall result of a fit is). A Prediction may include many outputs, because it's not just doing the time-series simulation, it's doing all the post-processing, and in general it may be the post-processed outputs you compare against post-processed ExperimentalData (though you can also fit to raw data). So a FittingSpec will always also need to state which outputs from Prediction p get compared to which specific series (or other data) from ed. So ExperimentalData could have multiple datasets corresponding to multiple (even different) protocols within it, and the FittingSpec selects those of interest. This also matches the direction SED-ML is taking with linking to data.
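A minimal sketch of those relationships as plain dataclasses (all names illustrative rather than the actual schema):

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FittingSpec:
    model_version: str      # ModelVersion mv
    protocol_version: str   # ProtocolVersion pv
    experimental_data: str  # ExperimentalData ed (may hold several datasets)
    # Which output of Prediction p (from running pv on mv) gets compared to
    # which series inside ed, e.g. {"outputs/iv_curve": "activation/iv.csv"};
    # this is how the spec selects the datasets of interest within ed.
    output_to_series: Dict[str, str] = field(default_factory=dict)
```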
Now, it's probably worth distinguishing at this point between what we want to get running for June 3rd and what it'll eventually look like.
> So 1 protocol = 1 time series: that's the way experimenters use the term, so I strongly believe we should stick to that (even if multi-dimensional arrays are cleaner).
But our protocols are not the same as experimental protocols, since they incorporate post-processing, e.g. producing I-V curves.
> A Prediction may include many outputs, because it's not just doing the time-series simulation, it's doing all the post-processing, and in general it may be the post-processed outputs you compare against post-processed ExperimentalData (though you can also fit to raw data).
Can we draw a diagram or write a short outline of this or something? What are we calling the result of running a "voltage protocol" on a cell? What are we calling post-processed data from such a protocol? And what are we calling a grouping of such results? Are you using ExperimentalData to mean an instance, or a class of things?
Sorry, I'm getting really confused here!
> But our protocols are not the same as experimental protocols, since they incorporate post-processing, e.g. producing I-V curves.
That's fine! I'm sure no-one minds referring to a voltage step sequence + post processing method as a "protocol", e.g. an activation protocol, an IV-curve protocol, etc. I just don't like the idea of an "ExperimentalData (which may include many protocols)". Surely that would be a data set or something?
Looking back, it's this bit already:
> ExperimentalData is the wet lab analogue of Prediction (currently called Experiment), i.e. a complete dataset with potentially multiple matrices (CSV files). So it's everything you need to compare your prediction against to do a complete fit.
I would call that a set of predictions, compared against a set of experimental results
Names are annoying!
I thought we'd managed to agree at Harmony last year :( The decision there was to have the following as Web Lab concepts - each is more like a class than an instance:

*(concept list/diagram omitted)*

Are these at least the concepts we need, even if the names might need refining?

Then, since there are some commonalities, we have some base classes:

*(base-class diagram omitted)*
> I think an update in the Protocol version that data is linked to should probably create a new version of the ExperimentalData too.
>
> Not sure I follow this - where would we get this new experimental data?
No new data, just a new version of the entity containing the data that is linked to the updated protocol. So that anything that used that data can see that it has been updated, and needs to be re-run with the updated protocol!
OK, what Gary is on about on the other ticket is calling Result in the above diagram Fit!
I don't think so - Result in the above diagram has nothing to do with fitting, it's the data arising from wet lab experiments.
Oh, yeah, sorry, forget that!
> I think an update in the Protocol version that data is linked to should probably create a new version of the ExperimentalData too.
>
> Not sure I follow this - where would we get this new experimental data?
>
> No new data, just a new version of the entity containing the data that is linked to the updated protocol. So that anything that used that data can see that it has been updated, and needs to be re-run with the updated protocol!
I was thinking of supporting this kind of thing by having a FittingResult reference the protocol version, model version, dataset, etc. used to create it. So if a new version of any of those is created, you could potentially flag to users that they might want to refit.
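Perhaps something along these lines (a sketch only: `FittingResult`, the referenced model names, and `latest_version()` are all assumed for illustration):

```python
from django.db import models


class FittingResult(models.Model):
    # Everything the fit was produced from, pinned to exact versions:
    model_version = models.ForeignKey('ModelVersion', on_delete=models.PROTECT)
    protocol_version = models.ForeignKey('ProtocolVersion', on_delete=models.PROTECT)
    dataset_version = models.ForeignKey('DatasetVersion', on_delete=models.PROTECT)

    def is_stale(self):
        """True if any referenced entity has grown a newer version, so the UI
        can flag to users that they might want to refit."""
        return any(
            ref.entity.latest_version() != ref  # latest_version() is assumed
            for ref in (self.model_version, self.protocol_version,
                        self.dataset_version)
        )
```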
Decided to go with each `ExperimentalDataset` linking to at most one `Protocol` for now (possibly zero, @mirams @MichaelClerx?). But multiple datasets can link to the same protocol.
Sounds good!
(And yes possibly zero!)
I've updated the issue description with priorities for the workshop. @MichaelClerx @mirams any comments on these, particularly wrt ordering of the last 2 points and what the data compare UX flow should be?
Probably not a huge priority for next week; if the point above (data with predictions) enabled us to include more than one dataset linked to the protocol, then that would enable the same thing.
Again for now, all available data could pop up in the plot legend, maybe unselected, and then the user could click to view each one manually; that would be fine.
Looks good to me!
This may still get split into sub-issues...

- [x] Figure out what is common with Experiment (cf #130) and put in a mixin class / common 'dataset' app (see also https://github.com/ModellingWebLab/project_issues/wiki/Workshop-notes-2018 and #203)
- [x] Need probably many-to-many links between `ExperimentalDataset` and `Protocol`, and only once a fit is performed do you tie down exactly which were used? @MichaelClerx ?
- [x] Create an `ExperimentalDataset` model in its own `datasets` app (see the sketch after this list)
  - stored on disk like `EXPERIMENT_BASE`, with a corresponding config setting giving the root path
  - derives from `UserCreatedModelMixin, VisibilityModelMixin, models.Model` and needs a `name`
  - links to a `Protocol` - this should be set on creation, e.g. from a drop-down list
- [x] Create views to (these are probably almost identical to the entity views):
  - `datasets:archive` view like in `experiments/views.py`
  - `datasets:file_download` view like in `experiments/views.py`
- [ ] Add ability to compare data with predictions
  - `dataset-link` in `experimentversion_detail.html` and `experiment.js`
- [ ] Add ability to compare data with data
  - cf. `ExperimentComparisonView`
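Putting the model-related items above together, a rough sketch of what the `ExperimentalDataset` model could look like; the mixin import path, the `DATASETS_BASE` setting name, and field details are assumptions, not the actual implementation:

```python
from django.conf import settings
from django.db import models

# Import path assumed; the mixins themselves are named in the checklist above.
from core.models import UserCreatedModelMixin, VisibilityModelMixin


class ExperimentalDataset(UserCreatedModelMixin, VisibilityModelMixin, models.Model):
    name = models.CharField(max_length=255)
    # At most one Protocol per dataset ("possibly zero", hence nullable),
    # while many datasets can share the same protocol.
    protocol = models.ForeignKey(
        'entities.Protocol', null=True, blank=True,
        on_delete=models.PROTECT, related_name='datasets',
    )

    @property
    def archive_path(self):
        # Files live on disk under a configurable root, analogous to
        # EXPERIMENT_BASE; DATASETS_BASE is an assumed setting name.
        return settings.DATASETS_BASE / str(self.pk)
```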