AMICI-dev / AMICI

Advanced Multilanguage Interface to CVODES and IDAS
https://amici.readthedocs.io/
Other
108 stars 31 forks source link

Extend ExpData #262

Closed FFroehlich closed 5 years ago

FFroehlich commented 6 years ago

We want to extend the ExpData class

yannikschaelte commented 6 years ago

One side-goal could also be to avoid repeated forward integration in case of multiple replicates (same dynamical parameters) by passing multiple ones at the same time to AMICI. This would accelerate the adjoint approach. Since we usually have few repetitions, the impact of this would however be limited.

FFroehlich commented 6 years ago

Regarding the issue of repeated forward integration, this could probably implemented by simply storing final x and sx [if available] for every condition vector. If we store this in a hashtable (hash from ExpData object), we could dynamically check if respective x and sx have been computed and or whether presimulation has to be performed. The question is where this hashtable should be stored. Although it does not 100% belong there, the Model class is probably the best place to store it, as it can persist across multiple calls to runAmiciSimulation (this would work before we actually implement vectorization of ExpData).

dweindl commented 6 years ago

If we cache that in Model, it would have to be invalidated on changes of/to Solver. This looks quite difficult (and undesirable) to me. I would only reuse during a single call to runAmiciSimulation.

FFroehlich commented 6 years ago

I agree, lets not put this in Model but create this in amici.cpp.

dweindl commented 6 years ago

Are there already any ideas/efforts for allowing replicate measurements? (Same timepoint, same observable, different value) I think this could be quite helpful.

One side-goal could also be to avoid repeated forward integration in case of multiple replicates (same dynamical parameters) by passing multiple ones at the same time to AMICI.

@yannikschaelte You are suggesting to handle that via arrays of ExpData?

FFroehlich commented 6 years ago

Actually, replicate measurements, i.e., multiple datapoints for the same timepoint, are already allowed and should work just fine :)

dweindl commented 6 years ago

:flushed:

yannikschaelte commented 6 years ago

but the loop is done in python, right?

dweindl commented 6 years ago

I meant C++, without resimulation. Is this already possible?

yannikschaelte commented 6 years ago

cause having the loop in c++, one can also speed up things by doing just one forward simulation with adjoints

FFroehlich commented 6 years ago

I think so, simply have the same timepoint in edata.ts twice. For the second timepoint nextTimepoint > model->t0() will be false and handleDataPoint(it) will be called without additional simulation.

I also just figured this out a couple of days ago when specifying an according edata by accident 😅

yannikschaelte commented 6 years ago

You are suggesting to handle that via arrays of ExpData?

would be easiest

FFroehlich commented 6 years ago

I did not check correctness of adjoint sensitivities yet, but even if it does not work yet, it should be easy to fix.

dweindl commented 6 years ago

Ah, since ExpData::timpoints overrides Model::timepoints. Okay. Good to know :) :+1:

FFroehlich commented 6 years ago

I think having everything in a single ExpData instance is way easier than weaving together multiple ReturnData results from an array of ExpData (although we might have to that at some point anyways)

dweindl commented 6 years ago

Maybe worth adding a note in ExpData, that timepoints have to be sorted but don't have to be unique.

I think having everything in a single ExpData instance is way easier than weaving together multiple ReturnData results from an array of ExpData.

Totally agreed. Question would have been if we want something like we apparently already have, or if we want to add a replicate-dimension. The current solution is sparser.

yannikschaelte commented 6 years ago

formally this would afaik involve adding an additional dimension to each (return-)data array (nt, nr, ny). you suggest just flattening this (data and simulations alike) to (nt*nr, ny)?

dweindl commented 6 years ago

Yes. Where nr currently can be time point specific

yannikschaelte commented 6 years ago

even more flexible :) sounds good, so looks like in the input one just has to flatten/re-arrange the data array, and as @FFroehlich pointed out make sure the sensitivities are correct

yannikschaelte commented 6 years ago

to implement hierarchical optimization then, this format demands some nasty index shuffling, but should be not too difficult with the approach from canpathpro?

dweindl commented 6 years ago

Very good point. That won't work. Firstly, so far we assume same offset/scaling/sigma parameters for all timepoints (would be straightforward to extent). Secondly, we assume all conditions to have the same number of timepoints (so here sparsity would be over). Need to think how to best manage that.

dweindl commented 6 years ago

I think for specifying the data, the most convenient would be a dense (nt, ny, nr) matrix, but for large models it would be good if that would be stored in a sparser manner.