Closed elbaraim closed 4 years ago
Hey @elbaraim,
the same issue actually arises in the wiki example, Computation of integration based prediction bands, if you look at the plots.
It seems like the prediction bands you calculated are actually obtained from validation confidence intervals, not from prediction confidence intervals, see Likelihood based observability analysis and confidence intervals for predictions of dynamic models for details on this topic. In short: Prediction Confidence Intervals are a measure of uncertainty for model predictions, while Validation Confidence Intervals additionally include the uncertainty of a possible measurement. These are generated from Prediction Profiles, which take the model prediction as argument and from Validation Profiles, which take a data point as argument.
I am no expert on the prediction band functionalities, but I would make this educated guess: Since you have set PPL_options('Integrate',false)
, prediction bands are not generated by continuous integration but by calculating the prediction/validation confidence intervals at different time points. Validation profiles have the property that evaluation for negative data values does not constitute a problem, since data only enters the objective function over the squared residuals. Hence, validation confidence intervals can possibly have negative boundaries, even if this makes no sense for model predictions. However, prediction confidence intervals respect bounds given by the model structure.
The default option whether to use validation or prediction confidence intervals in the prediction band algorithm is the validation mode. Since you have not set this explicitly, I would assume that you have calculated validation confidence intervals. In principle, this should be fixed by setting the option PPL_options('Integrate',false,'doPPL',true)
. But since prediction profiles are derived from validation profiles, their calculation can be possibly problematic.
Thus, if the prediction band looks odd for some time points using the prediction mode, you can check whether the prediction profile crossed the confidence threshold with arPlotPPL. If it did not cross the confidence thresholds to both sides of the optimum, you can recalculate your prediction bands with the additional options
PPL_options('Integrate',false,'doPPL',true,'xstd',sigma,'n_steps_profile',N)
,
where N is the number of points used to sample a profile and sigma is the standard deviation of the validation data point. Setting sigma to smaller values makes prediction and validation profiles more similar, since they should in theory coincide if sigma goes to zero. However, setting it too close to zero can lead to technical issues.
I hope this suffices to produce sensible results in your case!
Thank you so much for the help! I will give it a try and let you know how it goes :+1:
Hi @litwiti
I confirm, it works now as expected. Thank you again for the support.
Regarding the number of steps required for the profile, I have seen that:
1000
, which I need for at least one of my model states (specifically due to the lower bound) the profile calculation continues even if the likelihood threshold is already achieved (for instance on the upper bound of this specific state I just mentioned).I attach here an screenshot of the result:
You can see that the lower bound still needs a larger number of steps, while for the upper bound is already sufficient. However the upper bound is calculated for all defined steps (instead of, e.g., automatically stopping when the threshold is reached).
Note: I am using default xstd
value.
This substantially increases the computation time.
Thank you in advance (again) :smile:
Hey @elbaraim,
I'm currently revising arPPL
and the related functions and there seem to be some limits on which cases can be handled reasonably well by the algorithm. Unfortunately, the case you presented is problematic in the current implementation.
First of all, to respond to your issue, the upper bound calculation should be cancelled once the threshold is exceeded by some value of about 1 on the PPL-scale. Thus, your proposal of stopping if the threshold is exceeded is already implemented and should also be functional by default. Since the prediction profile looks unnaturally straight to the right side of the minimum, I suspect that plotting the profile just connects two points on the right side, yielding one straight line, such that there aren't actually many sampled points on this side. I actually managed to construct an example by use of arPPL
where you can see this effect:
Since increasing step size proved to be not particularly efficient, I would propose to rely on 'xstd'
as recommended earlier. Decreasing this value improved the sampling above to this result:
Mind the different scale. As you can see, the prediction profile is sampled adequately now. Setting xstd
at least close to the magnitude of the prediction profile width should hopefully achieve this in your case too. So what I imagine could work would be:
PPL_options
with your previous arguments and additionally use the name-value pair 'xstd',sigma
which you estimated for one specific time point and state. Call arPPL
for this time point and state only.I can not guarantee that this will work, but it could possibly help. Since this is not an especially user-friendly fix, the function is currently under revision. But if you'd like to try this anyway, feedback would facilitate the development process immensely.
Sure I will give it a try. I am happy to help! And I will let you know how it goes. Thanks for the advise and support.
Hi @litwiti
As a first try for some of the model states (and time points) that I want to profile, it worked now following your suggestion on adjusting the xstd
value. However some other points still struggle which I think I should re-run the setting with a higher number of steps and then do the tuning that you suggested. This will take me longer time to test and I still want to provide with a quick feedback now.
Some things I encountered which may be of interest for you:
When trying to rerun the arPPL for a specific state and time point using the new xstd
value I got the following error:
The prediction profile you want to compute for t=2.700000e+00 and state 5 already exists. If you want to overwrite them, delete the value at ar.model.data/condition.ppl.ub_fit(_vpl)
The prediction profile you want to compute for t=4.400000e+00 and state 5 already exists. If you want to overwrite them, delete the value at ar.model.data/condition.ppl.ub_fit(_vpl)
First thing is that ar.model.condition.ppl.ub_fit
does not exist as a field in the ar struct.
>> ar.model.condition.ppl.ub_fit
Reference to non-existent field 'ub_fit'.
The work-around is solved by setting to nan (i.e. replacing the Inf values) the corresponding entries in the following:
ar.model.condition.ppl.ppl_ub_threshold_profile(1:4,4)=nan;
ar.model.condition.ppl.ppl_ub_threshold_profile(1:4,4)=nan;
Then the function will run with no problem. Maybe this could be automatically implemented.
Regarding the topic on higher number of steps: Another thing that I observed is that if I want to run only one of this "problematic" cases for a longer profile steps (while keeping the old results untouched), the following error occurs:
Subscripted assignment dimension mismatch.
Error in arPredictionProfile (line 177)
ar.model(m).(data_cond)(c).ppl.xtrial_profile(ts_tmp, jx,:) = xtrial_tmp;
Error in arPPL (line 129)
arPredictionProfile(t, ppl_general_struct, true, 0, []);
xstd
values) is something to consider as well.I hope this information is valuable :smile:
Hey @elbaraim,
first of all, thanks for the feedback! Knowing which bugs occur makes troubleshooting much easier. It's good to know that to have some first-hand experience that setting xstd
actually is a valid option to get to better prediction profiles. I've uploaded a new version of some functions now, but these are only small fixes and mostly just a load of comments in the code.
About the issue that re-running arPPL
leads to premature termination of the algorithm: I guess this is intended this way, since you do not want to overwrite already existing results without getting confirmation from the user prior to deleting the results. But the issue you mentioned is relevant anyway, since the occuring error message was obsolete: The field ar.model.data/condition.ppl.ub_fit(_vpl)
does not exist anywhere in the current implementation.
The work-around you found is exactly what the user is intended to do in this situation: By setting the specific entries in ar.model.condition.ppl.ppl_ub_threshold_profile
corresponding to the time points and states that you want to re-run to NaN
, the function is forced to re-evaluate them if you use arPPL
again. Changing xstd
prior to the second run should then not be an issue.
However changing the number of steps, i.e. n_steps_profile
, for a second run is not possible at this point. This comes down to exactly this dimension mismatch that you stumbled upon, which occurs because the containers which store the profile results are not increased in size when re-running arPPL
, producing this mismatch (although I would imagine that this is something which could be fixed in a reasonable amount of time).
Thus, as a user you need to specify the number of profile steps correctly at the beginning, meaning that you can not use the results you've already obtained. Since choosing too many steps is not particularly troublesome (except for producing a lot of NaNs), this would be the way to go when running arPPL for the first time.
I hope that the presented information is enough to completely fix your prediction bands (that is if you are still interested to actually obtain them). If I knew whether this approach worked in your case, I could summarize the ideas in the wiki for other users who run into similar problems. So if you retry this and it works (or doesn't), feedback would be very welcome.
Hi @litwiti I will keep you updated. Currently I am testing the pipeline is a rather small model (<10 parameters). I was concern about the fact of setting to large number of integration steps (for all states even if not needed for some) since my goal in the long run is to compute prediction profiles in larger models.
Tuning then xstd
value and re-run for a subset of the states should be fine :+1:
If you like, we can keep this issue open and I will get back here once I obtain my final results to provide further feedback.
Hey @elbaraim,
About the number of steps for each profile you should specify: Setting n_steps_profile
to large numbers should usually not be problematic from a technical point of view, since this has nothing to do with integration but specifies how many steps are used for one prediction profile in each direction of the optimum. However, using a large number of steps is inefficient if xstd
is chosen poorly, because then the algorithm will try all steps although there is no real chance of getting to the threshold. Thus, I would suggest setting n_steps_profile = 200
. If the prediction profile can not be sampled in 200 steps, it is most likely to a bad specification of xstd
anyway.
PPL
-functions to make an analysis of prediction bands in large models more accurate and feasible. Right now, I would suggest the following:xstd
by looking at the trajectories. In this setting, estimation means that you should get a feeling for the scale of the data and choose a small xstd
accordingly. For example, if your trajectory spans a range of 0-100 on a data scale, sigma = 1 would about be a reasonable first choice. loop_list = {{m1,d1,ix1,ts1,sigma1},{m2,d2,ix2,ts2,sigma2},...};
for ii = 1:length(loop_list)
PPL_options('Integrate',false,'doPPL',true,...
'n_steps_profile',200,'xstd',loop_list{ii,5});
arPPL(loop_list{ii,1},loop_list{ii,2},loop_list{ii,3},...
loop_list{ii,4},1);
end
xstd
(which most likely is the issue for poor sampling). This is done by setting the corresponding ar.model.condition.ppl.ppl_ub_threshold_profile
to NaNs and re-running the code above with newly specified values for sigma, since existing profiles are simply skipped....\Examples\Raia_CancerResearch2011
, using the following code:
Setup;
sigmas = [0.01,0.2,0.1,0.02,10,0.02,2,1]; % determine sigmas from trajectories ts = [1,3,10:10:130]; % avoid t = 0
for ii = 1:3
% loop over first 3 observables for example
for jj = 2:4
% loop over conditions
% condition 1 are mostly trajectories fixed to zero,
% for which PPL can not be sampled
PPL_options('Integrate',false,'doPPL',true,...
'n_steps_profile',200,'xstd',sigmas(ii));
arPPL(1,jj,ii,ts,1);
end
end
![Raia_Example](https://user-images.githubusercontent.com/60781115/76238077-9457fa00-622f-11ea-95c8-70b7f0d1953a.png)
* Since calculation of prediction profiles can nonetheless be problematic in the current implementation, a more efficient and reliable calculation of prediction profiles is currently under construction and will soon replace the existing algorithm. This will hopefully lead to faster and more reliable prediction bands. I would suggest waiting until this change has been made before trying to tackle large models.
* And sure, if you are trying this workflow an update on if this worked reasonably well would be much appreciated.
Hi @litwiti Thank you very much for all the detailed steps and hints. Very much appreciated :+1:
Hey @elbaraim,
I have replaced the currently implemented prediction/validation profile calculation (which was just a basic subfunction in the PPL algorithm) with the independent function VPL
, which is documented in the article Estimating Prediction Uncertainty. After some testing on how well this worked when inserted into the PPL algorithm, I am pretty certain that the prediction profile calculation in arPPL
should work a lot better now.
However, algorithm performance and robustness still depends on a good choice of xstd
. Hence, don't hesitate to use arPPL
with different xstd
-values even along the same trajectory, e.g. if the prediction uncertainty is expected to change drastically along the time course. In cases of misspecification prediction profiles may still reach their boundaries, but may then be inaccurate (which mostly can be fixed by re-running with appropriately specified xstd
). This inaccuracy will be clearly visible if checked with arPlotPPL
.
For now I do not intend to further look into the prediction band code. Thus, if you do not encounter any bugs in a larger model you may close this issue. Thanks for the comments so far!
Hi @litwiti I have an additional question: I calculate the PPL for a given state, time point and condition at 99 confidence level.
Hey @elbaraim,
if I understood correctly: You used PPL_options('alpha_level, 0.01')
(and other options) to calculate profiles and intervals at the 99% confidence-level.
If additionally you used the option 'Integrate', false
then this should not have worked properly until now because I did not consider other confidence levels other than 95% when I reworked the PPL-functions. But I have fixed this just now, so this is not a problem anymore.
But I think your main question was:
I don't think that such a function is implemented in this framework, so you would need to do this manually. But this shouldn't be too much of a hassle, you just need to find the intersections of the prediction profiles with the desired confidence threshold.
Hope this helps.
Hi @litwiti Yes, I used the option listed above and also Integrate as False. Good to know that is fixed now! :) Thank you for looking into this!
Yes, about extracting the confidence intervals I thought about what you just suggested, but I was wondering whether I missed about a function that automatically does that.
Hi @litwiti
I have what I hope may be one of my last requests :see_no_evil:
The large model I am trying to calculate the prediction profiles is computationally slow and I am running this on a server which has as upper bound of computation time 48h. I am currently using 200
steps (as discussed above) and even tried to reduce to 100
steps. However, the routine gets interrupted due to reached computation time limit.
I was wondering whether it would be possible to, e.g., split the process such that one job would run the "profile to the right" and another one the "profile to the left". Then the final result would be just merging these.
Hey @elbaraim,
Since the prediction bands are calculated by obtaining many prediction profiles for each specified condition, have you tried to parallelize your computations by starting separate jobs for e.g. each experimental condition? This would certainly split the workload, although I don't see a trivial way to merge the results (however it is certainly possible).
Calculating profiles to the Left and to the Right separately is currently not supported for prediction profiles, but I would favor the solution that I just proposed anyway. I also think that it should be possible to implement this, but I would rather not further meddle with the original code before I break any other features/compatibilities.
I hope this answers your questions.
Hi @litwiti
Yes, I have already tried by parallelizing by experimental condition, model state and time point. Each job runs for only one of these. For one of the models I am using, this is sufficient and it works. Then I just merge the results extracting them for each individual run (with the respective mat
file).
However, for the other model I am working with (actually this one https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-7-41) the parallelization by exp. cond., state and time point is not sufficient. Therefore I thought about that possibility of a further layer of parallelizing "left" and "right" profiling.
Hey @elbaraim,
even if the model is large I, it seems troublesome that a single profile takes up to 48 hours and may be a hint that something does not work as intended. Since you are already working on single profile level, I can recommed use of the function VPL
which was exactly built for the task of sampling single profiles and which should already be called in the PPL
code. However, use of VPL
allows you to easily manipulate the parameters determining how profiles are calculated. Additionally, you can probably get a feeling for why the calculation takes so long. See the article: Estimating Prediction Uncertainty.
However, it seems strange that at most 200 local fits take up to 48 hours, even if the model is large. In order to possibly enhance optimization and integration performance, you might want to refer to the article: How to check and improve optimization.
Dear d2d team,
I have been recently using the routines for prediction profile likelihoods in one of my models. I got negative lower bounds for some of the model states I am evaluating, which values should be zero (as predicted by MCMC sampling). These are the settings I am using:
As you can see down below, model uncertainties are higher predicted by MCMC sampling but never negative. While this is not the case for PPLs.
These are the results I obtain by PPL calculation:
These are the results I obtain by MCMC sampling:
Thank you in advance for the help.