JeschkeLab / DeerLab-Matlab

Data analysis and method development toolbox for dipolar EPR spectroscopy
MIT License
4 stars 2 forks source link

Changes to parametric fitting and fixed values in parametric models #87

Closed laenan8466 closed 4 years ago

laenan8466 commented 4 years ago

Hey, sadly I havn't followed all changes in the last month. When revisting an 'old' project I came up with some open questions about the design for parametric fitting. I try to group the questions together. I also add some points I noticed, that could be interesting for others and I would like to emphasize because they are genius (tremendous work @luisfabib and all others).

Parametric model fit

Things I noticed/proposed features

Fixed variables and boundaries

With fitparamodel it is possible to change the inital guess, but it is not possible to fix a parameter individually. This is of interest, if a parameter can be determined with higher accuracy in a single experiment. Possible workarounds could be bilevel fit (see below) or a custom dd-model with a fixed parameter/specialized boundaries.

Background for parametric fit

When performing a parametric fit (fitparamodel) of a mixed model with the example provided it is not possible to include a background correction, if the modulation depth is unknown. The reason for this is, that the dipolarkernel cannot be generated with a background model but unknown modulation depth. See documentation of dipolarkernel. Workaround could be using bilevel fit (see below) or fitsignal (with the restrictions listed below).

Global fit (proposal for feature addition)

It could be of interest to not only fit several traces, resulting in one distribution, but fit only some parameters (in a parametric model e.g. the mean position but not the width) and determine others individually.

Bilevel optimization

In a version 0.8 bilevel optimization was existing as an example and was removed in v0.9. I guess the reason that it was removed is that fitsignal is now doing this job. The way bilevel optimization worked could be used for fixed variables/boundaries or global fit of several (different) samples.

Multi-component parametric fit

The possiblities introduced with fitmultimodel are really great. This could be helpful if having a series of macromolecules with increased size and it is unclear what is included (e.g. polymerization).

Questions to @luisfabib

Maybe it will be helpful for you to devide some feature requests in seperate issues. If you are interested in having s.th. of that I proposed, I can add it in a seperate issue. Just give me a hint. ;-)

luisfabib commented 4 years ago

Hey, thanks for the inputs. I will try to go through the issues in order to answer or complement them:

Fixed variables and boundaries

Individual parameters can be fixed in DeerLab by defining a new function handle for the model. For example: take a Gauss model dd_gauss(r,[rmean fwhm]) which you would normaslly fit via

paramfit = fitparamodel(V,@dd_gauss,r,K);

but now you want to fit rmean with a fixed fwhm, then one needs to use

fwhm = 0.54; %nm
model = @(r,rmean) dd_gauss(r,[rmean fwhm]);
par0 = 5; %nm
paramfit = fitparamodel(V,@dd_gauss,r,K,par0,'lower',1,'upper',20);

since the fwhm is defined before the construction of the function handle model, that value will be fixed inside the new model. Then fitparamodel can be called, albeit this time the start values par0 as well as upper/lower boundaries need to be specified.

There is currently no way to fix a parameter of a built-in model while keeping the boundaries/start values information of the other parameters.

Global fit

Similarly, global fitting in DeerLab allows to fit models with fixed parameters, global parameters as well as local parameters.

The following script is an example of fitting two different signal to two different bimodal Gauss distance distributions, whose mean distances are equal for the two signals but their amplitudes differ, i.e. the mean values are global parameters, the amplitudes are local parameters and the widths are fixed to a known value.

This kind of processing requires full scripting as (currently) there are no wrapper functions around this.

clc,clear,clf

rng(1)

%preparations
r = linspace(2,6,300);
t1 = linspace(0,4,200);
t2 = linspace(0,6,100);

%parameters
rmeanA = 3.45;
rmeanB = 5.05;
fwhmA = 0.5;
fwhmB = 0.3;
ampA1 = 0.8;
ampA2 = 0.2;

%two different distributions for two different samples
P1 = dd_gauss2(r,[rmeanA fwhmA ampA1 rmeanB fwhmB]);
P2 = dd_gauss2(r,[rmeanA fwhmA ampA2 rmeanB fwhmB]);
%...and two different signals for two different samples
V1 = dipolarsignal(t1,r,P1,'noiselevel',0.01);
V2 = dipolarsignal(t2,r,P2,'noiselevel',0.01);

%dipolar kernels
K1 = dipolarkernel(t1,r);
K2 = dipolarkernel(t2,r);
%parameters to fit [rmeanA rmeanB ampA_1 ampA_2]
par0 =   [2  2  0.5 0.5];
lower =  [1  1  0   0];
upper =  [20 20 1   1];

%Collect data for global fit
Ks = {K1,K2};
Vs = {V1,V2};
ts = {t1,t2};

%Fit
model = @(t,par,idx) mymodel(t,par,idx,r,Ks);
parfit = fitparamodel(Vs,model,ts,par0,'lower',lower,'upper',upper,'multistart',50);

%Get results
[Vfit1,Pfit1] = mymodel(t1,parfit,1,r,Ks);
[Vfit2,Pfit2] = mymodel(t2,parfit,2,r,Ks);

subplot(221)
plot(r,P1,r,Pfit1)
subplot(222)
plot(t1,V1,t1,Vfit1)
subplot(223)
plot(r,P2,r,Pfit2)
subplot(224)
plot(t2,V2,t2,Vfit2)

function [Vfit,Pfit] = mymodel(t,par,idx,r,Ks)

    %Global parameters
    rmeanA = par(1);
    rmeanB = par(2);

    %Fixed parameters
    fwhmA = 0.5;
    fwhmB = 0.3;

    %Local parameters
    ampA1 = par(3);
    ampA2 = par(4);

    if idx == 1
        %Use ampA1 for signal Vs{1}
        AMP = ampA1;
    else
        %Use ampA2 for signal Vs{2}
        AMP = ampA2;
    end
    %Generate signal-specific distribution (mix global+local parameters)
    Pfit = dd_gauss2(r,[rmeanA fwhmA AMP rmeanB fwhmB]);
    %Generate corresponding signal (either V{1} or V{2})
    Vfit = Ks{idx}*Pfit;

end

Bilevel optimization

Yes, nested LSQ optimization of full models (we use this term now instead of bilevel optimization) is now way more accessible from the new fitsignal function. fitsignal is a pure wrapper around fitparamodel and fitregmodel so therefore, all workflows that were done before this function are still available if scripted.

The main advantages of fitsignal over a custom-implementation is that this wrapper uses several tricks to largely boost computation times (up to factor 6 in speed for some cases that we have seen) and handles the confidence interval computation for this kind of nested problems which is not trivial.

However, as you said in 0.9.0 there is no way to fix a parameter in fitsignal while fitting the rest. This is one of the many interface issues that are still to be tackled in future releases of DeerLab.

Questions

Am I correct, that bilevel optimization was replaced with fitsignal?

As explained above, it has not been "removed" since it is still scriptable, but fitsignal functions as a wrapper for easier scripting and speed.

Is there a feature for fixed variables or boundaries in your code I missed? Could this I would be willing to provide an example for such analysis, if this is of interest for the repo.

See examples above. If you meant something else please let me know ;)

Is it possible to include the background model, but not the modulation depth in the kernel to use with fitparamodel?

I understand that you want to simulate/fit a system with 100% modulation depth that still has a background function suppressing the modulations? This can be accomplished by just specifying a modulation depth lambda=1, i.e

lam = 1;
B = bg_exp(t,k);
K = dipolarkernel(t,r,lam,B)

Do you have better workarounds for this problems/restrictions in mind?

The future releases of DeerLab will primarily focus on the user-program interface to try to simplify different workflows and processing types.

Hopefully this helps, otherwise please let us know ;)

laenan8466 commented 4 years ago

Wow @luisfabib. That was fast and so helpful. Everything answered, except the last point.

Fixed variables while fitting background

The problem

What I meant was the following, starting from the mixmodel example. Imagine the trace has a background modulation:

%Axis definition
t = linspace(-0.5,4,350);
r = linspace(2,6,200);

% Distribution parameters
rmean = 3.5;
width = 0.3;
chain = 4.3;
pers = 10;
wlcwidth = 0.1;
amp = 0.2;

% Background parameters
lambda = 0.3; % modulation depth main signal
conc = 200;   % spin concentration (uM)
% bg_hom3d will be chosen.

% Generate distribution
P = dd_gauss(r,[rmean width]);
P = amp*P + (1 - amp)*dd_wormgauss(r,[chain pers wlcwidth]);
% Normalize distribution
P = P/sum(P)/mean(diff(r));
% Generate dipolar evolution function
V = dipolarsignal(t,r,P,lambda,bg_hom3d(t,conc,lambda),'noiselevel',0.02);

Now this model should be fitted and the exact modulation depth would be unknown. When using the fitparamodel example, this is not possible, as dipolarkernel will only accept a background model, if also a modulation depth is given. Of course, this could be solved by fitsignal:

gausswlc = mixmodels(@dd_gauss,@dd_wormgauss);
[Vfit,Pfit,Bfit,parfit,parci,stats] = fitsignal(V,t,r,gausswlc,@bg_hom3d)

But then the restrictions listed above are in place. When trying to fix one parameter, I will have to go over the way with fitparamodel and dipolarkernel. In this case we need to use the "nested LSQ optimization of full models" (hihi, learned a new phrase 😉) to fit the background parameters as well. This is of course possible easily, but will not benefit from the computation boost you mentioned. Also it is more code to think about for the user.

Possible Solution with changes in behavior of DeerLab

Easier way for the user (but maybe not possible from how the scripts work): If dipolarkernel would take a background model without the need of a fixed modulation depth, the user wouldn't have to go the long was with a nested LSQ optimization.

Another solution: Making a custom model with hardcoded varialbes

Another solution (keeping the benefits from fitsignal) would be to generate a custom dd_model in every case. This would have some variables hardcoded (not so nice).

Another Idea of structuring code for models

(This idea just popped up in my mind. It means some efford (for you) and has to be overthought. So feel free to 'ignore' it. 😉 ) Taking on the idea of making a custom model, but beeing able to modify/fix some variables at runtime, here is an idea: Models could be an object and inherit some of their properties.

As far as I understood the requirements for dd_models are quite small:

For this it would be easier (for me) to have a class dd_model from which models are inherited. Then it is really easy to make new (custom) models and use them with the standard procedure. Would this interfere with the computation boost you mentioned? What do you think?

laenan8466 commented 4 years ago

I created an example for nested processing with a background.

I wasn't able to get to a valid fits in all executions. Processing with fitsignal: fitparamodel_fitsignal Processing with nested function as in example for nested processing with a background.: fitparamodel_nested

It seems to be very important to have a narrow range of parameters or larger number of starting points (see #29 ). Can you comment, if there is a flaw in my execution script? The difference in these two results makes me nervous and I hope, I did an error. EDIT: This is done with different noise seeds. So the results aren't comparable. See below.

Fitting takes a while, what is in line with your report of a computation boost in the normal case. I would state it's even more than factor of 6, but this can be also an issue with the drawnow function.

Feel free to use the example for whatever way you like to. I would be glad for all comments on possible improvements.

laenan8466 commented 4 years ago

Here is another example of a nested fit with a fixed amplitude. Of course all other variables could be fixed as well. fitparamodel_nested_fixamp

This could get messy, to keep the overview...

stestoll commented 4 years ago

As of a few commits ago (dev version, post 0.9), you can fix a parameter in fitsignal by setting its lower and upper bound to the same value ('Lower' and 'Upper' options). But the interface to the fitting functions is in flux, so don't take this as the final way.

laenan8466 commented 4 years ago

Thanks for the hint @stestoll !

Fixed amplitude with figsignal

I tested that as well and this seems to be ne most reliable way (Using commit b7a2ec70a0a0579b59d0daacd96add3b26d4090a for the record). Furthermore this is Here is an example with a fixed amplitude:

gausswlc = mixmodels(@dd_gauss,@dd_wormgauss);
% Define parameters
par0 = {[0.2,3.5, 0.5, 3.7, 010, 0.200],...
    [150],...
    {0.3}};
lower = {[0.2,1,0.2,1.5,2,0.001],...
    [10],...
    {0}};
upper = {[0.2,20,5,10,100,2],...
    [400],...
    {1}};
[Vfit,Pfit,Bfit,parfit,parci,stats] = fitsignal(V,t,r,gausswlc,@bg_hom3d,@ex_4pdeer,par0,'Lower',lower,'Upper',upper)

fitparamodel_fitsignal_boundaries-ampfix

Performing the analysis multiple times, the result isn't differing.

Comparison to nested LSQ

Important remark to my post above: I did a mistake with generating the noise anew for all examples, so they are not comparable! So here is the data with the example of a nested fit with a fixed amplitude on the same noise once again: fitparamodel_nested_fixamp Here is to note, that the result differs when perfoming the analysis a second time. This could be due to the random MultiStart.

Computation time

Time comparison with same start parameters for dd-fit: fitsignal: 3-4 s nested LSQ: 5min 15s (Background start parameters as in script, MultiStart: 15) nested LSQ: 14 s (Multistart: 1, result useless) So the fitsignal wrapper seems to be best in speed performance. Nice!

Conclusion

In both cases quality isn't brilliant in comparison to the 'truth', but the start parameters weren't choosen ideally (at purpose). I will use this specific commit for production and report if there is anything newsworthy. ;-)