JeschkeLab / DeerLab-Matlab

Data analysis and method development toolbox for dipolar EPR spectroscopy
MIT License
4 stars 2 forks source link

Design of a new confidence intervals interface #90

Closed luisfabib closed 4 years ago

luisfabib commented 4 years ago

The current interface for confidence intervals, while functional, has several drawbacks:

  1. Confidence levels of the CIs are either hard-coded or must be specified via options
  2. Outputs containing CIs are often deeply nested mixtures of cell and numerical arrays.
  3. The interfaces of covariance and bootstrapped CIs are different.
  4. Currently error propagation from some given CIs to another related model is not available on DeerLab. Covariance matrices are not available as output and doings so would complicate the output format even more.

We need a new interface for the CI system which allows full related functionality while keeping a simple and clean interface for the users.

We could achieve this by constructing a MATLAB class for CIs with the following properties/methods:

Such an implementation would largely simplify the interface, while providing new and more general functionality.

luisfabib commented 4 years ago

I managed to write an interface function cistats which emulates object-oriented programming. The function would allow for fit functions to return a structure from where all variables of interest as well as methods would be called.

Here is an example of how this would look like.


clc,clf,clear

%Some dummy data
covmat = diag(rand(3,1));
parfit = 0.5+rand(3,1);
lb = [-1; -2; -3];
ub = [3; 2; 3];

%% Confidence intervals

%Generate the CI structure
cistats = cistats(parfit,covmat,lb,ub);
%... this would be returned as e.g.
% [parfit,~,cistats] = fitparamodel(___)

%Get different CI confidence levels
ci95 = cistats.ci(0.95);
ci50 = cistats.ci(0.50);
ci43 = cistats.ci(0.43);

%Print the 95%-CIs
for i=1:numel(parfit)
    fprintf('parfit(%i) %.2f (%.2f, %.2f) \n',i,parfit(i),ci95(1,i),ci95(2,i))
end

%% Parameter distributions

%Get distributions of the fit parameters
dist1 = cistats.pardist(1);
dist2 = cistats.pardist(2);
dist3 = cistats.pardist(3);

%Plot
plot(dist1.values,dist1.dist,dist2.values,dist2.dist,dist3.values,dist3.dist)
axis tight,grid on
xlabel('Parameter values')
ylabel('PDF')

%% Error Propagation

% Prepare a model which depends on the fitted parameters
x = linspace(-10,5,200);
model = @(par) polyval(par,x);
lb = -20 + x;
ub = realmax + x;

% Propagate the parameter errors onto the model
modelcistruct = cistats.propagate(model,lb,ub);

% Get the model confidence intervals
modelci95 = modelcistruct.ci(0.99);
modelci50 = modelcistruct.ci(0.25);
mean = modelcistruct.mean;

%Plot the model mean and CIs
clf
hold on
plot(x,mean,'k');
fill([x fliplr(x)],[modelci95(1,:) fliplr(modelci95(2,:))],'b','FaceAlpha',0.3);
fill([x fliplr(x)],[modelci50(1,:) fliplr(modelci50(2,:))],'b','FaceAlpha',0.5);
axis tight, grid on, box on
xlabel('x')
ylabel('f(x)')

function cistruct = cist(parfit,covmat,lb,ub)

parfit = parfit(:);
lb = lb(:);
ub = ub(:);

%Create confidence intervals structure
cistruct.mean = parfit;
cistruct.covmat = covmat;
cistruct.ci = @(p)ci(p);
cistruct.pardist = @(n)pardist(n);
cistruct.propagate = @(model,lb,ub)propagate(model,lb,ub);

%-----------------------------------------------
% Covariance-based confidence intervals
%-----------------------------------------------
    function x = ci(p)
        % Compute covariance-based confidence intervals
        x(1,:) = max(lb,parfit - norminv(p)*sqrt(diag(covmat)));
        x(2,:) = min(ub,parfit + norminv(p)*sqrt(diag(covmat)));
    end

%-----------------------------------------------
% Covariance-based parameter distributions
%-----------------------------------------------
    function out = pardist(n)

        %Generate Gaussian distribution based on covariance matrix
        sig = covmat(n,n);
        mean = parfit(n);
        x = linspace(mean-4*sig,mean+4*sig,500);
        dist = 1/sig/sqrt(2*pi)*exp(-((x-mean)/sig).^2/2);

        %Clip the distributions at outside the boundaries
        dist(x<lb(n)) = 0;
        dist(x>ub(n)) = 0;

        %Store in structure
        out.dist = dist;
        out.values = x;
    end

%-----------------------------------------------
% Error Propagation
%-----------------------------------------------

    function modelcistruct = propagate(model,lb,ub)

        jacobian = jacobianest(model,parfit);
        modelfit = model(parfit);

        modelfit = max(modelfit,lb);
        modelfit = min(modelfit,ub);

        modelcovmat = jacobian*covmat*jacobian.';
        modelcistruct = cist(modelfit,modelcovmat,lb,ub);

    end

end

I believe that this is the direction to go for DeerLab. I would need to think more about the interface before doing any implementation into DeerLab.

laenan8466 commented 4 years ago

Hey! I would love to see this feature. I'm just confused, what do you mean with 'emulate' object oriented behavior? Why not use a 'real' class? The code you posted doesn't work, because a variable name is assigned the same name as a function that already exists.

Here is a mockup for a class. The Matlab script:

clc,clf,clear

%Some dummy data
covmat = diag(rand(3,1));
parfit = 0.5+rand(3,1);
lb = [-1; -2; -3];
ub = [3; 2; 3];

%%Confidence intervals
%Generate the CI structure
cistats = cistats();
cistats.set_data(parfit,covmat,lb,ub);
%... this would be returned as e.g.
% [parfit,~,cistats] = fitparamodel(___)

%Get different CI confidence levels
ci95 = cistats.ci(0.95);
ci50 = cistats.ci(0.50);
ci43 = cistats.ci(0.43);

%Print the 95%-CIs
for i=1:numel(parfit)
    fprintf('parfit(%i) %.2f (%.2f, %.2f) \n',i,parfit(i),ci95(1,i),ci95(2,i))
end

%%Parameter distributions
%Get distributions of the fit parameters
dist1 = cistats.pardist(1);
dist2 = cistats.pardist(2);
dist3 = cistats.pardist(3);

%Plot
plot(dist1.values,dist1.dist,dist2.values,dist2.dist,dist3.values,dist3.dist)
axis tight,grid on
xlabel('Parameter values')
ylabel('PDF')

%%Error Propagation
% Prepare a model which depends on the fitted parameters
x = linspace(-10,5,200);
model = @(par) polyval(par,x);
lb = -20 + x;
ub = realmax + x;

% Propagate the parameter errors onto the model
modelcistruct = cistats.propagate(model);

% Get the model confidence intervals
modelci95 = modelcistruct.ci(0.99);
modelci50 = modelcistruct.ci(0.25);
mean = modelcistruct.mean;

%Plot the model mean and CIs
clf
hold on
plot(x,mean,'k');
fill([x fliplr(x)],[modelci95(1,:) fliplr(modelci95(2,:))],'b','FaceAlpha',0.3);
fill([x fliplr(x)],[modelci50(1,:) fliplr(modelci50(2,:))],'b','FaceAlpha',0.5);
axis tight, grid on, box on
xlabel('x')
ylabel('f(x)')

The class file with name cistats.m:

classdef cistats < matlab.mixin.Copyable

    properties (GetAccess=public)
        parfit
        covmat
        lb
        ub
    end

    properties (Dependent = true)
        mean
    end

%Create confidence intervals structure

    methods
        % Set data input
        function set_data(self, varargin) % parfit, covmat, lb, ub
            assert(nargin == 4+1);
            self.parfit=varargin{1};
            self.covmat=varargin{2};
            self.lb=varargin{3};
            self.ub=varargin{4};
        end
    %-----------------------------------------------
    % Covariance-based confidence intervals
    %-----------------------------------------------
        function x = ci(self,p)
            % Compute covariance-based confidence intervals
            x(1,:) = max(self.lb,self.parfit - norminv(p)*sqrt(diag(self.covmat)));
            x(2,:) = min(self.ub,self.parfit + norminv(p)*sqrt(diag(self.covmat)));
        end

    %-----------------------------------------------
    % Covariance-based parameter distributions
    %-----------------------------------------------
        function out = pardist(self,n)

            %Generate Gaussian distribution based on covariance matrix
            sig = self.covmat(n,n);
            meanl = self.parfit(n);
            x = linspace(meanl-4*sig,meanl+4*sig,500);
            dist = 1/sig/sqrt(2*pi)*exp(-((x-meanl)/sig).^2/2);

            %Clip the distributions at outside the boundaries
            dist(x<self.lb(n)) = 0;
            dist(x>self.ub(n)) = 0;

            %Store in structure
            out.dist = dist;
            out.values = x;
        end

    %-----------------------------------------------
    % Error Propagation
    %-----------------------------------------------

        function modelcistruct = propagate(self,model)

            jacobian = jacobianest(model,self.parfit);
            modelfit = model(self.parfit);

            modelfit = max(modelfit,self.lb);
            modelfit = min(modelfit,self.ub);

            modelcovmat = jacobian*self.covmat*jacobian.';
            modelcistruct = cistats(modelfit,modelcovmat,self.lb,self.ub);

        end
    end

    methods 
        function mean = get.mean(self)
            mean=self.parfit;
        end

        function set.mean(self,value)
            self.parfit=value;
        end
    end

end

I couldn't test everything, as I didn't have a function jacobiannest.

luisfabib commented 4 years ago

System implemented with 9957b6d