Not 100% intuitive behavior with `dd_gaussX`

JeschkeLab / DeerLab-Matlab

Data analysis and method development toolbox for dipolar EPR spectroscopy

MIT License

4 stars 2 forks source link

Not 100% intuitive behavior with `dd_gaussX` #101

Closed spribitzer closed 4 years ago

spribitzer commented 4 years ago

I recently noticed that with dd_gaussX the input for the relative amplitudes can get confusing:

The functions take relative amplitudes as input, with the amplitude of the last Gaussian omitted, as it is calculated from the X-1 amplitudes.

The unexpected behavior happens when the provided amplitudes add up to be larger than 1.

r = linspace(1.5,6,201);
P = dd_gauss2(r,[3 0.4 1.1 4 0.3]);

This completely ignores the second Gauss.

Similar with 3 Gaussians

P = dd_gauss3(r,[3 0.4 1.1 3.5 0.7 0.4 4.3 0.5]);

In this snippet the second Gauss has an amplitude assigned (I'm guessing 0.4/(1.1+0.4) or is it 0.4?) even though the first Gauss has an amplitude > 1, and the amplitude for the 3rd Gauss is set to zero.

Though it makes sense to some extend, this behavior is a little unexpected. Maybe it is worth adding a warning if the provided relative amplitudes add up to more than 1?

stestoll commented 4 years ago

This is not straightforward to fix, since there is probably a more fundamental issue: The multi-Gauss models take the amplitude parameters assuming they are independent. For example, dd_gauss3 defines the upper limits for both A(1) and A(2) to be one, so this point in parameter space violates the normalization condition A(1)+A(2)+A(3)==1 without even having a specific (nonnegative) value for A(3).

The question is what are the practical consequences of this for fitting.

luisfabib commented 4 years ago

There are two possible situations:

Bounded ampltiudes + Implicit amplitudes: what we are doing right now. Having the amplitudes in the range [0,1], and constraining the last amplitude such that it is given by the normalization condition. It leads to the effects described above but allows to reduce the parameter space of such multi-components models by one parameter.
Upper unbounded amplitudes + Normalization: by allowing the amplitudes to adapt any non-negative value and then internally normalizing to the sum, i.e. A = A/sum(A) one would always satisfy the normalization condition. It would also avoid situations where Gaussians with non-zero amplitude would get "deleted" (as shown in the OP). The disadvantage here is that all amplitudes must be explicit parameters and the last amplitude cannot be inferred from the other amplitudes anymore. Therefore, an additional fit parameter would be introduced.

In both situations, any optimal solution to a problem is contained in the parameter space, therefore during fitting the solution should be found independently of the case one chooses. The differences arises in (probably) fitting times and multi-start runs required to find the global solution of these models.

stestoll commented 4 years ago

We should implement the second approach, i.e. have N amplitude parameters in an N-Gauss model, and normalize internally. Although this adds one more parameter to each of the mixture model, this should be quite transparent for the user. It also eliminates the current problem during fitting with an N-Gauss model, where the model sometimes collapses to an N-1 Gauss model because the last amplitude is set to zero.

The only downside is that the models are not uniquely specified, since the overall scale of the amplitudes has no effect on the model output.

luisfabib commented 4 years ago

With these changes, this issue should be closed.

Although it is still possible to get into situations where the N-Gauss model collapses into a N-1 model during fitting, their frequency should be considerably reduced. This also means that less (or potentially none) multi-starts are required to find the optimal global solution.