OxIonics / ionics_fits

Small python fitting library with an emphasis on Atomic Molecular and Optical Physics
Apache License 2.0

gaussian fit failing #152

Open hartytp opened 5 months ago

hartytp commented 5 months ago

Test dataset

x = np.array([1847.08794244, 1927.08794244, 1827.08794244, 1867.08794244,
              1767.08794244, 1757.08794244, 1877.08794244, 1837.08794244,
              1897.08794244, 1807.08794244, 1747.08794244, 1777.08794244])

y = np.array([0.32, 0.94, 0.16, 0.64, 0.26, 0.44, 0.88, 0.26, 1.0, 0.0, 0.5, 0.1])

sigma = np.array([0.0751683, 0.04370707, 0.0613972, 0.07703657, 0.07132427,
                  0.07930124, 0.05567317, 0.07132427, 0.01807572, 0.01807572,
                  0.07980068, 0.05224949])
hartytp commented 5 months ago

This is solved in https://github.com/OxIonics/ionics_fits/pull/153

The underlying issue is that the dataset had too few points for the FFT heuristic to work well. While there is an improvement we should make to the FFT heuristic (see below), small datasets like this are always going to be a challenge for it, so I've added a second heuristic based on peak finding.
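To illustrate the FFT side of this, here's a minimal sketch of how a Gaussian width can be read off from the width of its Fourier transform. This is an illustrative toy, not the actual ionics_fits heuristic; the function name and the RMS-width estimator are assumptions for the example.

```python
import numpy as np

def fft_sigma_estimate(x, y):
    """Illustrative sketch (not the ionics_fits implementation): estimate the
    sigma of a Gaussian peak from the width of its Fourier transform.

    The FT of a Gaussian with standard deviation sigma_x is a Gaussian in
    frequency with sigma_f = 1 / (2 * pi * sigma_x), so a width estimate in
    the frequency domain maps straight back to sigma_x.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    dx = np.mean(np.diff(x))  # assumes roughly uniform sampling
    y = y - np.mean(y)  # remove DC so the spectrum is dominated by the peak
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=dx)
    # RMS width of the one-sided magnitude spectrum as a crude sigma_f estimate
    weights = spectrum / np.sum(spectrum)
    sigma_f = np.sqrt(np.sum(weights * freqs**2))
    return 1 / (2 * np.pi * sigma_f)
```

With only ~12 samples, the frequency grid this produces is so coarse that the spectral width (and hence sigma) is poorly resolved, which is why small datasets defeat this approach.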

These two heuristics are pretty complementary: the FFT works extremely well for noisy datasets with outliers since we filter out high-frequency noise; the peak-based heuristic works really well for smaller datasets, but can't tolerate significant outliers. If you've got a dataset that's both small and has outliers then it's probably fair for ionics_fits to leave it up to you to provide some user estimates!

In the implementation of the peak heuristic, we find sigma by looking at all points whose height above the baseline exceeds 1/e of the peak height. We then take the peak-peak x of this subset and calculate sigma from that. This will be thrown off, for example, if there is an outlier far from the peak which is above the 1/e threshold (or if noise simply pushes a point over the line).
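The step above can be sketched as follows (a simplified stand-in, not the actual ionics_fits code; the function name and `threshold` parameter are assumptions for the example):

```python
import numpy as np

def peak_sigma_estimate(x, y, threshold=np.exp(-1)):
    """Illustrative sketch of the peak-based heuristic: take all points whose
    height above the baseline exceeds `threshold` times the peak height,
    measure their peak-to-peak x extent, and convert that width to a Gaussian
    sigma.

    A Gaussian falls to a fraction h of its peak at x0 +/- sigma*sqrt(-2*ln(h)),
    so the full width at the 1/e level is 2*sqrt(2)*sigma.
    """
    baseline = np.min(y)
    height = np.max(y) - baseline
    above = x[(y - baseline) >= threshold * height]
    width = np.ptp(above)  # peak-to-peak x of *all* points over the line
    return width / (2 * np.sqrt(-2 * np.log(threshold)))
```

Because `above` keeps every point over the line, a single outlier far from the peak stretches `width` and inflates the sigma estimate, which is exactly the failure mode described above.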

In practice that's probably not too much of an issue since if the dataset is large enough to have outliers that far from the peak, it's probably large enough for the FFT to work. However, it's probably worth raising the threshold up the peak a bit since 1/e is pretty close to the baseline - let's modify this to look for the FWHM instead (i.e. a threshold of half the peak height).
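For reference, converting a half-maximum width back to sigma uses the standard Gaussian relation (helper name is just for illustration):

```python
import numpy as np

def sigma_from_fwhm(fwhm):
    # A Gaussian falls to half its peak at x0 +/- sigma*sqrt(2*ln 2),
    # so FWHM = 2*sqrt(2*ln 2)*sigma ~= 2.355*sigma.
    return fwhm / (2 * np.sqrt(2 * np.log(2)))
```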

Another approach would be to look for a contiguous run of points above the threshold and take the peak-peak of that, rather than looking at any points above the threshold. However, that can end up underestimating sigma if we have an outlier in the other direction - a point within the peak that dips below the threshold and splits the run. Without making some assumptions about the kinds of datasets we're fitting, it's not clear to me which approach will be more robust in practice. So, I propose to keep it "as is" (which is the simplest thing to implement) and see how it survives contact with real data. If we have failures we can come back and resolve as appropriate.
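For comparison, the contiguous-run variant discussed above could look something like this (hypothetical, not implemented in ionics_fits):

```python
import numpy as np

def contiguous_peak_width(x, y, threshold=0.5):
    """Sketch of the alternative approach: keep only the contiguous run of
    above-threshold points containing the peak, so a lone outlier far from
    the peak cannot inflate the width. Conversely, a low outlier *inside*
    the peak truncates the run and shrinks the width estimate."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    baseline = np.min(y)
    above = (y - baseline) >= threshold * (np.max(y) - baseline)
    peak = np.argmax(y)
    lo = peak
    while lo > 0 and above[lo - 1]:
        lo -= 1
    hi = peak
    while hi < len(y) - 1 and above[hi + 1]:
        hi += 1
    return x[hi] - x[lo]
```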
