Open hartytp opened 5 months ago
This is solved in https://github.com/OxIonics/ionics_fits/pull/153
The underlying issue is that the dataset had too few points for the FFT heuristic to work well. While there is an improvement we should make to the FFT heuristic (see below), small datasets like this are always going to be a challenge for it, so I've added a second heuristic based on peak finding.
These two heuristics are pretty complementary: the FFT works extremely well for noisy datasets with outliers since we filter out high-frequency noise; the peak-based heuristic works really well for smaller datasets, but can't tolerate significant outliers. If you've got a dataset that's both small and has outliers then it's probably fair for ionics_fits
to leave it up to you to provide some user estimates!
In the implementation of the peak heuristic, we find sigma
by looking at all points which are 1/e of the peak above the baseline. We then take the peak-peak x of this dataset and calculate sigma
from that. This will be thrown off, for example, if there is an outlier far from the peak which is above the 1/e threshold (or, just noise pushes a point over the line).
In practice that's probably not too much of an issue since if the dataset is large enough to have outliers that far from the peak, it's probably large enough for the FFT to work. However, it's probably worth raising the threshold up the peak a bit since 1/e is pretty close to the baseline - let's modify this to looking for the FWHMH.
Another approach would be to look for a set of continuous points above the baseline and take the peak-peak of that, rather than looking for any points above the baseline. However, that can end up underestimating sigma if we have an outlier the other way. Without making some assumptions about the kinds of datasets we're fitting, it's not clear to me which approach will be more robust in practice. So, I propose to keep it "as is" (which is the simplest thing to implement) and see how it survives contact with real data. If we have failures we can come back and resolve as appropriate.
Comments:
find_offset_sym_peak_fft
into its own functionfind_offset_sym_peak_fft
should probably also try the symmetry heuristic as well for good measure - it's intended as a kitchen sink heuristic after all!x0
for the exponential to the "true" origin of the FFT, taking into account the "DC" sample that we've chopped off.1/2
not 1/e
to reduce the chance of an outlier from the baseline significantly skewing our estimate for sigma
Test dataset