aewallin / allantools

Allan deviation and related time & frequency statistics library in Python
GNU Lesser General Public License v3.0
217 stars 78 forks source link

Confidence intervals for deviations #47

Open aewallin opened 8 years ago

aewallin commented 8 years ago

We should add confidence intervals (errorbars) for the computed deviations. For most common deviations the equivalent degrees of freedom are already computed in uncertainty_estimate()

However it looks like this will require a change in the API, both for the calling signalture which is now: mdev(data, rate=1.0, data_type="phase", taus=None) to something like: mdev(data, rate=1.0, data_type="phase", taus=None, ci=None) where the ci parameter would be used to select between simple 1/sqrt(N) errorbars and the more complicated calculation based on edf. In theory one could allow asymmetric confidence intervals, but perhaps this is not required? For example ci=0.683 would give standard error (1-sigma) confidence intervals.

Also on the output-side we need an API change since the return values are now: (taus, devs, deverrs, ns) A straightforward change would be to add the lower and upper confidence intervals: (taus, devs, devs_lo, devs_hi, ns) But I wonder if this starts to be too many variables to remember when calling the function. An alternative would be a DevResult object where the outputs would have names: myresult = allantools.adev(...) myresult.taus myresult.devs myresult.devs_lo ..and so on...

any other ideas or comments on this?

aewallin commented 8 years ago

From the above I have forgot the noise-estimation completely.

On the input side we need a noise-estimate from the user (e.g. alpha-parameter, noise-PSD exponent), or an indication of what noise-estimating function to use e.g. "B1" (see NIST SP1065).

on the output side for each (tau, dev) pair the estimated noise-type shown as an integer alpha could also be output.

With so many output parameters I am now leaning towards a ResultObject so that one doesn't have to remember positional output-arguments...

mivade commented 8 years ago

I definitely like the idea of using a ResultObject (or maybe just a dict) for the output. In addition to not having to remember the order of positional output, it also would mean that any further changes to the output values would not break existing code.

aewallin commented 8 years ago

It looks like the latest Stable32 doesn't use the simple formulas of NIST SP1065 but instead an algorithm by Greenhall available over here: http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20050061319.pdf I will try to implement it over the next week - so we at least have some tests and algorithms that reproduce confidence intervals from Stable32 first. After that we can change the API.

fmeynadier commented 8 years ago

Hello,

Trying to examine if it is really necessary to break the API again :

Not sure yet what to do with the output "estimated noise" integer value. Another possibility would be to write a wrapper object, leaving current functions as they are.

mivade commented 8 years ago

I've started implementing a proof-of-concept results object here. It's not ready yet, but I have added an rtype keyword argument to calculations allowing easy switching between the current tuple return values and the unified object return values. Advantages to the approach of using an object:

Thoughts? I could make a pull request if it would make more sense to continue discussion there (I have not yet since this is nowhere near ready).

fmeynadier commented 8 years ago

Hi,

Here is my take on this issue : fmeynadier@fab5bc9

I propose a simple front-end object (that could also use the results object proposed above...) that encapsulates the existing functions and gives room for adding other inputs/outputs/methods.

You can test it simply by getting frontend.py and call it along those lines (it should be possible to get a fancier call, but you get the idea) :

import allantools
import allantools.frontend
import numpy as np

a = allantools.frontend.AllantoolsFrontend()
a.set_input(np.random.rand(1000))
a.compute(allantools.mdev)
print(repr(a.out))
mivade commented 8 years ago

@fmeynadier I'm personally not a fan of that style of giving inputs since it generally seems like it overuses objected oriented features to the point of being too verbose (it isn't in this simple case, but I would worry that it could become that way if more is added to it). Instead I prefer keyword arguments to functions which still allows for a lot of flexibility. That said, the nice thing about your proposal is that we can have both options, so whatever style the user prefers is viable. Some suggestions:

  1. It would seem cleaner to me if there were aliased methods so instead of doing a.compute(function) one could call a.mdev().
  2. It would be nice to be able to provide the input data on initialization.
  3. The name "frontend" doesn't seem terribly descriptive. Maybe something like allantools.dataset.Dataset?
fmeynadier commented 8 years ago

Thanks for the comments. I agree : I am no fan either of full OO style, and I confess not being very good at picking names...

So, please have a look at fmeynadier@a77c07a , here are my answers :

  1. I agree but for the time being (i.e. early phases of development) I'd prefer to keep the generic "compute" method : less code to write and modify if we change our minds with Dataset(). But in the end, yes, we shall have one method per allantools stat function (benefits, apart from cleaner calls : it naturally limits the number of functions that can be called, and it allows to customize calls when necessary).
  2. Done. "data" is a keyword argument there too, so it can be optional.
  3. Done. I also added the relevant lines in the package's "init.py", so now you can simply use it as
import allantools
import numpy as np

a = allantools.Dataset(data=np.random.rand(1000))
a.compute(allantools.mdev)
print(repr(a.out))
mivade commented 8 years ago

Looks good.

I think you make a good point with using the compute method. We might even be able to use some Python magic to automatically map method calls to the relevant computations which would mean not having to manually update or add additional functions.

One final naming suggestion: maybe out should be called result instead?