OttoStruve / muler

A Python package for working with pipeline-produced spectra from IGRINS, HPF, and Keck NIRSPEC
https://muler.readthedocs.io
MIT License
15 stars 9 forks source link

Add a method for estimating uncertainty from the data itself #52

Open gully opened 3 years ago

gully commented 3 years ago

We often want to estimate the SNR from the data itself. This operation is not unique and depends strongly on the properties of the data itself. A common and fair assumption is that data are near-critically sampled, with a few pixels per resolution element, such that the spectrum possesses many regions in which adjacent pixels share exactly the same input flux, but with only the noise values different. The input flux need not even be exactly the same, but merely smoothly-varying-in-wavelength so that a trendline perceives and removes the trend leaving only a residual spectrum with scatter.

From this residual spectrum we quantify the pixel-to-pixel scatter with a robust standard deviation metric, either homoscedastic or heteroscedastic with some sort of window-function-based-standard-deviation, like binned_statistic

Finally, an uncertainty derived in this way can be compared to the pipeline-reported SNR to gauge the reliability of the pipeline, or flag bad data. We could add a method or show a tutorial for overriding the pipeline uncertainty with this user-derived uncertainty in cases where the pipeline uncertainty is inaccurate.

We already have some of the tools in place to generate this residual spectrum: .smooth_spectrum() exists and is exactly what we aim for as the running-average model estimate.

gully commented 3 years ago

Ok, we already have an .estimate_uncertainty() method. Yay!

We should update this method in a few ways. First: We should make use of the .smooth_spectrum() keyword argument bandwidth=. We would essentially set the bandwidth equal to some small number of pixels, say 2-10 pixels. Then the Gaussian Process will essentially act as a running-average around those few pixels. Currently the default is for a large bandwidth that may not be suitable for all spectra.

Second, we may want another method that does an in-place operation, meaning that the returned object is an EchelleSpectrum (or equivalent, e.g. HPFSpectrum) that has uncertainty values updated as described above.