astropy / specreduce

Tools for the reduction of spectroscopic observations from Optical and NIR instruments
https://specreduce.readthedocs.io

Improvements to dealing with masked/nonfinite data in Specreduce operations #192

Open cshanahan1 opened 1 year ago

cshanahan1 commented 1 year ago

This issue is meant to be a catchall issue to summarize meetings and internal tickets at Space Telescope on the topic of improving masks / treatment of NaNs in data across the package. Link any related issues here, and the floor is open for discussion on this!

Current issue(s) with masking

Options for treating NaNs

In all operations where masking is relevant (trace / extract at least), provide options for treatment of NaNs:

  1. When possible, filter out nonfinite values before computation (e.g. in FitTrace).
  2. Omit columns with nonfinite values when the mask is collapsed from 2D to 1D (current behavior for Extract).
  3. Fill non-finite values with 0.
  4. Interpolate between good values.
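The four options above can be sketched in plain NumPy on a toy 2D spectrum. This is an illustration only, not specreduce code: the helper names are hypothetical, and interpolating along the dispersion axis (rather than across the spatial profile) is an assumption.

```python
import numpy as np

# Toy 2D spectrum: rows are the cross-dispersion (spatial) axis,
# columns are the dispersion (wavelength) axis.
image = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [2.0, np.nan, 2.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
])

# 1. Filter (hypothetical helper): drop non-finite values before a computation.
def filter_nonfinite(values):
    return values[np.isfinite(values)]

# 2. Omit (hypothetical helper): collapse the 2D mask to 1D, flagging any
#    column that contains at least one non-finite value.
def omitted_columns(image):
    return ~np.all(np.isfinite(image), axis=0)

# 3. Zero-fill (hypothetical helper): replace non-finite values with 0.
def zero_fill(image):
    return np.where(np.isfinite(image), image, 0.0)

# 4. Interpolate (hypothetical helper): fill non-finite values by linear
#    interpolation between finite neighbours in the same row.
def interpolate_rows(image):
    out = image.copy()
    cols = np.arange(image.shape[1])
    for row in out:  # each `row` is a view, so edits write back into `out`
        good = np.isfinite(row)
        if good.any() and not good.all():
            row[~good] = np.interp(cols[~good], cols[good], row[good])
    return out
```

Only option 2 changes which columns survive; options 3 and 4 keep every column but alter the data values, which matters for uncertainty propagation.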

Proposal:

cshanahan1 commented 1 year ago

@tepickering @eteq @kecnry @rosteen

tepickering commented 1 year ago

my take is that tracing and extraction are different use cases that should handle NaNs differently, or at least use different defaults. for extraction, being conservative and throwing out any column with any NaNs is an appropriate default. otherwise, columns with fewer valid pixels sum to less flux and can create artificial absorption features. doing anything other than that at best complicates determining uncertainties.
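a minimal NumPy illustration of the artificial-absorption point, using a plain boxcar sum rather than specreduce's own extraction classes. the flat-continuum source and the single bad pixel are made up for the example.

```python
import numpy as np

# Hypothetical flat-continuum source: every column has the same spatial
# profile, so the true extracted spectrum is flat.
profile = np.array([1.0, 4.0, 1.0])
image = np.tile(profile[:, None], (1, 5))
image[1, 2] = np.nan  # one bad pixel at the profile peak in column 2

# Boxcar sum that silently skips NaNs: column 2 loses its brightest
# pixel and shows a fake absorption dip.
lenient = np.nansum(image, axis=0)

# Conservative extraction: any column containing a NaN is omitted
# (reported as NaN) instead of being summed over fewer valid pixels.
bad_cols = ~np.all(np.isfinite(image), axis=0)
conservative = np.where(bad_cols, np.nan, image.sum(axis=0))
```

here the lenient spectrum is [6, 6, 2, 6, 6], a deep spurious "line", while the conservative one simply has a gap at column 2.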

tracing, however, is often defined independently of any given data; FlatTrace is one example. future examples will include edge detection in flat-field images for multi-slit or multi-order data.

FitTrace is kind of a special case that uses the data itself. since it's a process that already involves interpolation/extrapolation, it can be a lot more lenient in how it handles masked data. either interpolating NaNs or using np.nansum to bin along the dispersion axis seems appropriate.
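a rough sketch of the np.nansum binning approach: collapse each dispersion bin with np.nansum, take a flux-weighted centroid, then fit a polynomial through the bin centroids. this is only loosely modeled on the idea behind FitTrace; the function name, the centroid method, and the toy data are assumptions, not FitTrace's actual implementation.

```python
import numpy as np

def binned_centroids(image, n_bins):
    """Flux-weighted cross-dispersion centroid in each dispersion bin (hypothetical helper).

    Rows are the cross-dispersion axis and columns the dispersion axis.
    NaNs are skipped by np.nansum when each bin is collapsed.
    """
    ny, nx = image.shape
    rows = np.arange(ny)
    edges = np.linspace(0, nx, n_bins + 1).astype(int)
    centers, centroids = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        profile = np.nansum(image[:, lo:hi], axis=1)  # collapse along dispersion
        total = profile.sum()
        if total <= 0:
            continue  # no usable signal in this bin; leave it out of the fit
        centers.append(0.5 * (lo + hi - 1))
        centroids.append(np.sum(rows * profile) / total)
    return np.array(centers), np.array(centroids)

# Toy data: a flat trace at row 5 with two NaN pixels sprinkled in.
rows = np.arange(11)
image = np.tile(np.exp(-0.5 * (rows - 5.0) ** 2)[:, None], (1, 20))
image[5, 3] = np.nan
image[4, 12] = np.nan

centers, centroids = binned_centroids(image, n_bins=4)
# A low-order polynomial through the bin centroids gives the trace.
trace_coeffs = np.polyfit(centers, centroids, deg=1)
```

the NaN at row 4 pulls one bin's centroid off by about 0.05 pixel; the polynomial fit averages that down, which is why leniency is tolerable here but not in extraction.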

i should also note that saturated values probably shouldn't always be masked as NaNs. they are numbers, and their recorded values are lower limits on the true values, so there is information there that can be used up to a point. in the case of FitTrace, including saturated values when centroiding can often lead to better results than leaving them masked.
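a toy comparison of the two choices for a single cross-dispersion profile. the Gaussian profile, its off-pixel center of 5.3, and the saturation level of 800 are all made up for illustration; only the peak pixel saturates.

```python
import numpy as np

def centroid(rows, values):
    """Flux-weighted centroid, ignoring non-finite values."""
    good = np.isfinite(values)
    return np.sum(rows[good] * values[good]) / np.sum(values[good])

rows = np.arange(11, dtype=float)
true_center = 5.3
# Toy cross-dispersion profile: Gaussian, sigma = 1 pixel, peak 1000 counts.
profile = 1000.0 * np.exp(-0.5 * (rows - true_center) ** 2)

saturation = 800.0  # hypothetical saturation level
saturated = profile >= saturation

# Option A: keep saturated pixels at the clipped (lower-limit) value.
clipped = np.minimum(profile, saturation)
# Option B: mask saturated pixels as NaN.
masked = np.where(saturated, np.nan, profile)

c_clipped = centroid(rows, clipped)
c_masked = centroid(rows, masked)
```

in this setup the clipped centroid comes out near 5.32, while masking the peak pixel drags it to roughly 5.49, toward the brighter unsaturated neighbour. with a symmetric saturated core both choices would agree, so the benefit depends on the data.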

cshanahan1 commented 11 months ago

I am currently working on a PR to add a new argument, mask_treatment, to all specreduce operations, with two implemented options: 'omit' and 'zero-fill'. A follow-up effort to add an 'interpolate' option will come next. Thoughts?
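One possible shape for such an argument, sketched with NumPy only. The helper name apply_mask_treatment, the rule that non-finite values are always added to the mask, and the choice to return the mask alongside the data are assumptions for illustration, not the PR's actual code.

```python
import numpy as np

# Option names taken from the comment above; everything else is hypothetical.
MASK_TREATMENTS = ("omit", "zero-fill")

def apply_mask_treatment(image, mask=None, mask_treatment="omit"):
    """Return (data, mask) after applying the requested treatment (hypothetical helper).

    Non-finite values are always added to the mask. Rows are the
    cross-dispersion axis and columns the dispersion axis.
    """
    if mask_treatment not in MASK_TREATMENTS:
        raise ValueError(
            f"mask_treatment must be one of {MASK_TREATMENTS}, got {mask_treatment!r}"
        )
    data = np.asarray(image, dtype=float)
    bad = ~np.isfinite(data)
    if mask is not None:
        bad |= np.asarray(mask, dtype=bool)

    if mask_treatment == "zero-fill":
        # Replace masked pixels with 0; the returned mask still records
        # which pixels were filled.
        return np.where(bad, 0.0, data), bad
    # "omit": collapse the 2D mask to 1D and mask every affected column.
    bad_columns = bad.any(axis=0)
    return data, np.broadcast_to(bad_columns, data.shape).copy()
```

Validating the string up front means a future 'interpolate' option only needs a new entry in MASK_TREATMENTS and one extra branch.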