Improvements to dealing with masked/nonfinite data in Specreduce operations

cshanahan1 commented 1 year ago

This issue is meant to be a catchall issue to summarize meetings and internal tickets at Space Telescope on the topic of improving masks / treatment of NaNs in data across the package. Link any related issues here, and the floor is open for discussion on this!

Current issue(s) with masking

In both Horne and Boxcar extract, 2D masks are allowed as input on NDData. However, these masks are then collapsed to 1D, and an entire column is excluded if it contains even one NaN. This has been creating issues for JWST data, which are littered with NaNs, so there should be an option to handle this and avoid holes in extractions/traces. (https://github.com/astropy/specreduce/issues/167)
In FitTrace, fits to individual columns within a bin filter masked values, and fall back to the 'all-bin fit' when a bin is fully masked (which is a different default behavior from Extract). This can produce strange results, and would benefit from additional masking options (interpolation, or setting to 0).
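
To make the first point concrete, here is a minimal NumPy sketch of the collapse. It is not specreduce's actual code. The toy array and the choice of axis 1 as the dispersion axis are assumptions for illustration only.

```python
import numpy as np

# Toy 2D spectrum: 5 cross-dispersion rows x 6 dispersion columns (assumed layout).
image = np.ones((5, 6))
image[2, 3] = np.nan  # a single bad pixel

mask_2d = ~np.isfinite(image)   # True where a pixel is nonfinite
mask_1d = mask_2d.any(axis=0)   # collapse along the cross-dispersion axis

# Column 3 is dropped as a whole even though 4 of its 5 pixels are good.
good_columns = np.flatnonzero(~mask_1d)
print(good_columns)  # [0 1 2 4 5]
```

With real JWST frames, where NaNs are scattered across many columns, the same collapse is what leaves holes in the extracted spectrum.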

Options for treating NaNs

In all operations where masking is relevant (trace / extract at least), provide options for treatment of NaNs:

When possible, filter nonfinite values before computation (e.g. in FitTrace).
Omit columns with nonfinite values when the mask is collapsed from 2D to 1D (current behavior for Extract)
Have a fill value of 0 for non-finite values
Interpolate between good values.

Proposal:

New arg mask_treatment on all operations (at least extractions, fit trace)
options = ['filter', 'omit', 'interpolate', 'zero-fill'], set to either 'filter' or 'omit' to maintain current behavior as the default for each operation
extract = specreduce.extract.HorneExtract(image-bg, trace, variance=var_array, mask_treatment='zero-fill')
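
As a rough illustration of what the four options could mean on a plain array, here is a sketch of a standalone helper. The function name apply_mask_treatment, its return convention, and the choice to interpolate along each column are all hypothetical and not part of specreduce. The sketch assumes axis 0 is cross-dispersion and axis 1 is dispersion.

```python
import numpy as np


def apply_mask_treatment(image, mask_treatment="omit"):
    """Hypothetical helper (not specreduce API).

    Returns (data, mask), where mask is True for pixels a caller should ignore.
    Assumes axis 0 is cross-dispersion and axis 1 is dispersion.
    """
    data = np.array(image, dtype=float)  # work on a copy
    bad = ~np.isfinite(data)

    if mask_treatment == "filter":
        # Leave the data alone; callers skip bad pixels one by one.
        return data, bad

    if mask_treatment == "omit":
        # A single bad pixel masks its entire column.
        col_bad = bad.any(axis=0)
        return data, np.broadcast_to(col_bad, data.shape).copy()

    if mask_treatment == "zero-fill":
        data[bad] = 0.0
        return data, np.zeros_like(bad)

    if mask_treatment == "interpolate":
        # Linear interpolation along each column (an assumed direction).
        rows = np.arange(data.shape[0])
        for j in range(data.shape[1]):
            col_is_bad = bad[:, j]
            good = ~col_is_bad
            if good.any() and col_is_bad.any():
                data[col_is_bad, j] = np.interp(
                    rows[col_is_bad], rows[good], data[good, j]
                )
        # Columns with no good pixels cannot be filled and stay masked.
        return data, ~np.isfinite(data)

    raise ValueError(f"unknown mask_treatment: {mask_treatment!r}")


example = np.array([[1.0, 2.0, 3.0],
                    [np.nan, 2.0, 3.0],
                    [3.0, 2.0, 3.0]])
filled, mask = apply_mask_treatment(example, "interpolate")
```

Returning the data together with a mask keeps the four cases interchangeable for the caller, which is one way a single mask_treatment argument could be shared across operations.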

cshanahan1 commented 1 year ago

@tepickering @eteq @kecnry @rosteen

tepickering commented 1 year ago

my take is that tracing and extraction are different use cases that should handle NaN's differently, or at least use different defaults. for extraction, being conservative and throwing out any column with any NaN's is an appropriate default. otherwise columns with less valid data can create artificial absorption features. doing anything other than that at best complicates determining uncertainties.
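
A toy boxcar sum shows the effect described above. This is only an illustrative NumPy sketch with made-up numbers, assuming a perfectly flat source and axis 1 as the dispersion axis.

```python
import numpy as np

# Flat source: every column should sum to the same flux.
image = np.full((5, 6), 10.0)
image[2, 3] = np.nan  # one bad pixel in column 3

# Zero-fill the bad pixel, then sum along the cross-dispersion axis.
zero_filled = np.where(np.isfinite(image), image, 0.0)
flux = zero_filled.sum(axis=0)
print(flux)  # [50. 50. 50. 40. 50. 50.]
```

The dip in column 3 comes entirely from the missing pixel, not from the source, and looks like an absorption feature in the 1D spectrum.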

tracing, however, is often defined independent of any given data. FlatTrace being one example. future examples will include edge detection in flat-field images for multi-slit or multi-order data.

FitTrace is kind of a special case that uses the data itself. since it's a process that already involves interpolation/extrapolation, it can be a lot more lenient in how it handles masked data. either interpolating NaNs or using np.nansum to bin along the dispersion axis seem appropriate.
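
For reference, the np.nansum approach could look roughly like the sketch below. It is not FitTrace's implementation. The synthetic Gaussian profile, the bin count, and the flux-weighted centroid are assumptions chosen to keep the example small.

```python
import numpy as np

# Toy image: Gaussian profile centred on row 10, 20 rows x 40 dispersion columns.
rows = np.arange(20)
profile = np.exp(-0.5 * ((rows - 10) / 2.0) ** 2)
image = np.tile(profile[:, None], (1, 40))
image[10, 5] = np.nan   # bad pixel on the peak
image[3, 17] = np.nan   # bad pixel in the wing

# Split the dispersion axis into bins and sum each bin, ignoring NaNs.
n_bins = 4
binned = np.stack(
    [np.nansum(chunk, axis=1) for chunk in np.array_split(image, n_bins, axis=1)],
    axis=1,
)

# Flux-weighted centroid of the cross-dispersion profile in each bin.
centroids = (rows[:, None] * binned).sum(axis=0) / binned.sum(axis=0)
```

Because np.nansum simply skips the bad pixels, rows that lose a pixel are slightly under-weighted within their bin. In this toy case the centroids stay close to row 10, but a heavily masked bin could be biased more.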

i should also note that saturated values probably shouldn't always be masked as NaN's. they are numbers and you know the lower limit to their actual values. so there is information there that can be used up to a point. in the case of FitTrace, including saturated values when centroiding can often lead to better results than leaving them masked.

cshanahan1 commented 11 months ago

I am currently working on a PR to add a new argument to all specreduce operations called mask_treatment with two implemented options - 'omit' and 'zero-fill'. A follow up effort to add an 'interpolate' option will be next. Thoughts?

astropy / specreduce

Improvements to dealing with masked/nonfinite data in Specreduce operations #192