Open cshanahan1 opened 1 year ago
@tepickering @eteq @kecnry @rosteen
my take is that tracing and extraction are different use cases that should handle NaNs differently, or at least use different defaults. for extraction, being conservative and throwing out any column that contains a NaN is an appropriate default. otherwise columns with fewer valid pixels can create artificial absorption features. doing anything other than that complicates determining uncertainties, at best.
tracing, however, is often defined independently of any given data, FlatTrace being one example. future examples will include edge detection in flat-field images for multi-slit or multi-order data. FitTrace is kind of a special case that uses the data itself. since it's a process that already involves interpolation/extrapolation, it can be a lot more lenient in how it handles masked data. either interpolating NaNs or using np.nansum to bin along the dispersion axis seems appropriate.
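To make the two lenient options concrete, here is a rough NumPy sketch. The function names bin_with_nansum and interpolate_nans are made up for illustration and are not part of specreduce; the sketch also assumes the dispersion axis is axis 1 and that NaNs mark the masked pixels.

```python
import numpy as np


def bin_with_nansum(image, n_bins):
    """Sum groups of columns along the dispersion axis (axis 1), ignoring NaNs."""
    # Split the column indices into n_bins nearly equal groups.
    col_groups = np.array_split(np.arange(image.shape[1]), n_bins)
    # np.nansum treats NaN as zero, so partly masked bins still contribute.
    return np.column_stack(
        [np.nansum(image[:, cols], axis=1) for cols in col_groups]
    )


def interpolate_nans(image):
    """Linearly interpolate NaNs along each column (cross-dispersion axis)."""
    filled = np.array(image, dtype=float)  # work on a copy
    rows = np.arange(filled.shape[0])
    for j in range(filled.shape[1]):
        col = filled[:, j]  # view into `filled`
        bad = np.isnan(col)
        # Columns that are entirely NaN are left untouched.
        if bad.any() and not bad.all():
            col[bad] = np.interp(rows[bad], rows[~bad], col[~bad])
    return filled


# Toy image: a Gaussian profile centred on row 4, repeated over 6 columns.
y = np.arange(9)[:, None]
image = np.exp(-0.5 * (y - 4.0) ** 2) * np.ones((1, 6))
image[4, 2] = np.nan  # mask one peak pixel

binned = bin_with_nansum(image, n_bins=3)  # shape (9, 3)
filled = interpolate_nans(image)           # same shape as image, no NaNs
```

One caveat with the np.nansum route: a masked pixel counts as zero flux, so a bin that loses a bright pixel is biased low at that row. Interpolation avoids that, at the cost of inventing values.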
i should also note that saturated values probably shouldn't always be masked as NaNs. they are numbers, and you know the lower limit of their actual values, so there is information there that can be used up to a point. in the case of FitTrace, including saturated values when centroiding can often lead to better results than leaving them masked.
I am currently working on a PR to add a new argument, mask_treatment, to all specreduce operations, with two implemented options: 'omit' and 'zero-fill'. A follow-up effort to add an 'interpolate' option will be next. Thoughts?
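For discussion, a minimal sketch of what the two options could do to an image and its mask. This is not the code in the PR: apply_mask_treatment is a hypothetical helper, and it assumes that 'omit' means masking every column that contains at least one masked pixel, and that 'zero-fill' means replacing masked pixels with 0 and clearing the mask.

```python
import numpy as np


def apply_mask_treatment(image, mask, mask_treatment="omit"):
    """Hypothetical helper (not the specreduce API).

    Returns an (image, mask) pair after applying the chosen treatment.
    """
    image = np.asarray(image, dtype=float)
    # Treat NaN pixels as masked in addition to the explicit mask.
    mask = np.asarray(mask, dtype=bool) | np.isnan(image)

    if mask_treatment == "zero-fill":
        # Masked pixels become 0 and nothing remains masked.
        return np.where(mask, 0.0, image), np.zeros_like(mask)

    if mask_treatment == "omit":
        # Assumed meaning: one masked pixel masks its whole column.
        col_bad = mask.any(axis=0)
        return image, np.broadcast_to(col_bad, mask.shape).copy()

    raise ValueError(f"unknown mask_treatment: {mask_treatment!r}")


# Usage on a 3x4 image with a single NaN at row 1, column 2.
img = np.ones((3, 4))
img[1, 2] = np.nan
no_mask = np.zeros(img.shape, dtype=bool)

zf_img, zf_mask = apply_mask_treatment(img, no_mask, "zero-fill")
om_img, om_mask = apply_mask_treatment(img, no_mask, "omit")
```

An 'interpolate' option would slot in as a third branch that fills masked pixels from their neighbours instead of with zeros.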
This issue is meant to be a catchall issue to summarize meetings and internal tickets at Space Telescope on the topic of improving masks / treatment of NaNs in data across the package. Link any related issues here, and the floor is open for discussion on this!
Current issue(s) with masking
In both Horne and Boxcar extraction, 2D masks are allowed as input on NDData. However, these masks are then collapsed to 1D, and an entire column is excluded if it contains even one NaN. This has been creating issues for JWST data, which are littered with NaNs, so there should be an option to handle this and avoid holes in extractions/traces. (https://github.com/astropy/specreduce/issues/167)
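A back-of-the-envelope illustration of why collapsing the mask hurts so much on NaN-heavy data. The numbers are made up for the example (a 50-row window and 1% bad pixels scattered independently); they are not measured from JWST data.

```python
import numpy as np

n_rows, n_cols = 50, 2000  # assumed window height and number of columns
p_bad = 0.01               # assumed fraction of NaN pixels, placed independently

# Probability that a column has at least one bad pixel among n_rows pixels.
expected_lost = 1 - (1 - p_bad) ** n_rows  # about 0.395

# Same thing by simulation: build a random 2D mask and collapse it to 1D.
rng = np.random.default_rng(0)
mask_2d = rng.random((n_rows, n_cols)) < p_bad
mask_1d = mask_2d.any(axis=0)  # a column is dropped if any pixel is masked
simulated_lost = mask_1d.mean()

print(f"expected fraction of columns lost:  {expected_lost:.3f}")
print(f"simulated fraction of columns lost: {simulated_lost:.3f}")
```

Under these assumptions, 1% bad pixels turns into roughly 40% of columns being thrown out.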
In FitTrace, fits to individual columns within a bin filter out masked values and fall back to the 'all-bin fit' when fully masked (a different default behavior from Extract). This can produce strange results, and would benefit from additional masking options (interpolation, or setting to 0).
Options for treating NaNs
In all operations where masking is relevant (trace / extract at least), provide options for treatment of NaNs:
Proposal: mask_treatment on all operations (at least extractions, fit trace)
extract = specreduce.extract.HorneExtract(image-bg, trace, variance=var_array, nan_treatment='zero-fill')