Keck-DataReductionPipelines / MosfireDRP

http://keck-datareductionpipelines.github.io/MosfireDRP
10 stars 13 forks source link

Optimal extraction algorithm not working and proposed fix #126

Open calvarez17 opened 5 years ago

calvarez17 commented 5 years ago

The issue

The optimal extraction algorithm relies on calculating a polynomial fit to the spatial profiles using a sigma-clipping algorithm. The spatial profile fit (one per row within a given aperture) is used as a weighting function for the optimal extraction algorithm. Pixels in the spatial profiles that are rejected by the sigma-clipping algorithm should be given a weight of zero. However, the current algorithm gives non-zero values to the rejected pixels. As a consequence, the optimally-extracted spectra are extremely noisy.

The following plot shows the K-Band 1D spectrum of the standard star HIP98640 extracted using the optimal extraction algorithm included in mospy:

hip98640_k_long_1d_00_bad

As a comparison, the following plot shows the 1D spectrum of the same star extracted by collapsing all the lines on the same aperture as the one used for the optimally-extracted spectrum:

hip98640_k_long_1d_00_stdx

The following plot shows the aperture used for both extractions:

hip98640_k_long_1d_00_aperture

The base 2D rectified spectrum was the same for both extractions (HIP98640_K_long_eps.fits).

The following plots show some examples of spatial profile fits within the extraction aperture (rows 2, 9, 10, 11 and 18). Green dots represent the spatial profile and the red line represents the corresponding polynomial fit. The red "delta" spikes correspond to values given by the fitting algorithm to pixels rejected by the sigma-clipping algorithm. The rejected pixels should have a value of zero on the fit. Instead, the plots below show that the rejected pixels have non-zero values on the fit:

hip98640_k_long_2d_00_p_init_02

hip98640_k_long_2d_00_p_init_09

hip98640_k_long_2d_00_p_init_10

hip98640_k_long_2d_00_p_init_11

hip98640_k_long_2d_00_p_init_18

The issue seems to be aggravated close to the trace peak (lines 9, 10 and 11).

Possible solution

The following two modifications to the Extract.py code produce optimally-extracted spectra of higher quality than spectra extracted by collapsing all the rows in an aperture:

Original code:

resid = (DmS[i]-f*srow)**2 / V[i] newmask = (resid > sigma)

Corrected code based on Equation 7, Table 1 in Horne (1986)

resid = (DmS[i]-f*srow)**2 / V[i] newmask = (resid > sigma**2)

The result of fitting.LinearLSQFitter():

fitted_poly = fit_poly(poly0, xcoord[~xcoord.mask], srow[~srow.mask], weights=weights[~weights.mask]) fit = np.array([fitted_poly(x) for x in xcoord])

is not a masked array, but a regular numpy array where the masked values are filled with either the maximum or the minimum of the fit. I have the impression that this is a bug in fitter.LinearLSQFitter(), because one would expect that if the input is a masked array, then the output should also be a masked array.

I have replaced the output from the fit by a masked array with the same mask as the input array and then fill the masked values with zeros:

In the original code:

fit[(fit < 0)] = 0.0 Pnew[i] = fit

In the new code:

fit[(fit < 0)] = 0.0 fit_masked = np.ma.MaskedArray(data=fit, mask=weights.mask) fit_masked_zeroes = np.ma.filled(fit_masked, 0.0) Pnew[i] = fit_masked_zeroes

The optimally-extracted spectrum using the version of Extract.py containing the modifications described above is shown on the following plot:

hip98640_k_long_1d_00_optx

This spectrum looks considerably better than the optimally-extracted spectrum using the original version of Extract.py and also moderately better than the spectrum extracted by collapsing all the rows within the aperture.

stinyanont commented 4 years ago

Why is this bug still not fixed in the official pipeline? It's literally 3 lines and it cleaned up the reduction nicely.