Closed annayqho closed 9 years ago
I don't agree with this. Let's just do (1): Estimate the "continuum" by doing the gaussian smooth, inverse-variance weighted. Nothing else. Let's start with that.
With my current implementation, this takes about 30 seconds per spectrum. This seems infeasible if I'm going to ultimately be dealing with ten thousand spectra--just for the training step.
DWH may hate this but I propose you could try just `gaussian_filter1d(flux, n_pixels)`. I believe this will be ~ as good.
That will be much faster, but it won't ignore low-ivar pixels, and it will be a gaussian in pixel space not wavelength space. But yes, it's worth a try. So is my matrix trick, which I put in some other issue.
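For reference, the simple-filter idea can be sketched with SciPy's `gaussian_filter1d` (toy data; the `sigma` value and array size here are made up for illustration, and as noted above the kernel width is in pixels, not wavelength, and the inverse variances are ignored):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
flux = 1.0 + 0.05 * rng.standard_normal(3000)  # toy flat spectrum + noise

# sigma is the Gaussian kernel width in *pixels*; no ivar weighting happens here
smoothed = gaussian_filter1d(flux, sigma=50)
norm_flux = flux / smoothed                    # pseudo-continuum-normalized flux
```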
Wait -- what is taking 30 seconds -- the median smooth? Because we don't ever need to do that one.
No, it's the Gaussian smoothing that takes 30 seconds per spectrum. I'll try both the matrix trick and the simple gaussian filter.
Once you have the matrix trick implemented, I will want to look at (a) code and (b) a profile.
In TheCannon/continuum_normalization.py:
- the Gaussian weight matrix is constructed on line 32, in the function `gaussian_weight_matrix`
- the weighted mean spectrum is computed on lines 110-111, in the function `_find_cont_gaussian_smooth`
- the actual normalization is performed in the function `_cont_norm_gaussian_smooth`
Code below, too:

```python
def gaussian_weight_matrix(wl, L):
    return np.exp(-0.5 * (wl[:, None] - wl[None, :]) ** 2 / L ** 2)

w = gaussian_weight_matrix(dataset.wl, L=50)
val = (dataset.tr_ivar * dataset.tr_flux).T
cont = (np.dot(w, val) / np.dot(w, dataset.tr_ivar.T)).T
```
where:
- L = 50 (>> line width, << spectrum width)
- `dataset.wl` is the wavelength grid of the spectra
- `dataset.tr_ivar` and `dataset.tr_flux` are the inverse variance and flux blocks, respectively
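A self-contained toy version of the matrix trick (array shapes, the wavelength grid, and all values here are illustrative, not the real LAMOST dimensions):

```python
import numpy as np

def gaussian_weight_matrix(wl, L):
    # npix x npix matrix of Gaussian weights between every pair of pixels
    return np.exp(-0.5 * (wl[:, None] - wl[None, :]) ** 2 / L ** 2)

rng = np.random.default_rng(0)
nstars, npix = 3, 500                      # toy sizes
wl = np.linspace(4000, 9000, npix)         # toy wavelength grid (Angstroms)
flux = 1.0 + 0.05 * rng.standard_normal((nstars, npix))
ivar = np.full((nstars, npix), 400.0)      # uniform inverse variance

w = gaussian_weight_matrix(wl, L=50)
# ivar-weighted Gaussian mean at every pixel, for all stars at once
cont = (w @ (ivar * flux).T / (w @ ivar.T)).T
norm_flux = flux / cont
```

One matrix multiply replaces the per-pixel loop, which is why this is so much faster than 30 seconds per spectrum; the trade-off is the npix x npix weight matrix held in memory.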
okay, except for some crazy .T stuff, I think that's okay. Let's close this for now.
Using a running median to identify continuum pixels for the pseudo continuum normalization step doesn't make sense for raw LAMOST spectra, because of their overall shape. So, the new approach should be:
1) Use a running Gaussian kernel to smooth the spectra and get rid of the large-scale shape.
2) Identify pseudo continuum pixels using a running median, as we did with APOGEE. Normalize.
3) The above step is SNR-dependent. For SNR-independent continuum normalization, identify continuum pixels using cuts on the median and variance of the normalized flux across the training set (instead of running through The Cannon). Normalize again.
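Step 3 might be sketched like this (the function name and thresholds are hypothetical, not taken from the codebase, and would need tuning):

```python
import numpy as np

def pick_continuum_pixels(norm_flux, f_cut=0.005, sig_cut=0.005):
    # Hypothetical cuts: a pixel counts as "continuum" if the median of its
    # normalized flux across the training stars is close to 1 and the
    # pixel-to-pixel scatter across stars is small.
    med = np.median(norm_flux, axis=0)
    scatter = np.std(norm_flux, axis=0)
    return (np.abs(med - 1.0) <= f_cut) & (scatter <= sig_cut)

rng = np.random.default_rng(1)
flux = 1.0 + 0.001 * rng.standard_normal((10, 100))  # 10 toy stars, 100 pixels
flux[:, 40:45] -= 0.3                                # a fake absorption line
mask = pick_continuum_pixels(flux)                   # boolean mask over pixels
```

Because the cuts act on the ensemble of spectra rather than on each spectrum's own noise realization, the selection does not change with SNR.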
Here's how to find the Gaussian-smoothed spectrum: for each pixel, there is a Gaussian distribution of weights centered on that pixel with width L. To start, we can set L to a few times the width of the magnesium line (we want it >> the width of one line and << the length of the spectrum). The "mean" flux at that pixel is then the sum of Gaussian weight * ivar * flux over all pixels, divided by the sum of weight * ivar over all pixels, where the weight is exp[-0.5*(lambda_i - lambda_0)^2 / L^2]. Once you have that mean value, divide the original flux by it.
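The per-pixel recipe above can be written as a naive loop, which is exactly what the matrix trick discussed earlier vectorizes (the data here is synthetic and the function name is for illustration):

```python
import numpy as np

def smooth_spectrum(wl, flux, ivar, L=50.0):
    # For each pixel lambda_0, form weights w_i = exp[-0.5*(lambda_i - lambda_0)^2 / L^2]
    # and take the Gaussian- and ivar-weighted mean flux.
    cont = np.empty_like(flux)
    for j, lam0 in enumerate(wl):
        w = np.exp(-0.5 * (wl - lam0) ** 2 / L ** 2)
        cont[j] = np.sum(w * ivar * flux) / np.sum(w * ivar)
    return cont

rng = np.random.default_rng(2)
wl = np.linspace(4000, 4500, 200)          # toy wavelength grid
flux = 1.0 + 0.05 * rng.standard_normal(200)
ivar = np.full(200, 400.0)

norm_flux = flux / smooth_spectrum(wl, flux, ivar)  # divide out the smooth "mean"
```

Because the weights live in wavelength space and carry the ivar, low-ivar pixels are naturally down-weighted, unlike the plain pixel-space `gaussian_filter1d` shortcut.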