Closed annayqho closed 9 years ago
I don't agree with this. Let's just do (1): Estimate the "continuum" by doing the gaussian smooth, inverse-variance weighted. Nothing else. Let's start with that.
With my current implementation, this takes about 30 seconds per spectrum. This seems infeasible if I'm going to ultimately be dealing with ten thousand spectra--just for the training step.
DWH may hate this but I propose you could try just `gaussian_filter1d(flux, n_pixels)`. I believe this will be ~ as good.
That will be much faster, but it won't ignore low-ivar pixels, and it will be a gaussian in pixel space not wavelength space. But yes, it's worth a try. So is my matrix trick, which I put in some other issue.
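For reference, the simple-filter idea can be sketched with SciPy's `gaussian_filter1d` (toy data; the `sigma` value and array size here are made up for illustration, and as noted above the kernel width is in pixels, not wavelength, and the inverse variances are ignored):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
flux = 1.0 + 0.05 * rng.standard_normal(3000)  # toy flat spectrum + noise

# sigma is the Gaussian kernel width in *pixels*; no ivar weighting happens here
smoothed = gaussian_filter1d(flux, sigma=50)
norm_flux = flux / smoothed                    # pseudo-continuum-normalized flux
```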
Wait -- what is taking 30 seconds -- the median smooth? Because we don't ever need to do that one.
No, it's the Gaussian smoothing that takes 30 seconds per spectrum. I'll try both the matrix trick and the simple gaussian filter.
Once you have the matrix trick implemented, I will want to look at (a) code and (b) a profile.
In TheCannon/continuum_normalization.py:
- the Gaussian weight matrix is constructed on line 32, in the function `gaussian_weight_matrix`
- the weighted mean spectrum is computed on lines 110-111, in the function `_find_cont_gaussian_smooth`
- the actual normalization is performed in the function `_cont_norm_gaussian_smooth`
Code below, too:

```python
def gaussian_weight_matrix(wl, L):
    return np.exp(-0.5 * (wl[:, None] - wl[None, :]) ** 2 / L ** 2)

w = gaussian_weight_matrix(dataset.wl, L=50)
val = (dataset.tr_ivar * dataset.tr_flux).T
cont = (np.dot(w, val) / np.dot(w, dataset.tr_ivar.T)).T
```
where:
- L = 50 (>> line width, << spectrum width)
- `dataset.wl` is the wavelength grid of the spectra
- `dataset.tr_ivar` and `dataset.tr_flux` are the inverse variance and flux blocks, respectively
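A self-contained toy version of the matrix trick (array shapes, the wavelength grid, and all values here are illustrative, not the real LAMOST dimensions):

```python
import numpy as np

def gaussian_weight_matrix(wl, L):
    # npix x npix matrix of Gaussian weights between every pair of pixels
    return np.exp(-0.5 * (wl[:, None] - wl[None, :]) ** 2 / L ** 2)

rng = np.random.default_rng(0)
nstars, npix = 3, 500                      # toy sizes
wl = np.linspace(4000, 9000, npix)         # toy wavelength grid (Angstroms)
flux = 1.0 + 0.05 * rng.standard_normal((nstars, npix))
ivar = np.full((nstars, npix), 400.0)      # uniform inverse variance

w = gaussian_weight_matrix(wl, L=50)
# ivar-weighted Gaussian mean at every pixel, for all stars at once
cont = (w @ (ivar * flux).T / (w @ ivar.T)).T
norm_flux = flux / cont
```

One matrix multiply replaces the per-pixel loop, which is why this is so much faster than 30 seconds per spectrum; the trade-off is the npix x npix weight matrix held in memory.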
okay, except for some crazy .T stuff, I think that's okay. Let's close this for now.
Using a running median to identify continuum pixels for the pseudo continuum normalization step doesn't make sense for raw LAMOST spectra, because of their overall shape. So, the new approach should be:
1) Use a running Gaussian kernel to smooth the spectra and get rid of the large-scale shape.
2) Identify pseudo continuum pixels using a running median, as we did with APOGEE. Normalize.
3) The above step is SNR-dependent. For SNR-independent continuum normalization, identify continuum pixels using cuts on the median and variance of the normalized flux across the training set (instead of running through The Cannon). Normalize again.
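Step 3 might be sketched like this (the function name and thresholds are hypothetical, not taken from the codebase, and would need tuning):

```python
import numpy as np

def pick_continuum_pixels(norm_flux, f_cut=0.005, sig_cut=0.005):
    # Hypothetical cuts: a pixel counts as "continuum" if the median of its
    # normalized flux across the training stars is close to 1 and the
    # pixel-to-pixel scatter across stars is small.
    med = np.median(norm_flux, axis=0)
    scatter = np.std(norm_flux, axis=0)
    return (np.abs(med - 1.0) <= f_cut) & (scatter <= sig_cut)

rng = np.random.default_rng(1)
flux = 1.0 + 0.001 * rng.standard_normal((10, 100))  # 10 toy stars, 100 pixels
flux[:, 40:45] -= 0.3                                # a fake absorption line
mask = pick_continuum_pixels(flux)                   # boolean mask over pixels
```

Because the cuts act on the ensemble of spectra rather than on each spectrum's own noise realization, the selection does not change with SNR.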
Here's how to find the Gaussian-smoothed spectrum: for each pixel, there is a Gaussian distribution of weights centered on that pixel with width L. To start, we can set L to a few times the width of the magnesium line (we want it >> the width of one line and << the length of the spectrum). The "mean" flux at that pixel is then the sum of Gaussian weight * ivar * flux over all pixels, divided by the sum of weight * ivar over all pixels, where the weight is exp[-0.5*(lambda_i - lambda_0)^2 / L^2]. Once you have that mean value, divide the original flux by it.
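The per-pixel recipe above can be written as a naive loop, which is exactly what the matrix trick discussed earlier vectorizes (the data here is synthetic and the function name is for illustration):

```python
import numpy as np

def smooth_spectrum(wl, flux, ivar, L=50.0):
    # For each pixel lambda_0, form weights w_i = exp[-0.5*(lambda_i - lambda_0)^2 / L^2]
    # and take the Gaussian- and ivar-weighted mean flux.
    cont = np.empty_like(flux)
    for j, lam0 in enumerate(wl):
        w = np.exp(-0.5 * (wl - lam0) ** 2 / L ** 2)
        cont[j] = np.sum(w * ivar * flux) / np.sum(w * ivar)
    return cont

rng = np.random.default_rng(2)
wl = np.linspace(4000, 4500, 200)          # toy wavelength grid
flux = 1.0 + 0.05 * rng.standard_normal(200)
ivar = np.full(200, 400.0)

norm_flux = flux / smooth_spectrum(wl, flux, ivar)  # divide out the smooth "mean"
```

Because the weights live in wavelength space and carry the ivar, low-ivar pixels are naturally down-weighted, unlike the plain pixel-space `gaussian_filter1d` shortcut.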