Adding new baseline fitting method

jakeown commented 6 years ago

This is the baseline fitting method we use in KEYSTONE. It uses a sliding window 31 channels wide (adjustable parameter) to calculate a “local” standard deviation centred on each channel. The channels at the centres of the local windows with the lowest standard deviations (the lowest 40% are selected by default, but this is also adjustable) are then used for polynomial baseline fits up to third order. The idea is that windows with high standard deviation contain emission lines or noise spikes, so those channels are excluded from the fit. The reduced chi-squared values of the three polynomial fits are then compared to select the “best-fit” model, which is subtracted from the original spectrum.
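For readers following along, here is a minimal sketch of the procedure described above. It is not the code in this PR: the function names (`rolling_std`, `fit_baseline`), the reflected edge padding, the noise estimate, and the rule of picking the lowest reduced chi-squared are all assumptions made for illustration. Only `window_size` and `mask_percent` are parameter names taken from this thread.

```python
import numpy as np

def rolling_std(spectrum, window_size=31):
    """Standard deviation in a window centred on each channel."""
    half = window_size // 2
    # Reflect the edges so every channel gets a full-width window (assumed edge handling).
    padded = np.pad(spectrum, (half, window_size - 1 - half), mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window_size)
    return windows.std(axis=1)

def fit_baseline(spectrum, window_size=31, mask_percent=0.4, max_order=3):
    """Fit polynomials to the quietest channels and subtract the best one."""
    x = np.linspace(-1.0, 1.0, spectrum.size)  # scaled axis keeps polyfit well conditioned
    local_std = rolling_std(spectrum, window_size)
    # Keep the channels whose local std falls in the lowest mask_percent fraction.
    keep = local_std <= np.quantile(local_std, mask_percent)
    sigma = np.median(local_std[keep])  # assumed noise estimate from the quiet channels
    best_chi2, best_coeffs = np.inf, None
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(x[keep], spectrum[keep], order)
        resid = spectrum[keep] - np.polyval(coeffs, x[keep])
        red_chi2 = np.sum((resid / sigma) ** 2) / (keep.sum() - (order + 1))
        if red_chi2 < best_chi2:  # assumed selection rule: lowest reduced chi-squared
            best_chi2, best_coeffs = red_chi2, coeffs
    baseline = np.polyval(best_coeffs, x)
    return spectrum - baseline, baseline, keep

# Synthetic spectrum: curved baseline + one bright line + Gaussian noise.
rng = np.random.default_rng(0)
chan = np.arange(1000)
x = np.linspace(-1.0, 1.0, chan.size)
true_baseline = 0.5 + 0.3 * x - 0.4 * x ** 2
line = 5.0 * np.exp(-0.5 * ((chan - 500) / 10.0) ** 2)
spectrum = true_baseline + line + rng.normal(0.0, 0.1, chan.size)
subtracted, baseline, keep = fit_baseline(spectrum)
```

In this synthetic case the bright line inflates the local standard deviation, so the channels around it drop out of `keep` and the polynomial is fit only to quiet channels.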

I have tested this method on several GAS regions that suffered from negative baselines and it has solved those issues. Here is a comparison of an averaged spectrum from OrionB of the previous GAS rebaselined map (top panel) and the new method (bottom panel).

rfriesen commented 6 years ago

@jakeown I've tested this on Orion A and the individual spectra look really great (apart from toward OMC1, but none of our baseline functions work well there as there's very little/no 'baseline' to fit). Thanks!

When I average spectra together, though, I do find negative features. Since these aren't obvious at the pixel scale, I'm not sure how to solve the problem.

[Image: oriona_compare_baselines_2]

low-sky commented 6 years ago

This must just be the mask including some signal values in the fit data. Dropping the mask cut to 0.3 may help a bit. Or maybe expand the window size?

I'm just parking this thought here. It's possibly worth trying a baseline + ammonia model fit to the spectra and then subtracting off the baseline component and retaining the ammonia parameters. It would be as hacky as our other methods but it would at least do a great job of sorting out which wiggles are signal and which are baseline.
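A rough sketch of what a simultaneous baseline + line fit looks like, under strong simplifying assumptions: a single Gaussian of known centre and width stands in for the ammonia model (the real hyperfine model is far more involved), so the whole problem becomes linear least squares. The function name `joint_fit` and all the numbers are hypothetical.

```python
import numpy as np

def joint_fit(x, y, centre, width, order=2):
    """Linear least-squares fit of a polynomial baseline plus a fixed-shape Gaussian."""
    line_shape = np.exp(-0.5 * ((x - centre) / width) ** 2)
    # Design matrix columns: 1, x, ..., x**order, then the line template.
    design = np.column_stack([x ** k for k in range(order + 1)] + [line_shape])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    baseline = design[:, :-1] @ params[:-1]  # baseline component only
    return params[-1], baseline              # line amplitude, fitted baseline

# Synthetic faint line on a curved baseline.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 400)
true_baseline = 0.2 - 0.3 * x + 0.5 * x ** 2
true_amp = 0.3
y = (true_baseline
     + true_amp * np.exp(-0.5 * ((x - 0.1) / 0.05) ** 2)
     + rng.normal(0.0, 0.05, x.size))

amp, baseline = joint_fit(x, y, centre=0.1, width=0.05)
cleaned = y - baseline  # spectrum with only the baseline component removed
```

Because the line and the baseline are fit together, the line channels do not have to be masked out, which is the appeal of the idea for faint emission.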

keflavich commented 6 years ago

That latter approach can be done, but it requires a little modification of the model. To me, the baseline & signal in these data look separable enough that we shouldn't have to resort to that.

jakeown commented 6 years ago

Yes, let's try tweaking the mask_percent parameter and window_size parameter. The current value of 40% for mask_percent might be too high for these spectra, including some signal channels in the fits.

rfriesen commented 6 years ago

I played around with using a lower mask_percent value (0.2, 0.3), and it doesn't improve things much in OrionA. I also tried changing the window_size parameter but got hit with a bunch of errors - if you'd like to take a look when you have a chance, @jakeown, that would be great.

jakeown commented 6 years ago

@rfriesen Sorry the window_size option wasn't working. My original code had that parameter hard-coded into the functions as 31, but when I decided to change it to an adjustable parameter there were some places I forgot to adjust to provide that functionality. I fixed those problems with my latest commits and things should run smoothly for you now. That being said, I also re-ran the fitter on a cropped version of the OrionA map surrounding the region you highlighted with bad fits. I used a window_size of 41, but the results were essentially the same. I've noticed that just averaging over bright emission regions, the baselines look more-or-less flat, but when averaging over bright+faint regions, then those negative baselines arise. So is this maybe an issue more with the fits on the faint spectra rather than the bright spectra?

rfriesen commented 6 years ago

Ah, that could be it - low level line emission may not have "high" enough standard deviation to be avoided in the baseline fitting, but is still influencing the resulting baseline fit. In fact, if I look at individual pixels in Orion A with faint line emission, I can see that the baselines show some odd features in places. I'll look specifically at the faint regions with varying combinations of window_size + mask_percent. Even for faint emission, I think your method should still find the best line-free channels.
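One way to test this hypothesis on synthetic data is to measure how many channels inside the line survive the local-standard-deviation cut for a bright line versus a faint, broad one. This is only a sketch: the function name `kept_fraction_in_line`, the line width, the noise level, and the amplitudes are made-up values, not GAS or KEYSTONE numbers.

```python
import numpy as np

def kept_fraction_in_line(amplitude, n_spec=100, n_chan=1000, noise=0.1,
                          line_width=20.0, window_size=31, mask_percent=0.4,
                          seed=0):
    """Fraction of channels within 2 line widths of the centre that pass the mask."""
    rng = np.random.default_rng(seed)
    chan = np.arange(n_chan)
    centre = n_chan // 2
    line = amplitude * np.exp(-0.5 * ((chan - centre) / line_width) ** 2)
    spectra = line + rng.normal(0.0, noise, size=(n_spec, n_chan))

    # Local std in a window centred on each channel, for every spectrum at once.
    half = window_size // 2
    padded = np.pad(spectra, ((0, 0), (half, window_size - 1 - half)), mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window_size, axis=1)
    local_std = windows.std(axis=-1)

    # Per-spectrum cut: keep the lowest mask_percent fraction of channels.
    cut = np.quantile(local_std, mask_percent, axis=1, keepdims=True)
    keep = local_std <= cut

    in_line = np.abs(chan - centre) < 2 * line_width
    return keep[:, in_line].mean()

bright = kept_fraction_in_line(5.0)   # peak at 50x the noise
faint = kept_fraction_in_line(0.1)    # peak comparable to the noise
print(f"line channels kept: bright={bright:.2f}, faint={faint:.2f}")
```

In this toy setup the bright line is rejected by the mask, while a large share of the faint-line channels is treated as line-free and enters the baseline fit, which is consistent with the behaviour described above.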

rfriesen commented 6 years ago

Hey @jakeown, I've looked into varying the window size (thanks for the update!) and mask percentage, but am just not having much luck removing the negative features in the faint emission regions. For the most part, this is only really clear when averaging spectra over a wider area, but can also be seen in individual spectra with faint, broad emission. Is this just an issue in the GAS data? Do you see any of this toward fainter regions in the KEYSTONE data? I'm happy to merge but for now will stick with our defined windows for baselining for GAS. I've removed an extra baseline step that was added in somewhere to fix the negative features we ended up with in some of the DR2 files.

jakeown commented 6 years ago

@rfriesen I don't think this issue is GAS-specific, since I can also see some cases where averaging spectra over larger areas results in slightly negative baselines. When I only look at bright regions, the averaged spectra have good baselines. But when I average over bright+faint regions I can sometimes see negative dips over the range of the spectrum that contains the hyperfine components. See some cases below.

Still not sure about a solution to this problem, but it seems like the cause is bad fits to the faint pixels. Maybe we need to implement a baseline fit over averaged pixels?

low-sky commented 6 years ago

I'm leaving an idea here that may be worth exploring, namely that the baseline should be relatively stable across a single scan across the source and so it may be worth making a frequency vs. scan integration function to get more of a sense of what the baseline looks like before gridding. Then bright pixels could use information from blank pixels to improve their baseline solution.
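As a sketch of the idea, assume the un-gridded data for one scan are available as an (integration, channel) array and that we already know which integrations are off-source. A per-channel median over the blank integrations then gives a baseline template for the whole scan. The function name `scan_baseline_template`, the array layout, and the synthetic ripple are hypothetical, and the approach only works to the extent that the baseline really is stable across the scan.

```python
import numpy as np

def scan_baseline_template(waterfall, blank):
    """Per-channel median over the blank (off-source) integrations of one scan.

    waterfall: array of shape (n_integrations, n_channels)
    blank:     boolean array of shape (n_integrations,), True where no emission is expected
    """
    return np.median(waterfall[blank], axis=0)

# Synthetic scan: a baseline ripple shared by all integrations, plus noise.
rng = np.random.default_rng(2)
n_int, n_chan = 60, 500
chan = np.arange(n_chan)
ripple = 0.3 * np.sin(2 * np.pi * chan / 250.0)
waterfall = ripple + rng.normal(0.0, 0.1, size=(n_int, n_chan))

# The source is only crossed in the middle 20 integrations.
on_source = np.zeros(n_int, dtype=bool)
on_source[20:40] = True
waterfall[on_source] += 2.0 * np.exp(-0.5 * ((chan - 250) / 8.0) ** 2)

template = scan_baseline_template(waterfall, ~on_source)
corrected = waterfall - template  # on-source integrations borrow the blank-sky baseline
```

Here the on-source integrations never contribute to the template, so even bright emission cannot pull the baseline estimate.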

Practically, this is probably overkill and I think that a hybrid solution of robust signal identification (@jakeown) combined with defined windows (GAS strategy) would solve things.
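A minimal sketch of what such a hybrid mask could look like, assuming the local standard deviation has already been computed per channel and the defined windows are given as velocity ranges to exclude. The function name `hybrid_mask`, its arguments, and the example windows are hypothetical and not taken from the GAS code.

```python
import numpy as np

def hybrid_mask(local_std, velocity, line_windows, mask_percent=0.4):
    """Channels usable for the baseline fit: outside the defined line windows
    AND among the quietest channels by local standard deviation."""
    outside = np.ones(velocity.shape, dtype=bool)
    for v_lo, v_hi in line_windows:
        outside &= (velocity < v_lo) | (velocity > v_hi)
    # Rank only the channels outside the windows, so the cut is not
    # influenced by emission inside them.
    cut = np.quantile(local_std[outside], mask_percent)
    return outside & (local_std <= cut)

# Toy inputs: 101 channels and two excluded velocity ranges (made-up values).
velocity = np.linspace(-50.0, 50.0, 101)
local_std = np.linspace(1.0, 2.0, 101)
line_windows = [(-5.0, 5.0), (20.0, 30.0)]
mask = hybrid_mask(local_std, velocity, line_windows)
```

The defined windows guard against faint emission that the standard-deviation test misses, while the standard-deviation cut still rejects noise spikes or unexpected emission outside the windows.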

GBTAmmoniaSurvey / GAS

Adding new baseline fitting method #179