astropy / astroscrappy

Speedy Cosmic Ray Annihilation Package in Python
70 stars 34 forks source link

WIP: Support spectra & noise propagation #32

Closed jehturner closed 3 years ago

jehturner commented 7 years ago

This is the same PR as https://github.com/cmccully/lacosmicx/pull/1, rebased on astroscrappy, so we can close that old one. It doesn't yet implement the intended behaviour summarized in that PR, so do not merge yet.

jehturner commented 7 years ago

Regarding this:

  • You'd like to get rid of pssl and my bgsub and replace both of them with bkg, which can accept either a fixed background level (like pssl) or an array of model values to be subtracted (like bgsub, mainly for spectroscopy).

There's actually no reason to support subtracting a constant background internally, is there? You were probably just proposing to get rid of pssl altogether for imaging and require the user to supply indat with its background still included, unless also passing a noise image.

coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 100.0% when pulling 110f16243b2f037160c0d653f5e4f95acac801f4 on jehturner:spectroscopy into acaa02eb6b55a2488d929ced2c5d839f78db56f5 on astropy:master.

coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 100.0% when pulling d628fc81b48a1ae09f383eb468da7dd174ea4fcc on jehturner:spectroscopy into 750cc76fee4a25e4b63e9a585f50c232ffa57609 on astropy:master.

jehturner commented 7 years ago

Does this look like what you had in mind, @cmccully? I have not added a noise argument yet, but this removes the pssl argument -- instead expecting indat to have its sky background included -- and renames my bgsub to bkg, an optional array that gets subtracted temporarily from the data during detection (I don't see an obvious reason to subtract a constant for imaging but one could).

jehturner commented 6 years ago

For completeness, I've had a go at adding a var parameter that accepts a user-supplied variance array, though perhaps it should go its own PR. Anyway, on reflection, it may be less useful than I had been thinking, given the following considerations:

  1. If bkg contains all the signal that has been subtracted prior to running astroscrappy then there isn't really any additional information about the noise in var.

  2. Like the main data array, I think var needs its own median filtering and cleaning, to determine what the noise would be in the absence of CRs (which is obviously a bit more processing).

  3. If (unlike the main data array) var is filtered without subtracting & restoring bkg, I think that can lead to less accurate noise structure by blurring out fine background details (such as sky lines or IFU fibres).

  4. In the case I've tested so far, there was very little difference in the cleaned result between using var or just indat, with only ~0.01% of pixels having significant differences. Those cleaned pixels appear similar but a little noisier when using var than they do with the original noise algorithm.

jehturner commented 6 years ago

I'll push what I did so you can see it. We can always revert that commit if needed...

I'm not returning a noise image here, but I suppose cleanvar could be added to the return tuple? I tend to re-clean my data after running LA Cosmic anyway, since I find it does a better job of identifying the CRs than removing them (eg. compared with local interpolation), so crmask seems like the most useful return value.

jehturner commented 5 years ago

Rebased as requested.

jehturner commented 5 years ago

Not sure what is going on with the first, Python 2.7 build test? It seems to install Python 3.7 with Conda and then fail, but I don't think it's relevant to this PR.

jehturner commented 5 years ago

Before answering your detailed suggestions (thanks), I'd like to point out that I was leaning towards reverting my last commit (see comment of 16 April). Input variance was something we discussed on Slack that seemed like a good idea at the time, but once I actually implemented var, I realized that it doesn't work as well as the original LACosmic noise estimate, for 2 reasons: First, var has no equivalent of the bkg image for indat, so continuum/sky structure gets left in the variance when filtering for cosmic rays, producing noise estimates that are less, rather than more, accurate. Second, bkg should help ensure that all the signal gets accounted for anyway when producing LACosmic's usual noise estimate, because it includes any "missing" flux that a separate variance array would otherwise tell you about. So I would probably just get rid of var to avoid overcomplicating things, but obviously if you want to keep it then we can. Admittedly, I'm mostly interested in spectroscopy, where bkg is going to be essential, whereas for imaging bkg is more likely to be a scalar and then you might get slightly more accurate noise structure using a var image as well (off the top of my head). Does that make sense?

jehturner commented 4 years ago

In our meeting earlier this week, @cmccully mentioned that he's keen to retain variance input as an option because he has a use case (I think for echelle data) where prior processing produces a non-trivial correspondence betweeen indat & var. I've therefore had another look at what's needed to address the limitations I mentioned above on 16 April 2018, but I've had quite a hard time remembering what I was doing here after 3 years...

The good news is that I think the code here already does most of the duplicate processing necessary to use input variance accurately and is "only" missing subtraction/restoration of the background structure (as for indat).

The problem is simply how to go about subtracting the background from the variance. Unless the only processing done prior to cosmic ray removal is bias subtraction and maybe stacking (in which case you don't really need var, because the original algorithm can derive the same thing from indat), the levels of background structure in var and indat may differ, even if just by a scaling factor or similar. AstroScrappy doesn't know anything about that, so the caller would be required to provide, say, bkg and varbkg separately, which is already getting a bit messy. Moreover, at present the run time for spectroscopy is already dominated by background fitting rather than by detect_cosmics itself, so if the calling function (equivalent to lacos_spec.cl) has to fit the sky+continuum for both indat & var separately, that's quite a large overhead. Alternatively, the caller might know enough about previous processing to convert bkg into varbkg directly, but that might sacrifice generality and/or produce a convoluted API -- I think it would be good to understand your use case a bit better here. And if the caller doesn't get this exactly right, things are liable to go (at least slightly) wrong without it being obvious.

So what are your thoughts on how exactly var would be used? I think I'm fairly clear on how detect_cosmics would need to be modified if we know how we want it to look.

cmccully commented 4 years ago

Can you rebase this into master @jehturner ?

jehturner commented 4 years ago

Rebase from master? OK.

jehturner commented 4 years ago

That was suspiciously easy... Anyway, the diff is just 1 file now.

codecov[bot] commented 4 years ago

Codecov Report

Merging #32 into master will decrease coverage by 1.93%. The diff coverage is 47.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #32      +/-   ##
==========================================
- Coverage   94.88%   92.94%   -1.94%     
==========================================
  Files           7        7              
  Lines        1016     1049      +33     
  Branches       53       53              
==========================================
+ Hits          964      975      +11     
- Misses         52       74      +22     
Impacted Files Coverage Δ
astroscrappy/astroscrappy.pyx 69.29% <47.50%> (-5.83%) :arrow_down:
astroscrappy/utils/medutils.c 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update fd4acfe...0d45f31. Read the comment docs.