jsdillon closed 3 years ago
Merging #390 (57153bd) into master (a15c511) will increase coverage by 0.00%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #390 +/- ##
=======================================
Coverage 97.06% 97.06%
=======================================
Files 8 8
Lines 3098 3101 +3
=======================================
+ Hits 3007 3010 +3
Misses 91 91
Impacted Files | Coverage Δ | |
---|---|---|
hera_qm/xrfi.py | 99.78% <100.00%> (+<0.01%) | :arrow_up: |
I like this solution Josh. Could you add a unit test, or paste in some plots showing the solution fixes the issue?
Here's the waterfall I'm looking at (it's the antenna-averaged autocorrelation):
Here's the metric before and after:
The white is where the old metric is `np.inf`.
When performing `detrend_medfilt` on a very smooth function (i.e. one with a near-constant slope and little noise), I was getting a lot of modified z-scores of `np.inf`. This arose because the median pixel in the kernel was often the pixel of interest, which led to residuals of 0. With a high enough density of zero residuals, the median filter of the squared residuals, which estimates the denominator of the modified z-score, produced a lot of zeros in the denominator and thus infinite z-scores. This wasn't a problem for most XRFI tasks in the current pipeline, which work on noisier data with both real and imaginary parts that both had to match. However, I was recently trying to find RFI in H4C antenna-averaged autocorrelations (which are very low noise), and it was flagging large RFI-free regions as a result of these `inf`s.
This PR removes the pixel of interest from the median calculation using `scipy.ndimage.median_filter`'s `footprint` kwarg (we moved to using that function in #389, since it's slightly faster and I was getting weird segfaults with the old one). This makes it far less likely that the residual is 0 in data with any kind of noise.
Preliminary tests indicate that it has no impact on runtime.