HERA-Team / hera_qm

HERA Data Quality Metrics
MIT License

Remove center pixel from medfilt footprint #390

Closed jsdillon closed 3 years ago

jsdillon commented 3 years ago

When performing detrend_medfilt on a very smooth function (i.e. with a near-constant slope and little noise), I was getting a lot of modified z-scores of np.inf. This arose because the median pixel in the kernel was often the pixel of interest, which led to residuals of exactly 0. With a high enough density of 0 residuals, median filtering the squared residuals to estimate the denominator of the modified z-score produced a lot of zeros in the denominator and thus infinite z-scores. This wasn't a problem for most XRFI tasks in the current pipeline, whose inputs were noisy and also had real and imaginary parts that both had to match. However, I was recently trying to find RFI in H4C antenna-averaged autocorrelations (which are very low noise), and it was flagging large RFI-free regions as a result of these infs.

This PR removes the pixel of interest from the median calculation using scipy.ndimage.median_filter's footprint kwarg (we moved to using that function in #389, since it's slightly faster and I was getting weird segfaults with the old one). This makes it far less likely that the residual is exactly 0 in data with any kind of noise.
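For illustration, here is a minimal sketch of the idea on a noise-free ramp. It is not the actual hera_qm code: the function name `modified_zscore`, the kernel half-widths `kt` and `kf`, the reuse of the same footprint for both median filters, and the 0.455 normalization (the approximate median of a chi-square variable with one degree of freedom) are all assumptions made for this example.

```python
import numpy as np
from scipy.ndimage import median_filter


def modified_zscore(data, kt=1, kf=1, exclude_center=True):
    """Median-filter detrend plus robust z-score (illustrative, not hera_qm's code)."""
    footprint = np.ones((2 * kt + 1, 2 * kf + 1), dtype=bool)
    if exclude_center:
        footprint[kt, kf] = False  # drop the pixel of interest from the median
    resid = data - median_filter(data, footprint=footprint)
    # The median of a chi-square variable with 1 degree of freedom is ~0.455,
    # so median(resid**2) / 0.455 is a robust estimate of the variance.
    sigma = np.sqrt(median_filter(resid**2, footprint=footprint) / 0.455)
    with np.errstate(divide="ignore", invalid="ignore"):
        return resid / sigma, resid


# Noise-free ramp with a constant slope along both axes.
t, f = np.indices((20, 30))
ramp = 2.0 * t + 0.5 * f

z_old, resid_old = modified_zscore(ramp, exclude_center=False)
z_new, resid_new = modified_zscore(ramp, exclude_center=True)

# Compare away from the edges, where the boundary mode does not matter.
inner = (slice(2, -2), slice(2, -2))
print("zero residuals, full footprint:     ", np.sum(resid_old[inner] == 0))
print("zero residuals, center removed:     ", np.sum(resid_new[inner] == 0))
print("non-finite z-scores, full footprint:", np.sum(~np.isfinite(z_old[inner])))
print("non-finite z-scores, center removed:", np.sum(~np.isfinite(z_new[inner])))
```

On this exactly noise-free ramp the full footprint gives 0/0 (nan) rather than inf, because every interior residual is zero. The infs described above appear when an occasional nonzero residual is surrounded by enough zero residuals that the median of the squared residuals is still zero.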

Preliminary tests indicate that it has no impact on runtime.

codecov[bot] commented 3 years ago

Codecov Report

Merging #390 (57153bd) into master (a15c511) will increase coverage by 0.00%. The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #390   +/-   ##
=======================================
  Coverage   97.06%   97.06%           
=======================================
  Files           8        8           
  Lines        3098     3101    +3     
=======================================
+ Hits         3007     3010    +3     
  Misses         91       91           
Impacted Files    Coverage Δ
hera_qm/xrfi.py   99.78% <100.00%> (+<0.01%)


adampbeardsley commented 3 years ago

I like this solution Josh. Could you add a unit test, or paste in some plots showing the solution fixes the issue?

jsdillon commented 3 years ago

Here's the waterfall I'm looking at (it's the antenna-averaged autocorrelation):

image

Here's the metric before and after: image

The white is where the old metric is np.inf.