HERA-Team / hera_pspec

HERA power spectrum estimation code and data formats
http://hera-pspec.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Pseudo-Stokes Nsamples Accounting #391

Closed jsdillon closed 7 months ago

jsdillon commented 7 months ago

https://github.com/HERA-Team/hera_pspec/blob/fc06cf7d1b8ba370e4194b02eb49a1b08b757152/hera_pspec/pstokes.py#L148

I'll repeat what I said in Slack:

In the case where ee- and nn-polarized visibilities have different nsamples (because different antennas are flagged on different days, for example), it’s not clear to me that this is the proper nsamples for computing the thermal noise. Consider a case where uvd1.nsample_array is 10 and uvd2.nsample_array is 1. The variance will be dominated by uvd2.nsample_array. So if we want nsamples to properly reflect the variance, then I think we want to do something like uvdS.nsample_array = 4 * (uvd1.nsample_array**-1 + uvd2.nsample_array**-1)**-1, where the 4 accounts for the fact that uvdS.data_array = uvd1.data_array / 2 + uvd2.data_array / 2. In other words, if the ee nsamples and the nn nsamples are equal, then the answer is just the sum of the nsamples. But if one of the two is zero, then the resulting nsamples should also be 0, reflecting the fact that pseudo-I and pseudo-Q have infinite variance.
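For concreteness, here is a minimal sketch of that combination rule on plain NumPy arrays. It assumes both pols have the same per-sample noise variance, so that the variance of each visibility scales as 1 / nsamples. The helper name pstokes_nsamples is hypothetical and is not part of hera_pspec; the explicit mask is only there so that zero-sample entries come out as 0 without divide-by-zero warnings.

```python
import numpy as np

def pstokes_nsamples(n1, n2):
    """Hypothetical helper (not a hera_pspec function).

    Effective nsamples for d = d1 / 2 + d2 / 2, assuming
    Var(d1) ~ 1 / n1 and Var(d2) ~ 1 / n2 with the same per-sample noise.
    """
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    out = np.zeros(np.broadcast(n1, n2).shape, dtype=float)
    # Only entries where both pols have samples get a finite variance.
    good = (n1 > 0) & (n2 > 0)
    # 4 * (1/n1 + 1/n2)**-1 rewritten as 4 * n1 * n2 / (n1 + n2)
    np.divide(4.0 * n1 * n2, n1 + n2, out=out, where=good)
    return out

example = pstokes_nsamples(np.array([10.0, 5.0, 0.0]),
                           np.array([1.0, 5.0, 3.0]))
```

With these inputs the three entries cover the cases described above: unequal samples (10 and 1 give 40/11, about 3.64), equal samples (5 and 5 give the sum, 10), and one pol fully flagged (0 and 3 give 0).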

jsdillon commented 7 months ago

I've verified that this is correct with a numerical experiment. I made Nsamples flat in time and frequency for both pols, but I gave one pol 10x the samples of the other and then generated pure thermal noise.
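A stripped-down version of this kind of experiment can be sketched as follows. This is not the script I ran; it is a standalone illustration using plain NumPy rather than UVData objects, and the noise level sigma, the seed, and the number of draws are arbitrary assumptions. The plain-sum prediction is included only for contrast and is not a claim about what pstokes.py computes.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
sigma = 1.0                      # assumed per-sample noise, same for both pols
n1, n2 = 10.0, 1.0               # one pol has 10x the samples of the other
ntrials = 200_000

def thermal_noise(nsamples, size):
    """Complex Gaussian noise with total variance sigma**2 / nsamples."""
    scale = sigma / np.sqrt(2.0 * nsamples)
    return rng.normal(0.0, scale, size) + 1j * rng.normal(0.0, scale, size)

d1 = thermal_noise(n1, ntrials)
d2 = thermal_noise(n2, ntrials)
pseudo_I = d1 / 2 + d2 / 2

measured_var = np.mean(np.abs(pseudo_I) ** 2)

# Variance predicted by the proposed nsamples vs. a plain sum (for contrast).
n_proposed = 4.0 / (1.0 / n1 + 1.0 / n2)
n_plain_sum = n1 + n2
var_proposed = sigma**2 / n_proposed
var_plain_sum = sigma**2 / n_plain_sum

print(measured_var, var_proposed, var_plain_sum)
```

Under these assumptions the measured variance should track var_proposed, while var_plain_sum underestimates it by roughly a factor of three.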

Here's what the current Nsamples for pseudo-I gives:

image

And here's what happens if you do 4 * (uvd1.nsample_array**-1 + uvd2.nsample_array**-1)**-1:

image