CosmoStat / autometacal

Metacalibration and shape measurement by automatic differentiation

Comparison to `ngmix.example.metacal.metacal.py` #36

Closed · EiffL closed this issue 2 years ago

EiffL commented 2 years ago

This PR #37 adds a script that runs autometacal within the ngmix example script for direct comparison against ngmix's finite differences.

With default parameters I'm getting:

1000/1000 100%

S/N: 80161
-------------------
ngmix results:
R11: 0.351127
m: -0.000182258 +/- 0.000409339 (99.7% conf)
c: 9.31003e-07 +/- 4.1336e-06 (99.7% conf)
-------------------
autometacal results:
R11: 0.355601
m: -0.0126295 +/- 0.00031753 (99.7% conf)
c: 1.71161e-07 +/- 3.14146e-06 (99.7% conf)

So we see a discrepancy in m. I would attribute the difference in multiplicative bias to the difference in response matrices, and ultimately, at this stage, to our faulty interpolation gradients.

@andrevitorelli could you review the code in this PR #37 and try to run this script with your cubic interpolation code?

EiffL commented 2 years ago

And for fun, I did a timing comparison :-) autometacal is about 65x faster than ngmix.

andrevitorelli commented 2 years ago

@EiffL if I understood correctly, the notebook should reproduce these exact results? I'm not getting them; I might be missing something.

EiffL commented 2 years ago

hummmm which notebook?

EiffL commented 2 years ago

If you mean running the script in this PR (https://github.com/CosmoStat/autometacal/pull/37): you should get different results from mine, because my R matrix is wrong, since I don't have the cubic interpolation code.

andrevitorelli commented 2 years ago

Hey @EiffL, do you think we could have a consistent definition of ResamplingType in galflow? The way I usually do this is to define a hidden variable that is used everywhere throughout the package and, if need be, set it externally when doing experiments. ResamplingType is used in many places, and at times it is tough to keep track of whether I've set them all up properly.

In any case, with our cubic interp, the results are:

autometacal results:
R11: 0.34922
m: 0.00541301 +/- 0.000323341 (99.7% conf)
c: 1.74688e-07 +/- 3.19889e-06 (99.7% conf)

...which are closer but not there yet.

EiffL commented 2 years ago

humm, the simple way to do this is typically just to choose a convention and make it the default in all functions, so that you don't have to think about it. In which part of the code do you need to switch ResamplingType? In GalSim, for instance, the default is Quintic and you never have to worry about it.

EiffL commented 2 years ago

Hummmm interesting... and slightly worrying ^^' I don't know why we would be getting differing results...

EiffL commented 2 years ago

This would mean that our shear values and/or our gradients are wrong :-/

andrevitorelli commented 2 years ago

Yes, exactly... So, in many functions, such as gf.convolve.convolve and gf.python.tfutils.transformer.perspective_transform, they are defined separately. My idea is to have a single variable, say gf._ResamplingType_, that would be initialised once and used throughout the code. That way you'd never need to worry about whether they are all set consistently. And if you ever want to try a different interpolator, as an experiment you just call gf._ResamplingType_ = tf.image.ResamplingType.YOURINTERPOLATOR and run with it.
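
A minimal sketch of that pattern, with hypothetical names (the module layout, `_RESAMPLING_TYPE`, and the `perspective_transform` signature are placeholders, not galflow's actual API):

```python
# Hypothetical galflow module: a single package-wide default interpolator
# that every resampling function consults at call time.

_RESAMPLING_TYPE = "bilinear"  # package-wide default; override for experiments


def perspective_transform(image, transform, resampling_type=None):
    """Warp an image, using the package default unless the caller overrides it."""
    rt = _RESAMPLING_TYPE if resampling_type is None else resampling_type
    print(f"resampling with: {rt}")  # stand-in for the actual warp
    return image


# In an experiment, override once at the top of the script/notebook:
#   import galflow as gf
#   gf._RESAMPLING_TYPE = "cubic"
```

One detail worth noting: the default has to be read inside the function body, not as a keyword default (`def f(rt=_RESAMPLING_TYPE)`), because Python evaluates default argument values once at definition time and would never see later overrides.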

andrevitorelli commented 2 years ago

> This would mean that our shear values and/or our gradients are wrong :-/

Is this it, or do we actually just need quintic? Because we have already improved by an order of magnitude...

EiffL commented 2 years ago

Yeah it could be that we need quintic...

There is one way to know, I guess. We can use our code but compute the response with finite differences instead of autodiff (like what you are doing in your comparison notebooks). If the bias stays roughly the same, it most likely comes from the interpolation order. If the bias disappears, it means our gradients are wrong. If it becomes larger, well, we can't conclude much...
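
A minimal sketch of that test, with a toy stand-in for the real pipeline (`measure_e1` below is just a smooth `tanh`, not the actual metacal image generation and moments measurement):

```python
import tensorflow as tf

# Toy stand-in for the shear -> measured-ellipticity map; any smooth
# function of the applied shear g1 works for illustrating the comparison.
def measure_e1(g1):
    return tf.tanh(0.35 * g1)

def r11_autodiff():
    """R11 = d e1 / d g1 at g1 = 0, via automatic differentiation."""
    g1 = tf.Variable(0.0)
    with tf.GradientTape() as tape:
        e1 = measure_e1(g1)
    return tape.gradient(e1, g1)

def r11_finitediff(h=0.01):
    """The same response via central finite differences with step h."""
    return (measure_e1(tf.constant(h)) - measure_e1(tf.constant(-h))) / (2 * h)

print(float(r11_autodiff()), float(r11_finitediff()))
# Same residual bias with both responses -> gradients fine, suspect the
# interpolation order; bias gone with finite diffs -> suspect the gradients.
```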

andrevitorelli commented 2 years ago

Thus spake The Code:

-------------------
ngmix results:
R11: 0.351127
m: -0.000182258 +/- 0.000409339 (99.7% conf)
c: 9.31003e-07 +/- 4.1336e-06 (99.7% conf)
-------------------
autometacal results:
R11: 0.34922
m: 0.00541301 +/- 0.000323341 (99.7% conf)
c: 1.74688e-07 +/- 3.19889e-06 (99.7% conf)
finitediff results:
R11: 0.349435
m: 0.00479424 +/- 0.000323172 (99.7% conf)
c: 1.74447e-07 +/- 3.19686e-06 (99.7% conf)

Step size in finitediff was 0.01.

andrevitorelli commented 2 years ago

With step size = 0.002, we get closer to autometacal, as we expected from previous investigations:

autometacal results:
R11: 0.34922
m: 0.00541301 +/- 0.000323341 (99.7% conf)
c: 1.74688e-07 +/- 3.19889e-06 (99.7% conf)
finitediff results:
R11: 0.349269
m: 0.00527089 +/- 0.000323326 (99.7% conf)
c: 1.74529e-07 +/- 3.19838e-06 (99.7% conf)
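
As a side note, this is the expected behaviour: central differences carry an O(h^2) truncation error, so shrinking the step pulls them toward the exact (autodiff) derivative. A pure-Python illustration with the same toy tanh stand-in as in the earlier sketch (exact slope 0.35 at g1 = 0):

```python
import math

# Central differences have an O(h^2) truncation error, so a 5x smaller
# step cuts the error by roughly 25x. Toy response: e1(g1) = tanh(0.35 * g1).
def e1(g1):
    return math.tanh(0.35 * g1)

exact = 0.35  # d e1 / d g1 at g1 = 0
for h in (0.01, 0.002):
    fd = (e1(h) - e1(-h)) / (2 * h)
    print(f"h = {h:<6}  R11_fd = {fd:.10f}  error = {fd - exact:+.3e}")
```
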
EiffL commented 2 years ago

Ok, so, this is great, what can we conclude from this test?

  1. The gradients of the cubic interpolation, and of the whole autometacal chain, seem to be correct, because R11 is converging to the same value.

  2. It means that something is not quite right in our implementation of the metacal images and/or shape measurement. It's not a huge effect because otherwise it would also show up in our tests test_metacal.py and/or tests/test_tf_ngmix.py. It could be due to the interpolation kernel not being accurate enough.

So, the residual m bias can come from two things:

  1. Misestimation of the R11 factor: the evidence would seem to point towards this, because we find a significantly different R11 value compared to ngmix.
  2. Misestimation of the g values (i.e. the measured ellipticity before correction). We can try to rule this second point out by printing the "pre-correction" g value as measured by ngmix and by autometacal. We should find that they agree, and thus rule out issues in the FFT, inverse FFT, and moments measurements.

andrevitorelli commented 2 years ago

Using Bernstein & Gruen 2014 quintic interpolation, we have:

-------------------
ngmix results:
R11: 0.351127
m: -0.000182258 +/- 0.000409339 (99.7% conf)
c: 9.31003e-07 +/- 4.1336e-06 (99.7% conf)
-------------------
autometacal results:
R11: 0.351393
m: -0.000804964 +/- 0.000321342 (99.7% conf)
c: 1.73607e-07 +/- 3.17911e-06 (99.7% conf)
finitediff results:
R11: 0.351884
m: -0.00219971 +/- 0.000320923 (99.7% conf)
c: 1.73232e-07 +/- 3.17461e-06 (99.7% conf)

EiffL commented 2 years ago

oh oh oh, interesting :-) Does that mean quintic in TensorFlow?

It's just a bit surprising that the finite diff doesn't converge to the same value :-/ hummm

andrevitorelli commented 2 years ago

After a more careful check of the interpolations, the current result is as follows (for an unrealistically high-SNR (~1e7) galaxy, just for the sake of unit testing!):

-------------------
ngmix results:
R11: 0.34145
m: 0.000118535 +/- 0.000112539 (99.7% conf)
c: -1.37714e-07 +/- 1.19257e-06 (99.7% conf)
-------------------
finitediff results:
R11: 0.341446
m: 0.00016892 +/- 0.000142665 (99.7% conf)
c: -4.46461e-07 +/- 1.46364e-06 (99.7% conf)

autometacal results:
R11: 0.341706
m: -0.000592995 +/- 0.000142565 (99.7% conf)
c: -4.45364e-07 +/- 1.46246e-06 (99.7% conf)

Finite differences in autometacal are now really close to the ngmix results (all of them using Bernstein & Gruen 2014 'quintic' interpolation).

Now, into the devilish details...

The R11 value differs between ngmix and finite differences in autometacal by only 0.0012%. However, this corresponds to a difference of 42.5% in m! Why, and is this a problem? Let's check the values of the measured shear before calibration:

ngmix: 
g1 = 0.003414903118253962
autometacal finitediff:
g1 = 0.0034150328  

They differ by 0.0038% (note the truncation due to single precision). After calibration, we have:

ngmix: 
g1/R11 = 0.010001186106481666
autometacal finitediff:
g1/R11 = 0.010001689

...corresponding to a difference of 0.005%.
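
For reference, the m values above can be recomputed directly from these numbers, assuming the true applied shear in the example script is g_true = 0.01 (which the printed g1/R11 values imply) and the usual calibration m = (g1 / R11) / g_true - 1; the R11 inputs below are the rounded printouts, so the last digits drift slightly:

```python
# Recompute m from the printed g1 and R11 values. Assumes a true applied
# shear of g_true = 0.01; R11 values are the rounded printouts above.
g_true = 0.01

runs = {
    "ngmix":                  (0.003414903118253962, 0.34145),
    "autometacal finitediff": (0.0034150328,         0.341446),
}

for label, (g1, r11) in runs.items():
    g_cal = g1 / r11
    m = g_cal / g_true - 1.0
    print(f"{label:24s} g1/R11 = {g_cal:.9f}  m = {m:+.6g}")
```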

However, the remaining m with autometacal finitediff is still more than 3 sigma away from zero.

Now, with realistically noisy (SNR ~ 100, GalSim definition) galaxies:

-------------------
ngmix results:
R11: 0.343779
m: 0.0188038 +/- 0.265366 (99.7% conf)
c: 0.000885843 +/- 0.00271103 (99.7% conf)
-------------------
finitediff results:
R11: 0.343759
m: 0.0189779 +/- 0.265386 (99.7% conf)
c: 0.00088561 +/- 0.00271115 (99.7% conf)

autometacal results:
R11: 0.344023
m: 0.0181959 +/- 0.265183 (99.7% conf)
c: 0.000884929 +/- 0.00270907 (99.7% conf)

Now R11 differs by 0.006%, but m differs by only 0.9%. The unrealistically high-SNR example makes m very sensitive.

Finally, as it stands, autometacal gives marginally better results (~3%) in the realistic-SNR example (the residual m is slightly smaller).

EiffL commented 2 years ago

Nice results @andrevitorelli! So, the thing to look at is not necessarily the % difference between the m values, but whether or not you detect a significant nonzero m.

In the case of ngmix at high SNR, you have m: 0.000118535 +/- 0.000112539, which means that for that sample size the multiplicative bias is consistent with zero at 1 sigma. However, in the case of autometacal at high SNR, you have m: -0.000592995 +/- 0.000142565 (99.7% conf), so you detect a nonzero m at over the 5 sigma level.

For the SNR=100 sample you just cannot conclude anything: the m measurement is an order of magnitude smaller than the error bar, so it's all consistent with zero m, but only at the 10^-1 level. You need a lot more samples to reduce the error bars in this case.

EiffL commented 2 years ago

Given that your finitediff implementation seems consistent with ngmix, this would seem to indicate that there is a small problem somewhere in a gradient.

To make sure, you can further reduce the error bars in your high-SNR test by increasing the sample size.
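
Since the error bar on m shrinks like 1/sqrt(N), a rough sizing estimate (assuming the 1,000 galaxies implied by the "1000/1000" progress output above, and the 3-sigma error on m from the high-SNR autometacal run):

```python
import math

# Error on m scales as 1/sqrt(N): shrinking the error bar by a factor k
# costs k^2 times as many galaxies.
n_old = 1000           # galaxies per run above (per the "1000/1000" progress bar)
err_old = 0.000142565  # 3-sigma error on m, autometacal high-SNR run
k = 3                  # desired shrink factor for the error bar

n_new = math.ceil(n_old * k**2)
print(f"~{n_new} galaxies for a {k}x smaller error bar ({err_old / k:.3g})")
```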

EiffL commented 2 years ago

This is solved, right? We can close it?

andrevitorelli commented 2 years ago

Yes, def.