Finite differences converges to different value than automatic differentiation.

andrevitorelli commented 3 years ago

As it is right now, when doing finite differences, it's clear that the derivative converges as we go to smaller step sizes. However, when using an odd number of pixels (which at the moment is the correct choice, as shear is applied around an integer central pixel, and we cannot properly shift the image without better interpolation), the value it converges to is consistently lower than the one given by automatic differentiation.

I'll post more about here illustrating what we know about this bug, but in general, it seems that when generating the metacalibration image, the current code creates an image with less shear than what we ask of it.

EiffL commented 3 years ago

can you illustrate this with a plot :-) Better yet, a unit test?

EiffL commented 3 years ago

but these are 2 different questions, the question of whether we get the correct gradients, and the question of whether we apply the correct shearing.

andrevitorelli commented 3 years ago

but these are 2 different questions, the question of whether we get the correct gradients, and the question of whether we apply the correct shearing.

Yes, indeed they are. I'm still collecting all the behaviours I observe (I wanted to post it earlier today, but had a lot to do here). What I know by now is:

The gradients by finite differences always converge (ie., with smaller steps, they converge to a precise value)
When using an even stamp_size (which we know to be incorrect), they will always converge to the same value given by automatic differentiation
When using an odd stamp_size, they converge to other values, but not in a fixed rate in comparison to the automatic differentiation value
Neither the median nor the mean for all galaxies in finite diferentiation converge to the same value as the same as automatic differentiation.
The differences between the values from finite/auto derivatives are small for diagonal elements, way larger for off-diagonals.
In general, both methods, in "normal" situations (ie. reasonable level of noise), give smaller R{11} values than the correct one (unlike what we said before, yes, we CAN know the correct value, because our simple measurement method correlates very tightly with the known ellipticities in our simulated images - so we can get R{11} as m1+1 by linear regression.)
Also in general, both methods reach the same results, and both over-correct ellipticities (because they both report response values that are about 5% smaller than what they sould be)

EiffL commented 3 years ago

@andrevitorelli great investigative work :-) But can you either upload your notebook or upload the plots that support your conclusions? It's very useful for other contributors (i.e. me) to be able to look into problems quickly, from a starting point ;-)

Also, what would be the most constructive thing is to build unit tests, which can fail, highlihgting the potential issues with the code. Then we can work on fixes to these unit tests.

andrevitorelli commented 3 years ago

@EiffL Thanks, and yes, I am in the process of creating a unit test that covers the problem precisely. It will just test two things: 1) does the finite difference derivative converge? (it always does, actually, but still) 2) when it does, does it differ from the autodiff value by a large (ie > 1e-3) value?

I was checking what to expect from different examples.

EiffL commented 3 years ago

ok, so, a tiny bit of experimenting reveals the following. The tensorfow addon resampler seems to struggle to get derivatives around integer resampling points.

I've added a test notebook here: https://github.com/CosmoStat/autometacal/blob/u/EiffL/test_grads/notebooks/MetacalGradients.ipynb (Look at the last section, which should highlight the problem)

When the resampling points are at half pixels, the gradients of the resampling step are perfect:

But when at integer pixel values:

(Tthese are residual plots)

Ok, so, several steps from here:

[ ] : Double check on a clean example that the gradients of tfa_image.resampler fail when the warp is at integer values.
[ ] : If so, open an issue on tensorflow_addons, with a MWE reproducing the problem
[ ] : Then 2 options to solve the problem:
- Option 1: dig into the code of tfa, find why the gradient kernel is not workring, fix it
- Option 2: if we used even size images, it would work, BUT in that case we need to use resampling of the image in the FFT step so that we obtain the correct behaviour in the Fourier transform.

Please let me know which option you want to pursue @andrevitorelli, I won't be very available this week, but don't hesiitate to ask me questions if you have any questions on how to proceed :-)

EiffL commented 3 years ago

Made a small demo showing the diffference in gradients when resample points are at integer vs half integer positions: https://gist.github.com/EiffL/9f29a407b91c4de207803b1769189dfa It confirms that gradients are wonky if the sampling points are at integer values.

You can use something similar to create a minimal working example to open an issue on tensorflow_addons.

EiffL commented 3 years ago

sorry ^^' one last comment before I end up offline in the countryside. This is also very correlated to what @Dr-Zero was working on. We are going to replace the interpolation code in tfa anyways. Are we close to a working implementation? Because this would be good to mention as well in a TFA issue, that we will have a fix in the form of a revamp of the interpolation implementation in resampler.

andrevitorelli commented 3 years ago

@EiffL I'll talk to him during this week and work closely. I want to close this whole issue ASAP, preferably during this week, but surely, in a worst-case scenario, until the next one.

andrevitorelli commented 3 years ago

EiffL commented 2 years ago

How are we doing with a PR for TF addons ^^' ?

Dr-Zero commented 2 years ago

Bonjour Francois,

Not much progress, I'm afraid, I had minor surgery and have been recovering for the last two weeks. Here's something I want to discuss: Our code uses internally the SamplingKernelType enumeration from Tensorflow to communicate the selected kernel type. Vitorelli asked me to add a new quintic kernel. That is simple enough, except that since SamplingKernelType is part of the main Tensorflow code, I can't really add a new element to it, not without creating a path to Tensorflow itself. Now, the use of SamplingKernelType is purely internal, the addons package uses strings to identify the kernel type on its public APIs. As such, I believe we should drop the internal use of SamplingKernelType. It wouldn't change any public interface and we would be decoupled from Tensorflow on our Kernel selection.

Thiago.

On Sat, Nov 6, 2021 at 2:06 PM Francois Lanusse @.***> wrote:

How are we doing with a PR for TF addons ^^' ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/CosmoStat/autometacal/issues/26#issuecomment-962480630, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACAE3TUOVHRYJ65C5HEOVNTUKVOCZANCNFSM5A2YWQMA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

EiffL commented 2 years ago

Thanks for the update Thiago, and I hope you are recovering well! Sure, a quintic kernel would be good, but much more importantly we want to merge our modified code into Tensorflow add-ons. For one it will mean that it will be part of the default library that users pip install, they won't have to compile it on their own, and it also reduces the maintenance cost by a lot.

I took a look at your copy of the code, but it's not a fork of tensorflow/addons ? I'm not sure if you can open a PR if you didn't fork the repo from GitHub.

andrevitorelli commented 2 years ago

@Dr-Zero @EiffL I have set up a fork at my own account here: addons, and checked the diff results locally:

diff -qr github/addons/tensorflow_addons/ github/addons_drzero/tensorflow_addons/

Files github/addons/tensorflow_addons/custom_ops/image/BUILD and github/addons_drzero/tensorflow_addons/custom_ops/image/BUILD differ
Only in github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels: register_kernel_factory.h
Files github/addons/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.cc and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.cc differ
Files github/addons/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops_gpu.cu.cc and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops_gpu.cu.cc differ
Files github/addons/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.h and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels/resampler_ops.h differ
Only in github/addons_drzero/tensorflow_addons/custom_ops/image/cc/kernels: sampling_functions.h
Files github/addons/tensorflow_addons/custom_ops/image/cc/ops/resampler_ops.cc and github/addons_drzero/tensorflow_addons/custom_ops/image/cc/ops/resampler_ops.cc differ
Files github/addons/tensorflow_addons/image/dense_image_warp.py and github/addons_drzero/tensorflow_addons/image/dense_image_warp.py differ
Files github/addons/tensorflow_addons/image/resampler_ops.py and github/addons_drzero/tensorflow_addons/image/resampler_ops.py differ
Files github/addons/tensorflow_addons/image/tests/dense_image_warp_test.py and github/addons_drzero/tensorflow_addons/image/tests/dense_image_warp_test.py differ
Files github/addons/tensorflow_addons/layers/__init__.py and github/addons_drzero/tensorflow_addons/layers/__init__.py differ
Only in github/addons/tensorflow_addons/layers: max_unpooling_2d_v2.py
Only in github/addons/tensorflow_addons/layers/tests: max_unpooling_2d_v2_test.py
Files github/addons/tensorflow_addons/optimizers/lamb.py and github/addons_drzero/tensorflow_addons/optimizers/lamb.py differ
Files github/addons/tensorflow_addons/optimizers/tests/lamb_test.py and github/addons_drzero/tensorflow_addons/optimizers/tests/lamb_test.py differ
Files github/addons/tensorflow_addons/optimizers/tests/weight_decay_optimizers_test.py and github/addons_drzero/tensorflow_addons/optimizers/tests/weight_decay_optimizers_test.py differ
Files github/addons/tensorflow_addons/optimizers/utils.py and github/addons_drzero/tensorflow_addons/optimizers/utils.py differ
Files github/addons/tensorflow_addons/optimizers/weight_decay_optimizers.py and github/addons_drzero/tensorflow_addons/optimizers/weight_decay_optimizers.py differ
Files github/addons/tensorflow_addons/seq2seq/tests/decoder_test.py and github/addons_drzero/tensorflow_addons/seq2seq/tests/decoder_test.py differ
Files github/addons/tensorflow_addons/version.py and github/addons_drzero/tensorflow_addons/version.py differ

Some of them are updates from upstream, and others are our work. I'm parsing through to see which is which to restructure the repo. I'll use git --author to fix the authorship of the changes.

andrevitorelli commented 2 years ago

Interestingly, Mike Jarvis found what @Dr-Zero commented in one of our meetings: .../GalSim/.../src/Interpolant.cpp#L392

Quote:

        // However, it turns out that the second derivative continuity equations (marked * above)
        // are not actually satisfied.  I (MJ) tried to derive a version that does satisfy all
        // the constraints and discovered that the system is over-constrained.  If I keep the
        // second derivative constraints and drop F''''(j) = 0, I get the following:
(...)

        // This is Gary's original version with F''''(j) = 0, but f''(x) is not continuous at
        // x = 1,2,3.

This is not, however, the source of our troubles, as we only use the first derivative.

EiffL commented 2 years ago

Ok this is great! Here is what I recommend you do:

Step I: You create a patch with only the modifications added by Thiago. You can do that by diffing the last commit from Thiago's branch vs the last commit not made by Thiago, which appears to be this one: https://github.com/Dr-Zero/addons/commit/f86b502746d303900061ba33b58135f5730988e6
Step II: On a clean fork of tf addons, up to date with upstream, you apply the patch you have created

This way you won't have to deal with differences that are due to new modifs in tf addons.

EiffL commented 2 years ago

....or maybe easier, you just update your local clone of Thiago's repo to make it up to date with upstream.

andrevitorelli commented 2 years ago

....or maybe easier, you just update your local clone of Thiago's repo to make it up to date with upstream.

Ok, I think this worked, and kept @Dr-Zero's signed edits with it. Can you check, @EiffL , here: addons/new_kernels ?

andrevitorelli commented 2 years ago

@Dr-Zero we need to come up with a PR for the addons repo ASAP for the code to be useful.

andrevitorelli commented 1 year ago

Just to inform that the feature request was posted as an issue in the tfa repo here: addons#2779. They ask for people to do a feature request before submitting new features.

CosmoStat / autometacal

Finite differences converges to different value than automatic differentiation. #26