JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License
266 stars 32 forks source link

Remove `_get_ν` #452

Closed devmotion closed 2 years ago

devmotion commented 2 years ago

Fix the bug introduced in #425 (see https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/issues/450).

Removes the workaround.

theogf commented 2 years ago

Is it possible to test this?

codecov[bot] commented 2 years ago

Codecov Report

Merging #452 (4c45fe8) into master (35de8d2) will decrease coverage by 0.01%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #452      +/-   ##
==========================================
- Coverage   93.18%   93.16%   -0.02%     
==========================================
  Files          52       52              
  Lines        1261     1259       -2     
==========================================
- Hits         1175     1173       -2     
  Misses         86       86              
Impacted Files Coverage Δ
src/TestUtils.jl 94.11% <100.00%> (ø)
src/basekernels/matern.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 35de8d2...4c45fe8. Read the comment docs.

devmotion commented 2 years ago

The rule was wrong, so it seems it's not covered by the tests? Why was this workaround introduced?

devmotion commented 2 years ago

It seems in #425 the commented out test_ADs(x->MaternKernel(nu=first(x)),[ν]) was replaced with test_ADs(() -> MaternKernel(; nu=ν)). Was this intended? And could this explain why the workaround is not tested?

st-- commented 2 years ago

It seems in #425 the commented out test_ADs(x->MaternKernel(nu=first(x)),[ν]) was replaced with test_ADs(() -> MaternKernel(; nu=ν)). Was this intended? And could this explain why the workaround is not tested?

Yes, that was intended, as discussed on that PR (https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/pull/425#issuecomment-1012036096), where @theogf wrote

I am proposing that we just ban differentiation through \nu and stop testing for it.

And while the change

-test_ADs(x->MaternKernel(nu=first(x)),[ν])
+test_ADs(() -> MaternKernel(; nu=ν))

worked for Forward- and ReverseDiff, the additional workaround in #425 was required for Zygote to also pass the AD test (w.r.t. the input features, not the kernel hyperparameters!). So the workaround is tested by virtue of the Zygote AD test on MaternKernel now also passing.

devmotion commented 2 years ago

So the workaround is tested by virtue of the Zygote AD test on MaternKernel now also passing.

The main problem is that tests passed successfully for https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/pull/452/commits/08437ccc44fa134b86eb447f59d0bdfe651d6acd even though the pullback is clearly incorrect. So I was wondering if the pullback was evaluated at all.

st-- commented 2 years ago

The main problem is that tests passed successfully for 08437cc even though the pullback is clearly incorrect.

What was incorrect about the pullback?

So I was wondering if the pullback was evaluated at all.

It was, I only added it because without Zygote errored, see https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/pull/452#discussion_r849550385

devmotion commented 2 years ago

What was incorrect about the pullback?

It returned the wrong number of outputs (was missing NoTangent() derivative for the function itself).

Without this workaround, the Zygote AD test fails. You can see this in 30385ac and the corresponding build

The problem is just that https://github.com/FluxML/Zygote.jl/blob/95e10218f0187a839ac0539fefcbf39c727bf068/src/lib/array.jl#L30 does not support NotImplemented. This should have been fixed in Zygote, introducing workarounds in KernelFunctions is the wrong approach.

st-- commented 2 years ago

This should have been fixed in Zygote, introducing workarounds in KernelFunctions is the wrong approach.

Yeah, I completely agree, but unfortunately it hasn't been fixed in Zygote, and I think "workaround + AD is working" is still preferable over "no workaround + AD is broken".

devmotion commented 2 years ago

Instead of making a PR to KernelFunctions introducing a workaround you could have made one for Zygote that fixes the issue :thinking:

st-- commented 2 years ago

Instead of making a PR to KernelFunctions introducing a workaround you could have made one for Zygote that fixes the issue :thinking:

I managed to figure out how to fix the issue with a workaround in KernelFunctions. That doesn't mean I would immediately know how to identify & fix the underlying issue upstream. If I did know how to fix it there, I would have, as I have in various other cases. You seem to have an amazing overview as well as deep understanding of a lot of different packages. The ability and time for that is unfortunately not something that everyone has available in the same amount. I'm assuming you're not intending to, but you give the impression of being rather disparaging of other people who're not thinking the same way you are.

devmotion commented 2 years ago

IMO usually it is very helpful to open at least Github issues in cases where it is not immediately clear how to fix the issue. Workarounds can be acceptable if they are fixed upstream eventually, but I assume it is unlikely that they are fixed if nobody makes the Zygote developers aware of it.

Generally, I would have given you this advice in your PR but as desired I don't review your PRs anymore.

st-- commented 2 years ago

@devmotion I've pared it down to a minimal reproducible example: https://github.com/FluxML/Zygote.jl/issues/1204 but I can't comment on anything else, if you've got further thoughts on what the issue is / what's needed to fix it, please comment there.

devmotion commented 2 years ago

Test errors seem unrelated? I didn't change anything related to them, I think.

willtebbutt commented 2 years ago

Test errors seem unrelated? I didn't change anything related to them, I think.

Agreed. Maybe we need to relax a tolerance somewhere?

st-- commented 2 years ago

Test errors seem unrelated? I didn't change anything related to them, I think.

Agreed. Maybe we need to relax a tolerance somewhere?

Yes, I think it's due to the changes in #447; the tests are stochastic (if you just re-run the failed tests, they might pass the next time - I've seen that in #454 as well which has zero code changes). In #447 I reduced the size of the kernel matrix evaluations; the isapprox() call (set with atol=1e-9, rtol=1e-9) is on the arrays as a whole, which does not look at individual elements, but instead checks whether the norm of the difference is within the tolerances, relative to the max of the norm of the individual matrices (see discussion). For the random inputs we're using in the tests, the norm of the kernel matrix is linear in the dimensions. So by reducing the size from ~1000 to ~10 we're reducing the norm by a factor ~100, so yes that suggests to me we should increase the rtol correspondingly.

Alternatively, we could change it to element-wise isapprox comparison? then it's independent of the matrix norm.

devmotion commented 2 years ago

Shouldn't the scale be irrelevant if we use rtol? I think it's problematic that we don't use rtol in most of the tests but only in https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/8e805ef25d745ab391e4ad424e690507e02cd042/src/test_utils.jl#L75-L76. Probably because otherwise tests failed - which IMO, however, would not indicate a problem with rtol per se but that the default already was already too low even before the recent changes. I completely agree with https://discourse.julialang.org/t/unit-test-equality-for-arrays/8138/12?u=devmotion that in general you want rtol since it is completely unclear what a meaningful atol is in general. Specifying a small atol as well mainly helps with comparisons that involve exact 0s which would fail otherwise.

st-- commented 2 years ago

I think it's problematic that we don't use rtol in most of the tests

Yes, I agree. Probably an oversight in the original test code of not realizing that by only setting atol you end up setting rtol=0...

devmotion commented 2 years ago

Apparently https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/pull/455 was not sufficient.

devmotion commented 2 years ago

Bump.

devmotion commented 2 years ago

Bump.

theogf commented 2 years ago

Could we keep the global atol and rtol variables?

devmotion commented 2 years ago

I don't mind. It just seemed unnecessary to have global variables that are used only in a single place, and clearer (ie easier to spot) if the values are just used as default keyword arguments in the function definition.

devmotion commented 2 years ago

@theogf Did you see my last comment? Would you like it to be reverted?

theogf commented 2 years ago

I think keeping the global const is a bit more elegant and easily findable since it's in the same file. But again I do not mind, I'll just accept the PR and do what you want 😅