Closed: devmotion closed this issue 3 years ago
what's the preferred way in ChainRules to deal with dependencies that are only needed for computing the derivatives?
Good question! I don't think we have a good answer to this right now.
I think it'd be good to avoid this question arising for a bit longer, by not having the rules for SpecialFunctions.jl defined in ChainRules.jl, and instead have them defined in SpecialFunctions.jl (using ChainRulesCore.jl). Then it would be a question for SpecialFunctions.jl.
But that doesn't answer your question given the status quo. Maybe @oxinabox has something in mind?
Currently, we load rules for SpecialFunctions via Requires.jl's @require, but I don't know if Requires has any magic that can also help us have extra dependencies conditional on SpecialFunctions? (I don't think so)
one should use the @thunk macro to avoid these rather lengthy calculations if one is not interested in the derivatives with respect to the order?
Yep, that's right. There are more details in the Thunk docs and the manual on writing rules, and a few examples of using thunks in other rules, e.g. in some LinearAlgebra rules.
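To make the thunk suggestion concrete, here is a minimal sketch of how a rule might defer the derivative with respect to the order. It assumes a recent ChainRulesCore (1.x) API (`rrule`, `@thunk`, `unthunk`, `NoTangent`). The function `toy` and the counter `ORDER_DERIVATIVE_CALLS` are hypothetical names made up for illustration; `toy` is a simple stand-in, not a Bessel function, and this is not the rule used in ChainRules or SpecialFunctions.

```julia
using ChainRulesCore

# Hypothetical stand-in for a special function with order ν and argument x.
toy(ν, x) = x^ν

# Hypothetical counter: records how often the order derivative is actually computed.
const ORDER_DERIVATIVE_CALLS = Ref(0)

function ChainRulesCore.rrule(::typeof(toy), ν::Real, x::Real)
    y = toy(ν, x)
    function toy_pullback(Δy)
        # Deferred: the body only runs if a caller unthunks the tangent for ν.
        dν = @thunk begin
            ORDER_DERIVATIVE_CALLS[] += 1
            Δy * y * log(x)
        end
        # Cheap derivative with respect to x, computed eagerly.
        dx = Δy * ν * x^(ν - 1)
        return NoTangent(), dν, dx
    end
    return y, toy_pullback
end

y, pullback = rrule(toy, 2.0, 3.0)
_, dν, dx = pullback(1.0)

calls_before = ORDER_DERIVATIVE_CALLS[]  # thunk has not been evaluated yet
dν_value = unthunk(dν)                   # forces the order derivative
calls_after = ORDER_DERIVATIVE_CALLS[]
```

A caller that only needs `dx` never pays for the order derivative. In a real rule, the body of the thunk would hold the expensive hypergeometric expression.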
Currently, we load rules for SpecialFunctions via Requires.jl's @require, but I don't know if Requires has any magic that can also help us have extra dependencies conditional on SpecialFunctions? (I don't think so)
Requires supports multiple requirements. However, I guess it might be problematic to override an existing default rule that just returns NaN (which would be the default if HypergeometricFunctions hasn't been loaded) with an extended version as soon as HypergeometricFunctions is loaded - can ChainRules (and in particular AD backends such as Zygote) handle updates of rules?
However, ideally I guess SpecialFunctions would have to depend on HypergeometricFunctions. It might be quite unintuitive and surprising for a user if the custom adjoints change after loading some package - and it might be quite tricky for a user to figure out that she has to load HypergeometricFunctions if she wants to compute the derivatives with respect to the order (although that could probably be resolved, to some extent at least, by good documentation).
hmm. I think the way forward here might be to just move the SpecialFunctions rules to SpecialFunctions.jl (then that package considers the merits of how best to implement derivatives e.g. whether or not to depend on HypergeometricFunctions). Now that ChainRules has users (via Zygote and ForwardDiff2), it might be a good time to raise that conversation over on the SpecialFunctions repo?
I made a PR to the SpecialFunctions repo: https://github.com/JuliaMath/SpecialFunctions.jl/pull/238
Should be transferred to SpecialFunctions (#319).
Sadly, can't do a transfer across orgs.
In https://www.tandfonline.com/doi/pdf/10.1080/10652469.2016.1164156, closed-form expressions were derived for the derivatives of Bessel functions with respect to the order. Currently these are not implemented (they are defined as NaN). These derivatives are useful, e.g., when working with the Matern kernel (see https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/issues/116#issuecomment-644606541).
It feels a bit problematic that these expressions involve hypergeometric functions and hence probably would introduce a dependency on HypergeometricFunctions.jl. Before putting together any PRs, I would like to know what's the preferred way in ChainRules to deal with such dependencies that are only needed for computing the derivatives? I assume as well that maybe one should use the @thunk macro to avoid these rather lengthy calculations if one is not interested in the derivatives with respect to the order?