Would be happy to. Your package looks great! I'm not trying to create duplicates; I just needed something for our lab, where we run into this scenario quite a bit. I think the main difference is that we use ForwardDiff a lot, so this package focuses more on that, whereas I think yours targets AD packages that are ChainRules compatible (which would be useful to us down the road). It looks like yours also has support for lazy operators, which is nice. I also added some functionality for defining custom rules, mainly to support one of our collaborators who needs to call some Python code for a sub-function and would be using finite differencing, which we'd inject back into the AD chain. That doesn't really have anything to do with implicit differentiation, but it reuses some of the same functionality. Definitely open to working together.
Adding ForwardDiff compatibility is definitely among our short-term goals, perhaps with the help of https://github.com/ThummeTo/ForwardDiffChainRules.jl. It would also be interesting to discuss the specific needs of your lab, because that might enlighten us about user expectations we might have missed :)
After looking at the current state of both packages I think the primary differences are:
- ImplicitAD supports using arbitrary linear solvers with user-defined Jacobians.
- ImplicitDifferentiation supports only iterative linear solvers, since it doesn't materialize the Jacobian.
Overall, it seems like ImplicitDifferentiation is designed to be efficient for very large implicit systems, while the default settings for ImplicitAD are more appropriate for smaller systems of equations. That being said, with the right arguments, ImplicitAD can handle large systems of equations efficiently as well. I believe it is even possible to adopt an approach theoretically identical to that of ImplicitDifferentiation if the right inputs are provided. ImplicitAD therefore appears to be the more generic of the two packages at the moment, though whether the interface provided by ImplicitDifferentiation or ImplicitAD is better is debatable.
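For context, the underlying linear algebra both packages implement is the standard implicit function theorem, sketched here for completeness (this is textbook background, not notation taken from either package). Given a residual $r(y, x) = 0$ that implicitly defines $y(x)$, differentiating through the residual gives

$$
\frac{\partial r}{\partial y}\,\frac{\mathrm{d}y}{\mathrm{d}x} = -\frac{\partial r}{\partial x}
\quad\Longrightarrow\quad
\frac{\mathrm{d}y}{\mathrm{d}x} = -\left(\frac{\partial r}{\partial y}\right)^{-1}\frac{\partial r}{\partial x}.
$$

Forward mode solves this linear system once per input tangent, and reverse mode solves the transposed system once per output cotangent; the design difference discussed above is whether $\partial r / \partial y$ is materialized and factorized, or only applied through matrix-vector products inside an iterative solver.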
@taylormcd I thought you could use non-iterative linear solvers with LinearOperators.jl. I haven't actually tried either package, so I'm not totally sure. Here is another package that looks pretty similar: https://julianonconvex.github.io/Nonconvex.jl/stable/gradients/implicit/
I'm not sure what all the differences are. The reality is that implicit differentiation is relatively straightforward, so it's not surprising to find it in a few places, and any one of these three (perhaps there are also others?) could be brought to feature parity pretty quickly. That said, it's not necessarily a bad thing to have multiple packages with different emphases/approaches, at least until things mature more. I wouldn't be surprised if future AD packages bake in equivalent functionality.
I agree that any one of the three could be brought up to feature parity; I just wanted to present a general overview of the current status of the two packages. With regard to the use of LinearOperators.jl, you have to materialize a matrix in order to factorize it and do a non-iterative linear solve, so a matrix-multiplication linear operator only works with iterative linear solvers.
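To make that distinction concrete, here is a minimal, hypothetical sketch (not code from either package) contrasting the two linear-solve strategies on a toy residual; the choice of ForwardDiff, LinearMaps.jl, and IterativeSolvers.jl is just one possible combination for building the lazy operator:

```julia
# Hypothetical sketch of the two linear-solve strategies discussed above.
using ForwardDiff, LinearMaps, IterativeSolvers, LinearAlgebra

# Residual r(y, x) = 0 implicitly defines y(x); here a toy component-wise system.
r(y, x) = y .^ 3 .+ y .- x

x = [1.0, 2.0, 3.0]
y = [0.6823, 1.0, 1.2134]    # pretend this came from a nonlinear solver

# Input tangent and the corresponding right-hand side -(∂r/∂x) * dx
dx = [1.0, 0.0, 0.0]
∂r∂x = ForwardDiff.jacobian(x -> r(y, x), x)
b = -∂r∂x * dx

# Strategy 1 (materialize and factorize): build ∂r/∂y explicitly, then direct solve
∂r∂y = ForwardDiff.jacobian(y -> r(y, x), y)
dy_direct = factorize(∂r∂y) \ b

# Strategy 2 (matrix-free): lazy Jacobian-vector products fed to an iterative solver
jvp(v) = ForwardDiff.derivative(t -> r(y .+ t .* v, x), 0.0)
A = LinearMap(jvp, length(y))
dy_iterative = gmres(A, b)

# Both give the output tangent (dy/dx) * dx, up to the iterative solver's tolerance.
```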
The implementation in Nonconvex.jl (which ImplicitDifferentiation.jl appears to be based on) seems to be pretty well put together. ForwardDiff support should be possible using the `ForwardDiff_frule` macro defined in the same package. ReverseDiff support should be possible using the `ReverseDiff.@grad_from_chainrules` macro. Considering these capabilities, I think the only features in this package not provided by Nonconvex.jl are the functionality provided by the `implicit_linear` and `provide_rule` functions.
Actually, it seems like a `frule` hasn't been defined in Nonconvex, so that would need to be implemented before ForwardDiff support can be added.
It also seems like `ReverseDiff.@grad_from_chainrules` doesn't work on the implementation in NonconvexUtils.jl either.
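For anyone who hasn't used that macro, here is a minimal, self-contained sketch of the general pattern on a toy function (this is not NonconvexUtils.jl code): a pullback defined via `ChainRulesCore.rrule` can be reused by ReverseDiff through `ReverseDiff.@grad_from_chainrules`.

```julia
# Toy sketch: reuse a ChainRules rrule from ReverseDiff via @grad_from_chainrules.
using ChainRulesCore, ReverseDiff

mysquare(x::AbstractVector) = x .^ 2

function ChainRulesCore.rrule(::typeof(mysquare), x::AbstractVector)
    y = mysquare(x)
    mysquare_pullback(ȳ) = (NoTangent(), 2 .* x .* unthunk(ȳ))
    return y, mysquare_pullback
end

# Route tracked arrays through the rrule above instead of tracing the primal code.
ReverseDiff.@grad_from_chainrules mysquare(x::ReverseDiff.TrackedArray)

ReverseDiff.gradient(x -> sum(mysquare(x)), [1.0, 2.0, 3.0])  # ≈ [2.0, 4.0, 6.0]
```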
> After looking at the current state of both packages I think the primary differences are:

Thank you for the careful review!

> ImplicitAD supports using arbitrary linear solvers with user-defined Jacobians.

One of our projects is to add an option whereby the forward solver actually returns the Jacobian in addition to the solution, in order to save one call to AD.

> ImplicitDifferentiation supports only iterative linear solvers, since it doesn't materialize the Jacobian.

That's completely correct, and it's one of our main design decisions (which also makes the implementation slightly nightmarish).

> Overall, it seems like ImplicitDifferentiation is designed to be efficient for very large implicit systems, while the default settings for ImplicitAD are more appropriate for smaller systems of equations.

Sounds like a good summary, I'll add a link to your package in our docs :)
This package also works with iterative solvers (someone else in our lab is using ImplicitAD this way). It's just not the default; you have to make use of the keyword arguments.
Getting back to working on this package...I'll add a summary/link to your package later today.
Hi! Main developer of Nonconvex.jl and contributor to ImplicitDifferentiation.jl here. I just found this package on JuliaHub and saw this discussion. Cool package!
To give a bit of history, Nonconvex.jl probably has the oldest implementation of generic implicit AD in Julia (https://discourse.julialang.org/t/ann-differentiable-implicit-functions-in-julia-optimisation-nonlinear-solves-and-fixed-point-iterations/76016). Specific implicit functions had AD rules defined in SciML and other repos before Nonconvex.jl but these were not doing generic implicit AD.
ImplicitDifferentiation (ID) is @gdalle's work, which was initially loosely based on the Nonconvex.jl implementation with the goal of being better designed, tested, and documented. We collaborate on this project, although he deserves most of the credit. I think ID 0.5 now has wide feature coverage, including many of the features highlighted above that were missing a few months ago. It might be worth re-examining whether we can join forces and figure out better and faster package designs that work for everyone.
Thanks for reaching out, and great to hear of the continued progress! I agree with your assessment on the discourse thread, at least we've found that approach quite useful.
To update from our end, we've mostly been working on approaches to alleviate memory issues for long time sequences (e.g., long loops, ODEs). We've added some functionality that really sped up some of the problems we've been working on.
Would be happy to collaborate in areas where we can. We have a couple grants tied to ongoing/future work related to this package.
Hey there, and congrats on the package! Could we take some time to reflect on the differences between your work and https://github.com/gdalle/ImplicitDifferentiation.jl, which I recently developed? I feel like they have similar goals, and maybe we could work together to avoid duplicates?