byuflowlab / ImplicitAD.jl

Automates adjoints. Forward and reverse mode algorithmic differentiation around implicit functions (not propagating AD through), as well as custom rules to allow for mixed-mode AD or calling external (non-AD compatible) functions within an AD chain.
MIT License

Differences with ImplicitDifferentiation.jl? #4

Closed: gdalle closed this issue 1 year ago

gdalle commented 2 years ago

Hey there, and congrats on the package! Could we take some time to reflect on the differences between your work and https://github.com/gdalle/ImplicitDifferentiation.jl, which I recently developed? I feel like they have similar goals, and maybe we could work together to avoid duplicating effort?

andrewning commented 2 years ago

Would be happy to. Your package looks great! Not trying to create duplicates; we just needed something for our lab, where we run into this scenario quite a bit. I think the main difference is that we use ForwardDiff a lot, so this package focuses more on that, whereas I think yours targets ChainRules-compatible AD packages (which would be useful to us down the road). Looks like yours also has support for lazy operators, which is nice. I also added some functionality for custom rules, mainly to support one of our collaborators who needs to call some Python code for a subfunction and would be using finite differencing, which we'd inject back into the AD chain. That doesn't really have anything to do with implicit differentiation, but it reuses some of the same functionality. Definitely open to working together.
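
For readers landing here, the following is a rough, self-contained sketch of that pattern (calling a black-box function inside a ForwardDiff chain by finite-differencing it), not ImplicitAD's actual provide_rule machinery; the names `extfunc`, `fd_jacobian`, and `extfunc_ad` are made up for illustration.

```julia
using ForwardDiff
using ForwardDiff: Dual, Partials, value, partials

# stand-in for an external, non-AD-compatible call (e.g. wrapped Python code)
extfunc(x) = [x[1]^2 + x[2], 3.0 * x[2]]

# plain forward finite-difference Jacobian of f at xv
function fd_jacobian(f, xv; h=1e-6)
    y0 = f(xv)
    J = zeros(length(y0), length(xv))
    for j in eachindex(xv)
        xp = copy(xv)
        xp[j] += h
        J[:, j] = (f(xp) .- y0) ./ h
    end
    return y0, J
end

# dual-number overload: strip the duals, finite-difference the black box,
# then rebuild the output duals as J * (input partials) so ForwardDiff keeps chaining
function extfunc_ad(x::AbstractVector{Dual{T,V,N}}) where {T,V,N}
    xv = value.(x)
    y0, J = fd_jacobian(extfunc, xv)
    xdot = [partials(x[i])[k] for i in eachindex(x), k in 1:N]  # length(x) × N
    ydot = J * xdot                                             # length(y0) × N
    return [Dual{T}(y0[i], Partials(ntuple(k -> ydot[i, k], N))) for i in eachindex(y0)]
end
extfunc_ad(x::AbstractVector{<:Real}) = extfunc(x)  # non-dual path: call it directly

# ForwardDiff now gets correct derivatives even though extfunc itself is a black box
g(x) = sum(extfunc_ad(2 .* x))
ForwardDiff.gradient(g, [1.0, 2.0])  # [8.0, 8.0]
```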

gdalle commented 2 years ago

Adding ForwardDiff compatibility is definitely among our short-term goals, perhaps with the help of https://github.com/ThummeTo/ForwardDiffChainRules.jl. It would also be interesting to discuss the special needs of your lab, because that might enlighten us about some user expectations that we might have missed :)

taylormcd commented 2 years ago

After looking at the current state of both packages, I think the primary differences are:

- ImplicitAD supports using arbitrary linear solvers with user-defined jacobians.
- ImplicitDifferentiation supports only iterative linear solvers, since it doesn't materialize the jacobian.

Overall, it seems like ImplicitDifferentiation is designed to be efficient for very large implicit systems, while the default settings for ImplicitAD are more appropriate for smaller systems of equations. That being said, with the right arguments, ImplicitAD can be extended to handle large systems of equations efficiently as well. I believe it is even possible to adopt a theoretically identical approach to ImplicitDifferentiation's if the right inputs are provided. ImplicitAD therefore appears to be the more generic of the two packages at the moment, though whether the interface provided by ImplicitDifferentiation or ImplicitAD is better is debatable.
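
For concreteness, here is a minimal, hand-rolled version of the implicit-function-theorem calculation both packages are built around, written in the dense, direct-solve form that corresponds to the small-system regime discussed above. This is an illustration with made-up functions (`residual`, `solve_state`), not code from either package.

```julia
using ForwardDiff, LinearAlgebra

# residual r(y, x) = 0 implicitly defines y(x); here each component solves y^3 + y - x = 0
residual(y, x) = y.^3 .+ y .- x

# some existing solver for the state (a simple Newton iteration as a stand-in)
function solve_state(x)
    y = copy(x)
    for _ in 1:50
        y -= (y.^3 .+ y .- x) ./ (3 .* y.^2 .+ 1)
    end
    return y
end

x = [1.0, 2.0, 3.0]
y = solve_state(x)

# implicit-function theorem: dy/dx = -(∂r/∂y)^{-1} (∂r/∂x).
# Forming both partials as dense matrices and doing a direct solve is cheap for
# small systems, which is the regime the default settings target.
A = ForwardDiff.jacobian(yy -> residual(yy, x), y)  # ∂r/∂y
B = ForwardDiff.jacobian(xx -> residual(y, xx), x)  # ∂r/∂x
dydx = -(A \ B)
```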

andrewning commented 2 years ago

@taylormcd I thought you could use non-iterative linear solvers with LinearOperator.jl. I haven't actually tried either package so not totally sure. Here is another package: https://julianonconvex.github.io/Nonconvex.jl/stable/gradients/implicit/ Looks pretty similar. Not sure what all the differences are. The reality is implicit differentiation is relatively straightforward, so not surprising to find it in a few places, and any one of these three (perhaps there are also others?) could be brought to feature-parity pretty quickly. Though it's not necessarily a bad thing to have multiple packages with different emphases/approaches, at least until things mature more. I wouldn't be surprised if future AD packages bake in equivalent functionality.

taylormcd commented 2 years ago

I agree that any one of the three could be brought up to feature parity, I just wanted to present a general overview of the current status of the two packages. With regard to the use of LinearOperator.jl, you have to materialize a matrix in order to factorize it and do a non-iterative linear solve. A matrix multiplication linear operator therefore only works for iterative linear solvers.
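
To spell out that distinction: a factorization (`factorize(A)`, `A \ b`) needs the entries of `A`, while an iterative method only ever needs the action `v -> A*v`, which is all a multiplication-only operator provides. A minimal hand-rolled conjugate-gradient sketch makes this explicit; `cg_matfree` and `Amul` are made-up names, and this is not either package's solver.

```julia
using LinearAlgebra

# minimal conjugate gradient for SPD systems, given only the action v -> A*v;
# no matrix is ever materialized, so a multiplication-only operator suffices
function cg_matfree(Amul, b; tol=1e-10, maxiter=length(b))
    x = zero(b)
    r = b - Amul(x)
    p = copy(r)
    rs = dot(r, r)
    for _ in 1:maxiter
        Ap = Amul(p)
        α = rs / dot(p, Ap)
        x += α * p
        r -= α * Ap
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        p = r + (rs_new / rs) * p
        rs = rs_new
    end
    return x
end

# the "operator" is just a closure: the action of an SPD tridiagonal matrix
n = 100
Amul(v) = 4 .* v .+ [0.0; v[1:end-1]] .+ [v[2:end]; 0.0]
b = ones(n)
x = cg_matfree(Amul, b)  # a direct solve would instead require building the matrix
```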

The implementation in Nonconvex.jl (which ImplicitDifferentiation.jl appears to be based on) seems to be pretty well put together. ForwardDiff support should be possible using the ForwardDiff_frule macro defined in the same package. ReverseDiff support should be possible using the ReverseDiff.@grad_from_chainrules macro. Considering these capabilities, I think the only features in this package not provided by Nonconvex.jl are those provided by the implicit_linear and provide_rule functions.
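
As a point of reference for what an implicit_linear-style helper does conceptually: the reverse rule for a linear solve can be written by hand so AD never differentiates through the factorization. Below is the textbook adjoint rule for y = A \ b expressed as a ChainRulesCore rrule; `linsolve` is a made-up name and this is not ImplicitAD's implementation. Any ChainRules-aware reverse tool (e.g. via the @grad_from_chainrules route mentioned above) would pick up such a rule.

```julia
using LinearAlgebra
using ChainRulesCore

linsolve(A, b) = A \ b

# standard adjoint of a linear solve: with cotangent ȳ, set z = A' \ ȳ,
# then b̄ = z and Ā = -z * y'.  Only one extra (transposed) solve is needed,
# and the factorization itself is never differentiated through.
function ChainRulesCore.rrule(::typeof(linsolve), A::AbstractMatrix, b::AbstractVector)
    y = A \ b
    function linsolve_pullback(ȳ)
        z = A' \ unthunk(ȳ)
        return NoTangent(), -z * y', z
    end
    return y, linsolve_pullback
end
```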

taylormcd commented 2 years ago

Actually, it seems like a frule hasn't been defined in Nonconvex, so that would need to be implemented before ForwardDiff support could be added.

taylormcd commented 2 years ago

It also seems like ReverseDiff.@grad_from_chainrules doesn't work on the implementation in NonconvexUtils.jl either.

gdalle commented 2 years ago

> After looking at the current state of both packages I think the primary differences are:

Thank you for the careful review!

> ImplicitAD supports using arbitrary linear solvers with user-defined jacobians

One of our projects is to add an option whereby the forward solver actually returns the Jacobian in addition to the solution, in order to save one call to AD.

> ImplicitDifferentiation supports only iterative linear solvers, since it doesn't materialize the jacobian.

That's completely correct, and it's one of our main design decisions (which also makes the implementation slightly nightmarish).

> Overall, it seems like ImplicitDifferentiation is designed to be efficient for very large implicit systems, while the default settings for ImplicitAD are more appropriate for smaller systems of equations.

Sounds like a good summary, I'll add a link to your package in our docs :)

andrewning commented 2 years ago

This package also works with iterative solvers (someone else in our lab is using ImplicitAD this way). It's just not the default; you have to make use of the keyword arguments.

andrewning commented 1 year ago

Getting back to working on this package...I'll add a summary/link to your package later today.

mohamed82008 commented 1 year ago

Hi! Main developer of Nonconvex.jl and contributor to ImplicitDifferentiation.jl here. I just found this package on JuliaHub and saw this discussion. Cool package!

To give a bit of history, Nonconvex.jl probably has the oldest implementation of generic implicit AD in Julia (https://discourse.julialang.org/t/ann-differentiable-implicit-functions-in-julia-optimisation-nonlinear-solves-and-fixed-point-iterations/76016). Specific implicit functions had AD rules defined in SciML and other repos before Nonconvex.jl but these were not doing generic implicit AD.

ImplicitDifferentiation (ID) is @gdalle's work, which was initially loosely based on the Nonconvex.jl implementation with the goal of being better designed, tested, and documented. We collaborate on this project, although he deserves most of the credit. I think ID 0.5 now has wide feature coverage, including many of the features highlighted above which were missing a few months ago. It might be worth re-examining whether we can join forces and figure out better and faster package designs that work for everyone.

andrewning commented 1 year ago

Thanks for reaching out, and great to hear of the continued progress! I agree with your assessment in the Discourse thread; at the least, we've found that approach quite useful.

To update from our end, we've mostly been working on approaches to alleviate memory issues for long time sequences (e.g., long loops, ODEs). We've added some functionality that really sped up some of the problems we've been working on.

Would be happy to collaborate in areas where we can. We have a couple of grants tied to ongoing/future work related to this package.