breandan / kotlingrad

🧩 Shape-Safe Symbolic Differentiation with Algebraic Data Types
https://breandan.net/public/masters_thesis.pdf#page=49
Apache License 2.0
520 stars 21 forks source link

add SICMUtils to autodiff framework comparison #19

Closed sritchie closed 3 years ago

sritchie commented 3 years ago

Hey all,

I love the work you're doing! I studied the codebase extensively to understand reverse mode AD and how to implement it in a functional way... so thank you for that.

I wanted to add the https://github.com/sicmutils/sicmutils library as a comparison, as it's a JVM library doing similar stuff (multi-stage programming, differentiable programming, higher-order autodiff... autodiff on higher order functions, even!)

Hopefully this is welcome. Thanks again for your wonderful work!

Cheers, Sam

breandan commented 3 years ago

Gee Sam, thanks so much for your positive feedback! As you may have noticed, we drew a lot of inspiration from the LISP community, so it is really I who should be thanking you for carrying on that tradition! Our implementation is heavily inspired by the SICP implementation and Stalin∇, which is a much better reference in the textbook presentation. Although adhering more closely to the symbolic approach, we claim Kotlin∇ can be viewed as either forward or reverse AD depending on evaluation order. Others have written a more nuanced comparison on the topic.

And thanks for your contribution, I will be taking a closer look at SICMUtils to see what we can glean from it!

sritchie commented 3 years ago

@breandan those two papers look wonderful. I'm pre-sold on the idea that symbolic is not a totally different animal at all; an interesting idea (and probably this has been done) would be to make a "symbolic expression" type that accumulated a dictionary of symbol => repeated expression as computation proceeded. Then autodiff on symbols (ie, symbolic differentiation) wouldn't cause any expression bloat.

I'll be the only NEW idea that sicmutils might send your way is is how to differentiate functions that return other functions. I spent a bunch of time in January going back and forth with Sussman to understand this case and how it might go wrong — here's the PDF we produced in case you're interested.

autodiff.pdf

The trick to make it work is a little buried in this literate-programming-style writeup of the SICMUtils implementation of AD, but I'll drop that too: https://samritchie.io/dual-numbers-and-automatic-differentiation/

Here's the specific spot where we introduce the tricky implementation of extracting a tangent from a function return value: https://github.com/sicmutils/sicmutils/blob/master/src/sicmutils/calculus/derivative.cljc#L37

And here are more long form descriptions of test cases and how they break without the fix: https://github.com/sicmutils/sicmutils/blob/master/test/sicmutils/calculus/derivative_test.cljc#L965

A little light reading :)

Cheers, and thanks for the merge!

breandan commented 3 years ago

Thank you for the outstanding references! Just skimmed them so far, but what I've seen looks intriguing and now I'm even more motivated to read about SICMUtils. Your numeric tower looks like a more ambitious version of what we originally planned. A few people wrote to dissuade us from using this pattern for reasons yet unclear, although I remain convinced it is the right approach. I would enjoy reading a more detailed discussion on the design tradeoffs if you get a chance to write about it.

I'm glad you mentioned AD on higher-order functions and will definitely look into your implementation. It's something I've been trying to wrap my head around and I think there could be some interesting applications vis-à-vis program synthesis. The original motivation behind the numeric tower was that we could explore AD on other algebraic structures like trees and graphs. There are some algorithms that look an awful lot like symbolic differentiation for language parsing and I'm keen to explore that direction further.

As we discussed a bit in this thread, I am certain there are more clever things we could be doing to avoid the issue of expression swell that Baydin et al. and others have raised. Your intuition of using a dictionary to record previously visited expressions seems on the right track. I wonder if there is a hash algorithm that preserves algebraic invariants, e.g. for commutative operators you want a tree kernel that is permutation invariant so the expressions (+ x (+ y z)) and (+ y (+ x z)) both hash to the same dictionary key.

I saw you are working on expression simplification and will be following that thread closely. Would also be interested in reading a more detailed discussion about equational reasoning after you've had more time to think about it. Not to discourage you from rolling your own, although there are a number of mature computer algebra systems in case you don't feel like diving deep into term rewriting. In particular, I've heard good things about FriCAS.

A bit of a tangent, but I recently discovered Radul and Sussman's work on propagators and found it very inspiring, although they seem to have missed a lot of good theory on semirings that raises important connections between error and belief propagation on computation graphs. If you get a chance to ask them about it, I would be very curious to hear their thoughts on the connection between propagator networks and algebraic path problems.

Not sure if it's something you might be interested in, but we're running a calculus reading group at our lab this Spring. If you would like to discuss your work and get some feedback, we have few openings and would love to hear about SICMUtils sometime. Also happy to add you to the mailing list if you feel like dropping by - there are a few others who have worked on symbolic differentiation and have many good ideas to share.

sritchie commented 3 years ago

@breandan, I know that I missed the invitation for the spring, but I,m mostly back from summer adventures and slowly spinning up on digital life again. I'd love to chat about some of these threads, especially simplification now that I'm done with that big push. Chatting about higher order AD is always a pleasure too, as this whole thing is weird and wild and still, I think, not explored terribly well.

I'm at sritchie09@gmail.com, maybe we should move to email. But I'd love to chat at some point during the week in the upcoming weeks if you're available!

breandan commented 3 years ago

Hi Sam, it's great to hear from you -- thanks for the reminder! I am still curious to hear about work on sketching algorithms and anxious to learn more about SICMUtils. And sure, happy to continue this discussion by email.