enkimute / ganja.js

:triangular_ruler: Javascript Geometric Algebra Generator for Javascript, c++, c#, rust, python. (with operator overloading and algebraic literals)

chain rule example? #5

Closed danbri closed 6 years ago

danbri commented 6 years ago

Do you have any examples of using the chain rule, for (automatic) differentiation?

danbri commented 6 years ago

Context: I saw your note in https://forums.tigsource.com/index.php?topic=62379.0 about possible other examples, btw. I was wondering whether this library could be interesting for revisiting some of the old papers about GA / Clifford Algebra and Neural Networks, now that they're back in fashion (e.g. see playground.tensorflow.org).

Examples:

(edited to add last link, 17-05-2018)

enkimute commented 6 years ago

Hi Dan,

I still have a big todo list of examples that I hope to be adding in a reasonable timeframe. A simple setup demonstrating the idea of back propagation for multivectors in neural networks is on that list.

In general, the chain rule is implied by the use of a degenerate metric (which is still frowned upon and ignored by a large part of the GA community - and few libraries support it). For simple functions it's quite easy to see how that works by looking at their Taylor expansion and realizing that everything past the first derivative vanishes.
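Spelled out, for a function f and the dual unit ε (with ε² = 0), the Taylor expansion collapses after the linear term:

f(x + ε) = f(x) + f'(x)ε + f''(x)/2! ε² + ... = f(x) + f'(x)ε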

An example paste-able in the coffeeshop :

Algebra(0,0,1,()=>{

  // Assume two functions
  // f=x^2 and g=(x+1)*x
  var f = (x)=>x*x;
  var g = (x)=>(x+1)*x;

  // Calculate the derivative of f(g(x)) 
  // Using the classic chain rule that gives :

  // (f(g(x)))' = g'(x)  * f'(g(x)) 
  //            = (2x+1) * (2*(x+1)*x)
  //            = (2x+1) * (2x^2 + 2x)
  //            = 4x^3 + 6x^2 + 2x
  // for x=2 that's 4*8+6*4+4 = 60

  // But AD avoids all that work. The chain rule is automatic,
  // and we can simply evaluate the function to find the same
  // value :

  var result = f(g(2+1e0));

  document.write('<BR> f(g(2)) = '+ result);  // prints 36 + 60e0

  // In the case of Dual numbers (the easiest case) it is easy
  // to see the inner workings of this idea by considering the
  // Taylor expansion of a primitive function and realizing
  // that the degenerate metric causes all terms of order ε² and
  // higher to vanish, thus leaving just the value and the first derivative.
});
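For reference, the same evaluation carried out by hand (using ε² = 0):

g(2 + ε) = (3 + ε)*(2 + ε) = 6 + 5ε
f(6 + 5ε) = (6 + 5ε)² = 36 + 60ε + 25ε² = 36 + 60ε

so the dual part is exactly (f∘g)'(2) = 60.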

Thanks for the links, I'm sure they'll grow my todo list even more :)

enkimute commented 6 years ago

Ok, let me know if that closes this issue for you, and if there are any other examples you'd like to see covered - please do not hesitate.

danbri commented 6 years ago

Thanks for the quick and comprehensive response. I think you answered my question, and I still have much to study on the maths side. It would be a shame if your notes got lost to other potential readers, even if this is a niche interest - could you link to this issue from somewhere, since closed issues can be hard to find?

enkimute commented 6 years ago

Point taken - I'll extend the Dual Number AD example with some explicit notes on this - I've tried to be minimal in all these examples and I'm sure I've overdone that on more than one. (I'll close this issue once I've done that).

Also, if you're interested in AD, my advice would be to start your journey by looking up some examples on the Dual Number case; it's one dimensional and easy to understand - there's some good info on the web. (and it's really cool)

Once you see the crucial role of the degenerate metric (i.e. non-real-numbers-that-square-to-zero) for this application, look at the work of Charles Gunn to move up to the vector case and partial derivatives. (Even though Gunn's work does not mention AD, it is a rigorous treatment of the degenerate metric in the context of Clifford Algebras). That should make it just as easy to calculate partial derivatives of multivariate functions (as in the ganja.js PGA3D differentiation example).

For higher order derivatives, I'm not aware of any good info, but once you are at that point the example I have with the custom Cayley table should be self-explanatory.

Thanks for opening this issue - don't hesitate to do that again if you have any further questions. Consider it a group effort to help get Clifford/Geometric Algebra the attention it deserves.

Also, fair warning: if you dive into GA, it's so elegant, unifying and breathtakingly beautiful ... take care to keep shaving and avoid calling it 'my precious' .. ;)

danbri commented 6 years ago

:) I can see the appeal of GA even while my underlying maths is too wobbly to truly appreciate the beauty. And there's the attraction that comes with supporting the underdog too, which I also felt for neural networks back when I first encountered them.

Is http://page.math.tu-berlin.de/~gunn/Documents/Papers/comparison.pdf the best Gunn paper?

I'll take a look at the Dual Number case... (edit: maybe this paper ?)

enkimute commented 6 years ago

That paper focuses on the Euclidean case (which is of the most practical importance and builds on the degenerate metric). If you prefer a more complete albeit more mathematical perspective, his thesis includes all three cases (euclidean, as well as positive and negative curvature).

GA is a bit of the underdog in the math community, and the degenerate metric is a bit of the underdog in GA .. right up your alley ! :)

danbri commented 6 years ago

Sharing another related bookmark here, https://pdfs.semanticscholar.org/5c1f/8c8755394af5e5d94d557392258d04a6bce5.pdf

"A simple setup demonstrating the idea of back propagation for multivectors in neural networks is on that list." ... where does that figure in your 2018 plans? :)

enkimute commented 6 years ago

Hi Dan,

I'm currently working on finishing some examples in physics (uploaded the first simple one, 2-body problem, calc/viz of moon orbital length yesterday). Have a bunch more of those, dealing with simulation and estimation problems in a GA context. (With some extra core ganja features like upgrading arrays to full algebra elements - which allows you to use the same solver code (e.g. RK4) as you would for scalars).

The estimation examples will include finding isometries (pointset A to B, in over-determined case), fitting circles/spheres in the conformal space, reducing motion paths and some motion capture fun.

Next on my list is a demonstration of how GA spaces interact with other techniques like using incidence matrices in R(1,0,0) for the obligatory 'dating-site' example, and then using dual numbers R(0,0,1) in the same context (which is the 'in' to demonstrating back-propagation).

I'm doing my best to keep these examples short and to the point - and extending the core features as I go. (and specifically, I will need some more support for matrices with GA elements before I can effectively demonstrate some of these features.)

But you're right - no slacking - I'll do my best to speed things up !

Cheers

danbri commented 6 years ago

Trying to get my head around the Dual numbers connection, I found https://blog.demofox.org/2014/12/30/dual-numbers-automatic-differentiation/ and https://web.archive.org/web/20161018163407/http://www.duaeliststudios.com/automatic-differentiation-with-dual-numbers/ ... hopefully that's a reasonable trail to follow.

enkimute commented 6 years ago

Hi Dan,

Those seem fine - to really get the GA perspective, keep in the back of your head that ε (with ε² = 0) is just one of the three 'new number types' that are introduced by GA (the others being i and j, with i² = -1 and j² = 1).

Algebraically, they help you solve quadratic equations (i), do automatic differentiation (ε), and provide an idempotent (j).

Geometrically, they turn multiplication into rotation (i), translation (ε) and reflection (j).

Once you have a clear view on those, it's really just mix-and-match to create a number system that captures the operations and types your application needs. (you can only 'pick' the metric for the basis generators (e1, e2, e3, ..); those of the derived elements that complete your basis (e12, ..) are implied.)

Combine that knowledge with the fact that the even subalgebras are closed under the geometric product (i.e. multiplying x with a scalar/bivector combination does not change the grade of x), and you find that a smartly selected metric gives you exactly those bivectors you need to cover the special Euclidean group (SE(2) or SE(3)). (in fact every combination of metrics you can think of gives you interesting algebraic spaces that are all going to be particularly well suited for some problems, as I hope the examples in the coffeeshop demonstrate).
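That closure is easy to check in the coffeeshop; a minimal sketch using plain R(3,0,0) (no degenerate metric needed for this particular point):

Algebra(3,0,0,()=>{

  // two even-grade elements (scalar + bivector), i.e. rotors
  var R1 = Math.cos(0.3) + Math.sin(0.3)*1e12;  // rotation in the e12 plane
  var R2 = Math.cos(0.5) + Math.sin(0.5)*1e23;  // rotation in the e23 plane

  // their geometric product is again scalar + bivector - no vector
  // or trivector parts appear, so the even subalgebra is closed.
  document.write('R1*R2 = ' + R1*R2);
});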

Depending on your own mathematical background, it can be interesting to also investigate the matrix isomorphism that exists for all Clifford algebras. (specifically, the symmetries that show up are telling, e.g. j elements create symmetric entries, while i elements create skew-symmetric entries, etc ..)

To see the matrix form you can use ganja.js :

 Algebra(1,0,0).describe();  // prints to the console ..  (j = double numbers)
 Algebra(0,1,0).describe();  // prints to the console ..  (i = complex numbers)
 Algebra(0,0,1).describe();  // prints to the console ..  (epsilon = dual numbers)

It's all really simple once it clicks :)

danbri commented 6 years ago

One person's simple is another's year-long study project ;) I'll keep plugging away. Thanks again for the detailed notes.

Meanwhile on the NN side, https://arxiv.org/abs/1705.09792 discusses the complexities around just introducing complex numbers. GA I imagine opens up a massive design space of possible techniques, within which it might be easy to get lost?

danbri commented 6 years ago

Ah, hmm something basic did maybe click:

> ε (with ε²=0) is just one of the three 'new number types' that are introduced by GA. (the others being i, and j (with i²=-1 and j²=1).

... this is the positive (j², double) / negative (i², complex) / neutral (ε², duals) characterization of the Algebra signature? (also called p, q, r in the API; is that a widely used convention?)

enkimute commented 6 years ago

Absolutely .. spot on .. (p = number of elements like j, q is number of elements like i, and r is the number of elements like ε) everything else pretty much derives from that. (even the vector space starting point is not really required). For example .. the multiplication rules are trivially derived :

(a + bε)*(c + dε) = ac + adε + cbε + bdε² = ac + (ad+cb)ε

and similar for i and j .. the 'bd' term becomes scalar and the multiplication is closed.

(a + bi)*(c + di) = (ac - bd) + (ad + cb)i
(a + bj)*(c + dj) = (ac + bd) + (ad + cb)j
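(A quick numeric sanity check, paste-able in the coffeeshop like the earlier example; the exact printed form of the results is assumed here.)

Algebra(0,0,1,()=>{ document.write('dual:    ' + (2+3e0)*(4+5e0) + '<BR>') });  // ->  8 + 22ε
Algebra(0,1,0,()=>{ document.write('complex: ' + (2+3e1)*(4+5e1) + '<BR>') });  // -> -7 + 22i
Algebra(1,0,0,()=>{ document.write('double:  ' + (2+3e1)*(4+5e1) + '<BR>') });  // -> 23 + 22j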

Next you would write that as a Cayley table :

| a*b   | b = 1 | b = ε |
|-------|-------|-------|
| a = 1 | 1     | ε     |
| a = ε | ε     | 0     |

again, similar for i and j but the lower right corner would have -1 or 1. A Cayley table contains a in the rows, b in the columns and the result of ab in the table body. For all associative algebras, the Cayley table will be a latin square, and is trivially reorganised to the matrix form (which has the product ab as rows, a in the table body and b as the columns.) For the dual numbers (and without headers) :

matrix form of (x + yε) :

  [ x  0 ]
  [ y  x ]

matrix form of (x + yi) :

  [ x -y ]
  [ y  x ]

matrix form of (x + yj) :

  [ x  y ]
  [ y  x ]

For one-dimensional spaces (using just one of i, j, ε) the multiplication remains commutative; when you move to higher-dimensional spaces you generally lose that property .. or rather you win anti-commutativity. For a space called the Minkowski plane, R(1,1), the matrices for i and j above can be combined using plain matrix multiplication and you would find that :

ij = -ji

Furthermore, when you multiply the above matrix form for i with that of j, you get a new type of matrix with a form that is not listed above. Specifically, you get a matrix that looks like :

  [ x   0 ]
  [ 0  -x ]

Multiplying that matrix again with i produces j and vice-versa (so in other words no new symmetries beyond that one are created, in a 2D space, you get one extra 'type' of matrix) .. also note how it looks like a scalar matrix (and is also called the pseudo-scalar).
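Concretely, with the unit elements (x = 0, y = 1) in the matrix forms above:

  i :   [ 0 -1 ]      j :   [ 0  1 ]
        [ 1  0 ]            [ 1  0 ]

  i·j = [ -1  0 ]     j·i = [ 1  0 ]  = -(i·j)
        [  0  1 ]           [ 0 -1 ]

which shows both ij = -ji and the diag(x, -x) shape of the product.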

The rest is just more of the same ;) (plus I'm sure I'm at the point where I'm confusing you more lol).

(if not, at this point, try taking successive powers of the i, j and (1+ε) matrices .. you'll see a different pattern emerge for each that should hint you towards both the geometric meaning and the definition of inverses and their relation to the conjugation and other *morphisms.)

Ah well.. I've got a few mins :) - to see the geometric meaning, draw some 2D images of these numbers plotting a on the x-axis and b on the y-axis for (a + bx)

| Element | Element² | Element³ | Element⁴ | Element⁵ | Geometric meaning |
|---------|----------|----------|----------|----------|-------------------|
| 0+i     | -1+0i    | 0-i      | 1+0i     | 0+i      | Rotate            |
| 0+j     | 1+0j     | 0+j      | 1+0j     | 0+j      | Reflect           |
| 1+ε     | 1+2ε     | 1+3ε     | 1+4ε     | 1+5ε     | Translate         |

So while taking ever-increasing powers on the real numbers gives you exponential growth, on these special numbers it gives you quite something different. Looking at the multiplication with i you see that it represents a 90 degree counterclockwise rotation. A trivial follow-up question would be what is the inverse of this multiplication ? From the geometric interpretation the answer is obvious .. a clockwise 90 degree rotation should do the trick .. and indeed multiplying with -i rotates us back nicely .. just as we would expect from i * (-i) = 1. The operation of negating the i part (or j or ε) is called conjugation and it will be the inverse operation (for numbers whose norm is 1 - which all the numbers in my little table are - and why I used 1+ε)

For non-unit-norm numbers the conjugation brings us only half-way .. it will remove the i,j,ε part .. but we still have to do the normal 1 over operation to get the full multiplicative inverse.

so to find the inverse

1/(a+bi)

we multiply both numerator and denominator by the conjugate :

(a - bi)/((a + bi)(a - bi)) = (a - bi)/(a² + b²) = 1/(a² + b²) * (a - bi)

that removes the i,j,ε part from the denominator which makes it real and easy to invert..
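For instance, 1/(3 + 4i) = (3 - 4i)/(3² + 4²) = 0.12 - 0.16i, and indeed (3 + 4i)*(0.12 - 0.16i) = 1.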

With that we see that (apart from when the denominator goes to zero), all our elements have a multiplicative inverse. (something that remains true in all of GA)

From here, the next space to examine would be R(3,0,0), which is a space with three elements (e1, e2, e3) that are all like 'j' .. the interesting new thing however are the bivectors you get for free in R(3,0,0). They are e12, e13 and e23, and even though we picked three j-like elements, these bivectors all automatically square to -1. Together with the scalar they are isomorphic to the quaternions and allow you to combine rotations without headaches. (plus make you realize a quaternion is 4 numbers, of which 1 is 0-dimensional (the scalar) and three are 2-dimensional (the bivectors), and they represent exactly the rotations in the three basic planes of 3D space - please do shoot people that mention a four dimensional sphere when you ask about quaternions.)
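(Easy to verify in the coffeeshop; a minimal sketch:)

Algebra(3,0,0,()=>{
  // e1, e2 and e3 all square to +1, yet the bivectors square to -1 :
  document.write( (1e12*1e12) + ', ' + (1e13*1e13) + ', ' + (1e23*1e23) );  // -1, -1, -1
});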

Ok, out of time, but at least now I'm sure you're more confused :D

Cheers 👍

enkimute commented 6 years ago

Ok,

Finally got around to writing a backprop example - it's in the coffeeshop and I've written a bit more background info in an observable notebook.

Cheers :)

micahscopes commented 4 years ago

Something similar I'm interested in is doing automatic differentiation with a given Clifford algebra, in other words taking a kind of "tensor product of algebras" of a given Clifford algebra with the dual algebra. This is pretty straightforward to accomplish with operator overloading in typed languages, and I have taken this approach in my own experiments generating GLSL code. Basically, if you have the ability to generate Cl(R, p, q, r) as a Clifford algebra with signature p,q,r over a ring R, you could construct something like Cl(Cl(float, 0,0,1), p,q,r), which would hypothetically allow you to do automatic differentiation over arbitrary GA computations!

enkimute commented 4 years ago

Hi micahscopes,

While direct sums of algebras are currently implemented in ganja.js :

// create a 4-coefficient algebra that is the direct sum of the dual and complex numbers
Algebra(0,0,1).sum(Algebra(0,1))

creating algebras over a different field or ring is not yet 'nicely' supported, but it is possible and I've used it in the rotor estimation example :

https://enkimute.github.io/ganja.js/examples/coffeeshop.html#ga3d_rotor_estimation

In that example the underlying ring is the multivariate dual numbers, which is not a Clifford algebra. Ganja.js however does also allow creating multivariate dual numbers by providing a negative r. (yes, that's a hack.)

So

Algebra(0,0,-3)

creates an algebra where each element has 4 coefficients, 1 real and 3 unrelated dual ones. (i.e. they have a zero outer product between them). In contrast :

Algebra(0,0,3)

creates a Clifford algebra with three Grassmann generators; each element now has 8 coefficients: 1 real, 3 dual vectors, and their outer products.
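(To compare the two layouts directly, the describe() call shown earlier in this thread should work here too - assuming it also accepts the negative-r hack:)

 Algebra(0,0,-3).describe();  // multivariate dual numbers : 4 coefficients
 Algebra(0,0,3).describe();   // full Clifford algebra : 2^3 = 8 coefficients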

In the context of automatic differentiation, the multivariate dual numbers make more sense and allow you to do AD on arbitrary multivector functions (as in the estimation example).

Cheers!

micahscopes commented 4 years ago

Okay, so I've been working on my own algebraic code generation tools. Here is some GLSL output for the dual algebra. I designed this tool specifically with composable algebras (in the "tensor product of algebras" sense) in mind. In the dual algebra example, the "base ring" elements are GLSL floats, but the tool allows you to use any GLSL type as the base ring.

(Here is the generator code for a finite module over some ring... then an R-algebra is just one of those finite module generators with a multiplication function, see here.)

Basically I'm leaning heavily on GLSL's type system and operator overloading for all of this to be possible. I'm not sure how I'd do it without that? It'd probably be a lot of effort!

I haven't advertised my tool because I'm not happy with the function definition workflow. Don't get me wrong, I'm really pleased with it in a lot of ways, but it's just kinda clunky :man_shrugging:! Between you and me... python is kinda clunky :wink: :shhh: ! As an example, here's how geometric algebras are implemented: https://github.com/micahscopes/alglbraic/blob/glslify/alglbraic/algebras/clifford_algebra.py#L21-L137

enkimute commented 4 years ago

Ah, that looks like fun .. since you are using GLSL, have you considered 'borrowing' the anticommutative properties of the outer product from the cross product ?

Without operator overloading and types it is a bit more fiddly, but alas in javascript I did not have much of a choice.

I won't tell anyone python is clunky. 🍺