TensorNetwork / tensornetwork.org

Source for The Tensor Network open-source review article
https://tensornetwork.org
Apache License 2.0

Tensor Derivatives #31

Open thomasahle opened 9 months ago

thomasahle commented 9 months ago

I would like to write an article about tensor derivatives, such as this derivative of the Hessian chain rule:

[Screenshot: tensor diagram of the Hessian chain rule]

Is there already an article about this that I should contribute to? Or should I start one from scratch? Also, I'm not sure if my notation of using x -> f for function applications, f(x), and -- for tensor contractions A -- x is standard. If there's a better notation, I can switch it out.

emstoudenmire commented 9 months ago

Hi Thomas, thanks for suggesting a contribution. I'm interested, but have a few questions about this topic and the notation. Mainly:

  1. do you know if this notation relates to Penrose tensor diagram notation, and if so, how? I'm open to having other diagrammatic notations on the site, but I'm not familiar with this particular one, so I'd like to understand it better.
  2. it's not totally clear to me that this computation is primarily about tensors, i.e. linear functions on vector spaces. While derivatives are related to tensors, the original object being differentiated here looks to be the composition of two general functions. Could you please explain more about which objects here are tensors?

Oh I just saw what you wrote at the end. Correct, the notation used here is not standard in the tensor network field (meaning the one in quantum physics and in applied math) though papers in that field do fairly often introduce non-standard notations as long as they are clearly defined. Happy to discuss more.

thomasahle commented 9 months ago
  1. The derivative notation (with circles around tensors) is directly from Penrose: https://en.wikipedia.org/wiki/Penrose_graphical_notation#Covariant_derivative
  2. I think tensor networks are the only good way to write up the Hessian chain rule. It's possible to do that using nothing but tensors, since it's simply a mix of (higher-order) Hessians and Jacobians. See e.g. Yaroslav's work on optimizing the contraction of these tensors: https://community.wolfram.com/groups/-/m/t/2437093 .

However, to actually derive the tensors using the chain rule, I think you need to show the function applications as well, which is why I added them to my notation. If you know of, or can think of, any better way to do this, I would consider it a great win!
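For concreteness, the Hessian chain rule being diagrammed, H(f∘g) = Jg^T Hf Jg + Σ_k (∂f/∂y_k) Hg_k, can be sanity-checked numerically. Here is a minimal numpy sketch (my own illustration, not from the thread), with a toy f and g chosen so every term has a closed form:

```python
import numpy as np

# Toy check of the Hessian chain rule for h(x) = f(g(x)):
#   H_h = Jg^T Hf Jg  +  sum_k (df/dy_k) Hg_k
# with g(x) = (x0^2, x0*x1) and f(y) = y0*y1, so h(x) = x0^3 * x1.

x = np.array([0.7, -0.3])
x0, x1 = x

Jg = np.array([[2 * x0, 0.0],      # Jacobian of g, shape (2, 2)
               [x1,     x0]])
Hg = np.array([[[2.0, 0.0],        # Hessian of g, shape (2, 2, 2):
                [0.0, 0.0]],       # Hg[k] is the Hessian of component g_k
               [[0.0, 1.0],
                [1.0, 0.0]]])
grad_f = np.array([x0 * x1, x0**2])         # grad of f at y = g(x)
Hf = np.array([[0.0, 1.0], [1.0, 0.0]])     # Hessian of f at y = g(x)

# Assemble the chain rule; einsum contracts the "output" index k of g
H_chain = Jg.T @ Hf @ Jg + np.einsum('k,kij->ij', grad_f, Hg)

# Closed-form Hessian of h(x) = x0^3 * x1 for comparison
H_direct = np.array([[6 * x0 * x1, 3 * x0**2],
                     [3 * x0**2,   0.0]])
assert np.allclose(H_chain, H_direct)
```

The einsum contraction over k is exactly the edge in the diagram connecting the gradient of f to the Hessian of each component of g.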

thomasahle commented 9 months ago

I have a bunch more examples of derivations using this notation here: TensorDerivatives.pdf. It is not so well documented at this point, though.

emstoudenmire commented 9 months ago

I see, interesting. Ok I'm convinced then that this material does fit with the site. Here are some requests about the writeup:

Lastly, you might like this recent article by some people in my field. I'm sure it's rediscovering some things in the more introductory part of the article, but by the end they pull off some impressive calculations. I think the notation there is related, but with thick lines representing plugging in continuous variables instead of lines with arrows at the end: https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.5.013156

emstoudenmire commented 9 months ago

Oh and, no there's not an article about this topic so you should start one from scratch. Feel free to make a new section of the site, though we could discuss how to organize it.

I'm open to however you want to make the figures. Usually I make mine in the Keynote presentation software, using a border size of 4 pixels for the shapes and lines, and then I just take screenshots to make the images. Primitive, I know, but I thought I'd share. I'm hoping in the future for some tensor diagramming software that will also generate high-quality images as output.

thomasahle commented 9 months ago

One challenge I'm having is deciding what notation to use for function application. In the diagram above I used arrows along tensor dimensions (instead of simple edges), but sometimes you may want to take a function of a scalar, like the division in softmax, softmax(x) = exp(x)/sum(exp(x)). This is causing me trouble, because a tensor graph that represents a scalar doesn't have any free edges. I could just put an arrow coming out of some arbitrary node, but that seems confusing. I could also put a circle around the whole graph and have an arrow coming out of that. Any other ideas?
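As an aside on the softmax example: once the scalar denominator is written out, the Jacobian of softmax has the closed form diag(s) - s s^T, which is the kind of rank-2 object these diagrams end up encoding. A minimal numpy sketch (my own illustration, not from the thread), checked against finite differences:

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; output sums to 1
    e = np.exp(x - x.max())
    return e / e.sum()

x = np.random.default_rng(0).normal(size=4)
s = softmax(x)

# Jacobian of softmax: J[i, j] = ds_i/dx_j = s_i * (delta_ij - s_j),
# i.e. diag(s) - s s^T
J = np.diag(s) - np.outer(s, s)

# central finite-difference check, column by column
eps = 1e-6
J_fd = np.empty((4, 4))
for j in range(4):
    e_j = np.eye(4)[:, j]
    J_fd[:, j] = (softmax(x + eps * e_j) - softmax(x - eps * e_j)) / (2 * eps)

assert np.allclose(J, J_fd, atol=1e-6)
```

Note that each column of J sums to zero, reflecting the constraint that the softmax outputs sum to 1; this is one place where a scalar-valued subgraph (the sum in the denominator) genuinely appears inside the derivative.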