Closed · IanQS closed this 1 year ago
Hi,
Thanks for the excellent video idea! :) I overlooked the softmax because I do only a little classification myself. It is definitely worth a video.
For reference, the forward pass for the softmax is
$$ y_i = \frac{e^{x_i}}{\sum_j e^{x_j}} $$
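As a concrete sketch in Rust (assuming a plain `Vec<f64>` representation, not any particular library's API), the forward pass could look like this:

```rust
// Softmax forward pass: y_i = exp(x_i) / sum_j exp(x_j).
// Subtracting the maximum before exponentiating is the standard trick to
// avoid overflow; it leaves the result unchanged because the extra factor
// exp(-max) cancels between numerator and denominator.
fn softmax(x: &[f64]) -> Vec<f64> {
    let max = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = x.iter().map(|&v| (v - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}
```

By construction the outputs are positive and sum to one.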
Then its Jacobian is given by
$$ \frac{\partial y_i}{\partial x_k} = y_i \delta_{ik} - y_i y_k $$
Importantly, there is no summation over $i$ here. For the derivation see, e.g., here (I will do this derivation in detail in the video).
As such, the pushforward becomes
$$ \dot{y}_i = y_i \delta_{ik} \dot{x}_k - y_i y_k \dot{x}_k = y_i \dot{x}_i - y_i y_k \dot{x}_k $$
or in symbolic notation
$$ \dot{\underline{y}} = \underline{y} \odot \dot{\underline{x}} - \underline{y} (\underline{y}^T \dot{\underline{x}}) $$
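The symbolic form above translates almost line for line into code. A minimal Rust sketch (the function name `softmax_jvp` and the slice-based signature are my own choices, not from any library); `y` is assumed to be the output of a softmax forward pass:

```rust
// Pushforward (JVP): y_dot = y ⊙ x_dot − y (yᵀ x_dot).
// Note y_i * x_dot_i − y_i * inner factors as y_i * (x_dot_i − inner).
fn softmax_jvp(y: &[f64], x_dot: &[f64]) -> Vec<f64> {
    // Inner product yᵀ x_dot, shared by every output component.
    let inner: f64 = y.iter().zip(x_dot).map(|(yi, xd)| yi * xd).sum();
    y.iter()
        .zip(x_dot)
        .map(|(yi, xd)| yi * (xd - inner))
        .collect()
}
```

A quick sanity check: since the softmax outputs always sum to one, the components of `y_dot` must sum to zero for any `x_dot`.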
Similarly, the pullback is
$$ \bar{x}_k = \bar{y}_i y_i \delta_{ik} - \bar{y}_i y_i y_k = \bar{y}_k y_k - \bar{y}_i y_i y_k $$
or in symbolic notation
$$ \bar{\underline{x}} = \bar{\underline{y}} \odot \underline{y} - \underline{y} (\bar{\underline{y}}^T \underline{y}) $$
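The pullback has the same structure, with the roles of the vectors swapped. Again a hedged Rust sketch (function name and signature are illustrative), with `y` from the forward pass and `y_bar` the incoming cotangent:

```rust
// Pullback (VJP): x_bar = y_bar ⊙ y − y (y_barᵀ y).
// Componentwise: x_bar_k = y_k * (y_bar_k − inner).
fn softmax_vjp(y: &[f64], y_bar: &[f64]) -> Vec<f64> {
    // Inner product y_barᵀ y, shared by every output component.
    let inner: f64 = y_bar.iter().zip(y).map(|(yb, yi)| yb * yi).sum();
    y.iter()
        .zip(y_bar)
        .map(|(yi, yb)| yi * (yb - inner))
        .collect()
}
```

Because the softmax Jacobian is symmetric, the pushforward and pullback are the same map up to renaming the tangent/cotangent argument, which is why the two sketches look identical.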
For anyone reading, the video on the pushforward is now online: https://youtu.be/J7hK1Ba20yA
Video on the pullback to be released next week.
Hi there!
Thank you so much for your fantastic content! I've been an avid subscriber for a while. I'm working on a simple automatic differentiation library in Rust, and I'm at the stage of implementing softmax. I know I could google it or find an implementation in a library, but I've enjoyed your videos and your work, and I was wondering if you could do the derivation in a video?