jgranek closed this issue 6 years ago.
`Q` is a projector that takes the features from intermediate time steps and forwards them to the classifier. It can be any linear operator, so both the identity operator and a 2D array are valid choices.
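To make the "any linear operator" point concrete, here is a minimal sketch in Python. The function names (`identity_operator`, `matrix_operator`, `classifier_input`) are hypothetical and not the project's API. It assumes the features from the intermediate time steps are simply stacked into one vector before `Q` is applied.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Operator = Callable[[Sequence[float]], Vector]


def identity_operator() -> Operator:
    """Q = I: forwards the stacked features to the classifier unchanged."""
    return lambda y: list(y)


def matrix_operator(rows: Sequence[Sequence[float]]) -> Operator:
    """Q given as a 2D array: returns the map y -> Q @ y."""
    def apply(y: Sequence[float]) -> Vector:
        if any(len(r) != len(y) for r in rows):
            raise ValueError("dimension mismatch between Q and the features")
        return [sum(q * v for q, v in zip(r, y)) for r in rows]
    return apply


def classifier_input(Q: Operator, states: Sequence[Sequence[float]]) -> Vector:
    """Stack the features of several time steps and project them with Q."""
    stacked = [v for state in states for v in state]
    return Q(stacked)


if __name__ == "__main__":
    states = [[1.0, 2.0], [3.0, 4.0]]  # features at two time steps
    print(classifier_input(identity_operator(), states))
    # Keep only the last time step.
    print(classifier_input(matrix_operator([[0, 0, 1, 0], [0, 0, 0, 1]]), states))
    # Average over the two time steps.
    print(classifier_input(matrix_operator([[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]]), states))
```

Because the classifier only ever sees `Q(y)`, swapping the identity for a matrix (for example one that selects or averages time steps) changes which intermediate features reach the classifier without touching the rest of the network.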
It may well be that `Jmv` is not used in the CIFAR example. First, note that `Jmv` multiplies by the Jacobian, not by its transpose (which is what backprop uses). First-order methods like SGD rarely need Jacobian products. However, they are used in second-order methods and also to test derivatives.
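As an illustration of how a Jacobian product is used to test derivatives, here is a minimal sketch on a hypothetical toy map `f(x) = [x0 * x1, sin(x0)]`. The names `f`, `Jmv`, and `JTmv` below are stand-ins chosen for this example, not the project's signatures. The derivative test is the standard Taylor check: if `Jmv` is correct, `||f(x + h v) - f(x) - h J v||` shrinks like O(h^2), while `||f(x + h v) - f(x)||` only shrinks like O(h). The adjoint test then checks that `JTmv` really is the transpose of `Jmv`.

```python
import math


def f(x):
    """Toy forward map f(x) = [x0 * x1, sin(x0)]."""
    return [x[0] * x[1], math.sin(x[0])]


def Jmv(x, v):
    """Jacobian times vector, J(x) @ v, with J = [[x1, x0], [cos(x0), 0]]."""
    return [x[1] * v[0] + x[0] * v[1], math.cos(x[0]) * v[0]]


def JTmv(x, w):
    """Transposed Jacobian times vector, J(x).T @ w (what backprop needs)."""
    return [x[1] * w[0] + math.cos(x[0]) * w[1], x[0] * w[0]]


def norm(a):
    return math.sqrt(sum(t * t for t in a))


def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def derivative_test(x, v, h0=0.1, steps=6):
    """Return (err0, err1) for h = h0, h0/2, h0/4, ...

    err0 = ||f(x + h v) - f(x)||          should decrease like O(h)
    err1 = ||f(x + h v) - f(x) - h J v||  should decrease like O(h^2)
    """
    fx = f(x)
    jv = Jmv(x, v)
    err0, err1 = [], []
    for k in range(steps):
        h = h0 / 2 ** k
        fxh = f([xi + h * vi for xi, vi in zip(x, v)])
        diff = [a - b for a, b in zip(fxh, fx)]
        err0.append(norm(diff))
        err1.append(norm([d - h * j for d, j in zip(diff, jv)]))
    return err0, err1


def adjoint_test(x, v, w):
    """|<w, J v> - <J^T w, v>|, which should be at round-off level."""
    return abs(dot(w, Jmv(x, v)) - dot(JTmv(x, w), v))


if __name__ == "__main__":
    x, v, w = [0.5, 1.0], [1.0, 2.0], [0.3, -0.7]
    err0, err1 = derivative_test(x, v)
    for k in range(len(err1)):
        print(f"h = {0.1 / 2 ** k:.5f}  err0 = {err0[k]:.3e}  err1 = {err1[k]:.3e}")
    print("adjoint mismatch:", adjoint_test(x, v, w))
```

Halving `h` should roughly halve `err0` and quarter `err1`; if `err1` only halves, the Jacobian product is wrong. This is why `Jmv` can be worth keeping even when an SGD training loop only ever calls the transposed product.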
@eldadHaber @lruthotto What is `Q`? It seems like it's a projection matrix of some kind? It can be either an operator (identity) or a 2D array?
Also, `Jmv` doesn't seem to be used anywhere in the CIFAR10 example?