JuliaDiff / SparseDiffTools.jl

Fast Jacobian computation through sparsity exploitation and matrix coloring
MIT License

Repeated HVPs With Sub-Sampling #171

Closed: RS-Coop closed this issue 2 years ago

RS-Coop commented 2 years ago

This question is not necessarily specific to this package, but it seemed like the best place to ask given your existing resources for computing HVPs. Apologies if I should be directing this elsewhere, and thanks in advance for the help!

My goal is as follows: given a minibatch of inputs, I want to compute the gradients of a function with respect to some parameters at those inputs, and I want to build an HVP (Hessian-vector product) operator from a sub-sample of the minibatch inputs that I can then apply multiple times. Ideally I would like to compute the gradients first, and then use a sub-sample of these gradients to build the HVP operator.

It seems that existing approaches (like what I have seen here) recompute the gradients each time the HVP operator is applied; a minimal sketch of that pattern is below. It is also not clear how to accomplish the sub-sampling.
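
For concreteness, this is the kind of pattern I mean, as a minimal sketch with a toy loss (not this package's API):

```julia
using ForwardDiff, Zygote

# Toy quadratic loss standing in for the real objective.
f(θ) = sum(abs2, θ) / 2

# Forward-over-reverse HVP: differentiate the reverse-mode gradient
# along the direction v. Note that every call re-runs the full
# gradient computation from scratch.
function hvp(f, θ, v)
    ForwardDiff.derivative(ε -> Zygote.gradient(f, θ .+ ε .* v)[1], 0.0)
end

θ = randn(4)
v = randn(4)
hvp(f, θ, v)  # ≈ ∇²f(θ) * v; here ∇²f = I, so this returns v
```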

It seems what I am trying to do should be possible, but I am just getting a little lost in the implementation details.

Thanks again!

ChrisRackauckas commented 2 years ago

Yeah, maybe there needs to be an `update!` function or something to handle when the gradients are updated.
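
Something along these lines, maybe (purely hypothetical; nothing like this exists in the package):

```julia
using ForwardDiff, Zygote

# Hypothetical updatable HVP operator: hold on to the (sub-sampled)
# objective and the point θ, and let update! swap them out when the
# parameters or the sub-sample change.
mutable struct HVPOperator
    f::Function        # objective restricted to the current sub-sample
    θ::Vector{Float64} # point at which the Hessian is taken
end

function update!(op::HVPOperator, f, θ)
    op.f = f
    copyto!(op.θ, θ)
    return op
end

# Applying the operator still re-runs forward-over-reverse each call.
(op::HVPOperator)(v) = ForwardDiff.derivative(
    ε -> Zygote.gradient(op.f, op.θ .+ ε .* v)[1], 0.0)
```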

RS-Coop commented 2 years ago

Well, it seems like the key issue is how to use pre-computed gradients obtained via reverse-mode AD within forward-mode AD to then compute an HVP. When mixing ForwardDiff and Zygote we have to specify the vector in the HVP beforehand (the hand-rolled sketch below shows where it enters). Do you know if ReverseDiff (or Zygote's secret forward-mode AD) would help here?
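
To make that coupling concrete, here is roughly what the mixing looks like when hand-rolled with dual numbers (a sketch, not the package internals):

```julia
using Zygote
using ForwardDiff: Dual, partials

f(θ) = sum(abs2, θ) / 2   # toy loss again

# v is baked into the dual part of the input *before* the reverse
# pass runs, which is why the vector has to be specified up front.
function hvp_seeded(f, θ, v)
    θdual = Dual.(θ, v)                   # seed θ + ε v
    gdual = Zygote.gradient(f, θdual)[1]  # reverse pass over duals
    map(d -> partials(d, 1), gdual)       # dual parts carry ∇²f(θ) * v
end

θ = randn(4); v = randn(4)
hvp_seeded(f, θ, v)  # ≈ ∇²f(θ) * v
```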

ChrisRackauckas commented 2 years ago

I don't think that's possible, since you have to pushforward the `v` in the forward pass of the gradient in order to do forward-over-reverse. That seems pretty fundamental to the method.
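
Concretely, forward-over-reverse computes

$$\nabla^2 f(x)\,v \;=\; \left.\frac{d}{d\varepsilon}\,\nabla f(x + \varepsilon v)\right|_{\varepsilon = 0},$$

so `v` is consumed inside the gradient evaluation itself; a stored $\nabla f(x)$ has no directional second-order information left to contract against `v` after the fact.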