Closed. Giodiro closed this issue 2 years ago.
I second this. In many applications, the input to the kernel comes from the output of a parameterized function with parameters \theta. It would be great if we could compute gradients wrt \theta.
Is this already implemented/planned to be implemented?
For now this is not implemented. Differentiating through the kernel itself would not be hard thanks to KeOps, so if you had an already-trained Falkon model you could differentiate through its predictions. Would this be useful for your use case?
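To make that concrete, here is a minimal sketch of what differentiating through the predictions of a trained model could look like. It rebuilds the Nyström expansion f(x) = sum_j alpha_j * exp(-||x - z_j||^2 / (2 sigma^2)) by hand in PyTorch instead of calling Falkon or KeOps directly, and the dummy `centers`, `alpha`, and `sigma` below stand in for whatever a fitted model actually stores, so treat them as assumptions rather than the library's API:

```python
import torch

# Stand-ins for quantities a fitted model would provide: Nystroem
# centers z_j, expansion coefficients alpha_j, and the lengthscale
# sigma used at training time (all assumed here, not read from Falkon).
centers = torch.randn(100, 3)          # (M, d) inducing points
alpha = torch.randn(100, 1)            # (M, 1) learned coefficients
sigma = 1.0

def predict(X):
    """Gaussian-kernel expansion f(X) = K(X, centers) @ alpha."""
    sq_dists = torch.cdist(X, centers).pow(2)
    K = torch.exp(-sq_dists / (2 * sigma ** 2))
    return K @ alpha

# Differentiate the predictions with respect to the inputs.
X_test = torch.randn(5, 3, requires_grad=True)
preds = predict(X_test)
preds.sum().backward()
print(X_test.grad.shape)               # (5, 3): d f / d X_test
```

Replacing the handwritten kernel with the equivalent KeOps LazyTensor formula should keep the same autograd behaviour while scaling to a large number of centers.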
In practice the current model is trained with conjugate gradients, and we cannot simply differentiate through that algorithm with respect to arbitrary, parametrized feature transformations of the data :(
My use-case involves differentiating through the predictions of an already trained model, so I might be able to use KeOps then. Thanks for the tip!
This could allow optimization of kernel parameters with autograd.
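For instance, since the expansion sketched above is ordinary PyTorch code, both a kernel lengthscale and the parameters theta of a transform feeding the kernel can be handed to an optimizer. This is only a sketch of the autograd plumbing under that assumption: it keeps the coefficients alpha fixed, whereas in Falkon they come from the conjugate-gradient solve and would themselves depend on these parameters (the point raised above), and every name here is illustrative rather than part of Falkon's API:

```python
import torch

centers = torch.randn(100, 3)                       # assumed Nystroem centers
alpha = torch.randn(100, 1)                         # assumed (fixed) coefficients

transform = torch.nn.Linear(5, 3)                   # parameterized map g_theta feeding the kernel
log_sigma = torch.nn.Parameter(torch.tensor(0.0))   # kernel lengthscale, optimized on a log scale

opt = torch.optim.Adam(list(transform.parameters()) + [log_sigma], lr=1e-2)

X_val = torch.randn(32, 5)                          # validation inputs in the original space
y_val = torch.randn(32, 1)

for step in range(100):
    Z = transform(X_val)                            # g_theta(x): inputs seen by the kernel
    sq_dists = torch.cdist(Z, centers).pow(2)
    K = torch.exp(-sq_dists / (2 * torch.exp(log_sigma) ** 2))
    loss = torch.mean((K @ alpha - y_val) ** 2)     # validation MSE through the expansion
    opt.zero_grad()
    loss.backward()                                 # gradients w.r.t. theta and the lengthscale
    opt.step()
```

A full solution would also have to account for how alpha changes with the kernel parameters, which is exactly what differentiating through the CG-based training would require.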
Steps: