EigenPro / EigenPro3


Customizing Output of Model (k @ theta) #4

Closed phanatchakrit closed 1 year ago

phanatchakrit commented 1 year ago

I am trying to modify the output of an ffm function. If I understand correctly, the model prediction is represented by grad = (k @ theta). I am considering changing this to grad = torch.sigmoid(k @ theta), or perhaps something more complex like grad = (k @ theta) - torch.sigmoid(k @ theta), before completing the gradient computation with grad = grad - y_batch_all[0].
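
For concreteness, here is a minimal sketch of the change I have in mind, using the same names as above (modified_residual is just a placeholder for where this happens in the code):

```python
import torch

# Sketch of the modification described above, using the same variable names
# (k, theta, y_batch_all); this is not EigenPro3's actual code, just the
# change I am considering.
def modified_residual(k, theta, y_batch_all):
    out = k @ theta                      # current model prediction
    out = out - torch.sigmoid(out)       # proposed nonlinear output
    grad = out - y_batch_all[0]          # residual fed to the weight update
    return grad
```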

Is it mathematically correct and sensible to make this modification and still use the same weight-update function? Are there any potential issues or considerations I should be aware of when making this change? Also, with respect to PyTorch's autodiff, does my modification need to be differentiable so that the gradient is preserved in the torch tensors?

Thank you

parthe commented 1 year ago

Hi @phanatchakrit

What is an ffm function?

The model prediction is K @ theta, whereas the gradient (for the square loss) is grad = K @ theta - y (see equation (9) of this paper).
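
As a standalone sanity check (not part of our package), you can verify with autograd that the gradient of the square loss with respect to the prediction is exactly K @ theta - y:

```python
import torch

# Standalone check: for L = 0.5 * ||f - y||^2 with f = K @ theta,
# the gradient of L w.r.t. f is f - y = K @ theta - y.
torch.manual_seed(0)
K, theta, y = torch.randn(5, 3), torch.randn(3, 1), torch.randn(5, 1)

f = (K @ theta).requires_grad_(True)
loss = 0.5 * ((f - y) ** 2).sum()
loss.backward()

assert torch.allclose(f.grad, K @ theta - y)
```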

We don't currently support other loss functions; with a different loss the gradient would look different. That in itself would be fine, but our preconditioner is designed for the square loss and not for other losses, so the combination may not be mathematically sound.
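
To illustrate (this is not something the package supports), here is how the per-sample gradient with respect to the prediction would change under a different loss, e.g. binary cross-entropy with logits:

```python
import torch

# Illustration only: with f = K @ theta, the square-loss gradient w.r.t. f
# is f - y, while for binary cross-entropy with logits it is sigmoid(f) - y.
# The EigenPro preconditioner is derived for the former.
K, theta = torch.randn(5, 3), torch.randn(3, 1)
y = torch.randint(0, 2, (5, 1)).float()

f = K @ theta
square_loss_grad = f - y               # gradient assumed by EigenPro3
bce_grad = torch.sigmoid(f) - y        # gradient under BCE-with-logits
```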

We don't use torch's autodiff at any point in our package, so differentiability is not a concern. You can run the entire package inside a with torch.no_grad(): block and everything should still work fine.
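
For example (train_with_eigenpro below is just a placeholder for whatever script you use to drive the package):

```python
import torch

# Placeholder for your own training code that calls into EigenPro3;
# since the package never relies on torch autograd, disabling gradient
# tracking globally is safe and avoids building computation graphs.
def train_with_eigenpro():
    ...

with torch.no_grad():
    train_with_eigenpro()
```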

Hope this answers your questions. Feel free to respond here, but I will close out this issue.