Hi @phanatchakrit
What is an ffm function?
The model prediction is K @ theta, whereas the gradient (for the square loss) is grad = K @ theta - y (see equation (9) of this paper).
We don't currently support other loss functions; with a different loss the gradient would look different. That would still be fine in principle, but our preconditioner is designed for the square loss and not other losses, so it may not be mathematically sound.
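As a rough illustration of how the gradient changes with the loss (this is not code from the package; X, K, theta, and y are just stand-in tensors, and the gradient is written with respect to the linear output K @ theta, i.e. the same residual-level convention as grad = K @ theta - y above):

```python
import torch

n = 100
X = torch.randn(n, 5)
K = X @ X.T                      # stand-in PSD "kernel" matrix
theta = torch.zeros(n, 1)        # model coefficients
y = torch.randn(n, 1)            # regression targets

# Square loss: prediction is K @ theta, gradient is the residual
grad_square = K @ theta - y

# Logistic (cross-entropy) loss with a sigmoid link and 0/1 targets:
# the gradient w.r.t. the linear output becomes sigmoid(K @ theta) - y01
y01 = (y > 0).float()
grad_logistic = torch.sigmoid(K @ theta) - y01
```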
We don't use torch's autodiff at any point in our package, so it is not a concern. You can run the entire package inside a with torch.no_grad(): block and everything should still work fine.
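For example, here is a minimal sketch (illustrative tensors only, not the package's actual API) showing that hand-written gradient steps run fine inside torch.no_grad(), since nothing relies on autograd:

```python
import torch

X = torch.randn(50, 5)
K = X @ X.T                       # stand-in PSD kernel matrix
y = torch.randn(50, 1)
theta = torch.zeros(50, 1)

with torch.no_grad():             # no autograd graph is recorded
    for _ in range(100):
        grad = K @ theta - y      # gradient computed by hand (square loss)
        theta -= 1e-3 * grad      # plain gradient step; no .backward() needed
```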
Hope this answers your questions. Feel free to respond here, but I will close out this issue.
I am trying to modify the output of an ffm function. If I understand correctly, the model prediction is computed as grad = (k @ theta). I am considering changing this to grad = torch.sigmoid(k @ theta), or perhaps something more complex like grad = (k @ theta) - torch.sigmoid(k @ theta), before completing the gradient computation with grad = grad - y_batch_all[0].
Is it mathematically correct and sensible to make this modification while still using the same weight-update function? Are there any potential issues or considerations I should be aware of when making this change? Also, with respect to PyTorch's autodiff, does my modification need to be differentiable to preserve the gradient in the torch variable?
Thank you
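For reference, a short sketch (my own notation, using the same residual-level gradient convention as above, with stand-in tensors) of what the gradient would look like if the prediction were wrapped in a sigmoid while keeping the square loss. The chain rule adds a sigmoid-derivative factor, so grad = torch.sigmoid(k @ theta) - y on its own is not the exact gradient of that modified model:

```python
import torch

n = 100
X = torch.randn(n, 5)
k = X @ X.T                        # stand-in kernel matrix
theta = torch.zeros(n, 1)
y = torch.rand(n, 1)               # targets in [0, 1] for a sigmoid output

p = torch.sigmoid(k @ theta)       # sigmoid-wrapped prediction
# Square loss on the sigmoid output: the chain rule adds a p * (1 - p) factor
grad = (p - y) * p * (1 - p)
```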