wiseodd opened this issue 1 day ago
I believe #201, although not implemented at the moment, would be a benefit of a pure `LLLaplace`.
Thanks @aleximmer, that's a good point.
I believe that can also be done even if `LLLaplace` is just a wrapper around the gradient-switch-off above. We still know how to obtain the feature \phi(x) in this case, so the fast functional variance can also be computed.
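To make that concrete: for a linear last layer, the fast functional variance from \phi(x) is just f_var(x) = J(x) \Sigma J(x)^T with J(x) = I \otimes \phi(x)^T, where \Sigma is the last-layer posterior covariance. A rough sketch of that computation (shapes, names, and the row-major vec convention here are illustrative, not our API):

```python
import torch

# Illustrative only: phi is the last-layer feature \phi(x), shape (batch, d);
# Sigma is the posterior covariance over the flattened last-layer weight,
# shape (c*d, c*d) for c outputs.
def functional_variance_from_features(phi, Sigma, n_outputs):
    batch, d = phi.shape
    # Jacobian of f = W \phi(x) w.r.t. vec(W) (row-major): J = I_c \otimes \phi(x)^T
    eye = torch.eye(n_outputs, dtype=phi.dtype)
    J = torch.einsum('ck,nd->nckd', eye, phi).reshape(batch, n_outputs, -1)
    # f_var = J \Sigma J^T, shape (batch, c, c)
    return torch.einsum('ncp,pq,nkq->nck', J, Sigma, J)
```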
I also think we can maintain the user-facing API of `LLLaplace`; only the internal implementation changes: instead of implementing special functions for LLLA, like `last_layer_jacobians`, we can just call `Laplace` with the gradients switched on/off appropriately.
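For example, something roughly like this (just a sketch: the class name is hypothetical, signatures are simplified, and it presumes the generic `subset_of_weights='all'` path only differentiates w.r.t. parameters with `requires_grad=True`, i.e. exactly the switch-on-off above):

```python
from torch import nn
from laplace import Laplace  # generic factory; used here for the all-weights path

class LLLaplaceAsWrapper:
    """Sketch: same user-facing calls as before, but no LLLA-specific internals."""

    def __init__(self, model: nn.Module, likelihood: str, **kwargs):
        # Gradient switch-off: everything off except the (assumed) last child module.
        last_layer = list(model.children())[-1]
        for p in model.parameters():
            p.requires_grad_(False)
        for p in last_layer.parameters():
            p.requires_grad_(True)
        # No last_layer_jacobians etc.; delegate to the generic path, presuming
        # it only differentiates w.r.t. parameters with requires_grad=True.
        self._la = Laplace(model, likelihood, subset_of_weights='all', **kwargs)

    def fit(self, train_loader):
        self._la.fit(train_loader)

    def __call__(self, x, **pred_kwargs):
        return self._la(x, **pred_kwargs)
```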
Let me know if I missed anything!
We can achieve the same thing by switching off the grads of all but the last layer.
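For instance (a minimal sketch, assuming a plain sequential model so that the last child really is the last layer):

```python
from torch import nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Switch off the grads of all but the last layer.
for p in model.parameters():
    p.requires_grad_(False)
for p in model[-1].parameters():
    p.requires_grad_(True)

# Only the last layer's weight and bias stay trainable, so any backend that
# builds its Jacobians/Hessians from requires_grad=True parameters would give
# us last-layer Laplace without LLLA-specific code.
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_trainable)  # 2 * 32 + 2 = 66
```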
The major pros are:
@runame @aleximmer any reasons you can think of why we shouldn't just make `LLLaplace` a simple wrapper for the above?