aleximmer / Laplace

Laplace approximations for Deep Learning.
https://aleximmer.github.io/Laplace
MIT License

Should `LLLaplace` be deprecated? #254

Open wiseodd opened 1 day ago

wiseodd commented 1 day ago

We can achieve the same thing as `LLLaplace` by switching off the grads of all but the last layer:

```python
model = ...

# Freeze every parameter that does not belong to the last layer.
for p_name, p in model.named_parameters():
    if not p_name.startswith("last_layer"):  # or whatever the last layer is called
        p.requires_grad = False

la = Laplace(model, ..., subset_of_weights="all", ...)
```
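
As a quick sanity check (assuming, hypothetically, that the last layer is accessible as `model.last_layer`), one can confirm that only the last-layer parameters remain trainable before handing the model to `Laplace`:

```python
# Sanity check (hypothetical attribute name): after the loop above,
# only the last layer's parameters should still require gradients.
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_last = sum(p.numel() for p in model.last_layer.parameters())
assert n_trainable == n_last
```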

The major pro is that we would no longer need to maintain a separate last-layer code path (e.g., special functions like `last_layer_jacobians`); everything would go through the standard `Laplace` machinery.

@runame @aleximmer are there any reasons you can think of why we shouldn't just make `LLLaplace` a simple wrapper around the above?

aleximmer commented 1 day ago

I believe #201, although not implemented at the moment, would be one benefit of a pure `LLLaplace`?

wiseodd commented 1 day ago

Thanks @aleximmer, that's a good point.

I believe that can also be done even if `LLLaplace` is just a wrapper around the gradient switch-off above: we still know how to get the features $\phi(x)$ in that case, so the fast functional variance can also be computed.
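
For instance, here is a minimal, self-contained sketch (not the library's actual mechanism) of how $\phi(x)$ can be read off with a forward pre-hook on whatever module plays the role of the last layer:

```python
import torch
import torch.nn as nn

# Sketch: even when the trunk is merely frozen, phi(x) can be captured
# with a forward pre-hook on the last layer.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
last_layer = model[-1]  # stands in for whatever module is the last layer

features = {}

def store_features(module, inputs):
    # For a Linear head, the first positional input is phi(x).
    features["phi"] = inputs[0].detach()

handle = last_layer.register_forward_pre_hook(store_features)

with torch.no_grad():
    model(torch.randn(8, 4))  # a forward pass populates features["phi"]

print(features["phi"].shape)  # torch.Size([8, 16])
handle.remove()
```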

I also think we can keep the user-facing API of `LLLaplace` unchanged; only the internal implementation would differ. Instead of implementing special functions for LLLA, like `last_layer_jacobians`, we can just call `Laplace` with the gradients switched on/off appropriately.
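
Roughly, such a wrapper could look like the sketch below; the class name, the `last_layer_name` argument, and the simplified constructor are hypothetical, and the real `LLLaplace` signature has more options:

```python
from laplace import Laplace

class LLLaplaceAsWrapper:
    """Hypothetical sketch of LLLaplace as a thin wrapper: freeze everything
    except the last layer, then delegate to the full Laplace."""

    def __init__(self, model, likelihood, last_layer_name="last_layer", **kwargs):
        # Only parameters belonging to the last layer keep their gradients.
        for p_name, p in model.named_parameters():
            p.requires_grad = p_name.startswith(last_layer_name)

        # With the trunk frozen, "all" weights effectively means the last layer.
        self._la = Laplace(model, likelihood, subset_of_weights="all", **kwargs)

    def fit(self, train_loader):
        self._la.fit(train_loader)

    def __call__(self, x, **kwargs):
        return self._la(x, **kwargs)
```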

Let me know if I missed anything!