aleximmer / Laplace

Laplace approximations for Deep Learning.
https://aleximmer.github.io/Laplace
MIT License

subnetwork with Kronecker covariance #189

Closed · ruili-pml closed this issue 4 months ago

ruili-pml commented 4 months ago

Hi,

For now the library only supports diag or full covariance for subnetwork Laplace, but I need to use a Kronecker-factored covariance. Would it be hard to extend the library to support it? If it is feasible, could you point me to a good place to start (it seems extending curvature.py should do the trick)? Thank you.
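
For context, this is roughly what I do now with the full covariance, following the README example (just a sketch; the indices below are placeholders):

```python
import torch
from laplace import Laplace

# Indices of the parameters that form the subnetwork (placeholder values).
subnetwork_indices = torch.tensor([0, 1, 2, 3])

la = Laplace(
    model,
    "classification",
    subset_of_weights="subnetwork",
    hessian_structure="full",   # "kron" is what I'd like here, but it isn't supported
    subnetwork_indices=subnetwork_indices,
)
la.fit(train_loader)
```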

Best, Rui

ruili-pml commented 4 months ago

https://github.com/aleximmer/Laplace/blob/7af7e30b3875657a47eb248c99769071521b7d65/laplace/curvature/curvature.py#L68

Also, while going through the backend implementation, I believe I found a typo here: the Jacobian's dimensions should be [batch, outputs, params], right?
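
For what it's worth, this is how I sanity-checked the shape convention with plain PyTorch (using torch.func, PyTorch >= 2.0, rather than the library's backend):

```python
import torch
from torch.func import functional_call, jacrev, vmap

model = torch.nn.Linear(3, 2)            # 3*2 weights + 2 biases = 8 params
params = dict(model.named_parameters())

def f(p, x):
    # Forward pass for a single sample x of shape (3,).
    return functional_call(model, p, (x.unsqueeze(0),)).squeeze(0)

X = torch.randn(5, 3)                    # batch of 5
per_param_jac = vmap(jacrev(f), in_dims=(None, 0))(params, X)
# Flatten the per-parameter Jacobians into one (batch, outputs, params) tensor.
Js = torch.cat([j.flatten(2) for j in per_param_jac.values()], dim=2)
print(Js.shape)                          # torch.Size([5, 2, 8]) -> [batch, outputs, params]
```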

wiseodd commented 4 months ago

You can achieve that by switching off the gradients of the params you don't want to compute the Hessian of.

See this example where we have a foundation model and LoRA on top. Then we do Laplace (KronLaplace works) only on the latter: https://github.com/aleximmer/Laplace/blob/main/examples/huggingface_example.md
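
In code, the recipe is roughly this (just a sketch; `model.backbone` / `model.head` are placeholders for your deterministic and probabilistic parts):

```python
from laplace import Laplace

# Freeze everything you want to treat deterministically; params without
# gradients are excluded from the Hessian / posterior.
for p in model.backbone.parameters():
    p.requires_grad_(False)
for p in model.head.parameters():
    p.requires_grad_(True)

la = Laplace(
    model,
    "classification",
    subset_of_weights="all",      # "all" here means all params that still require grad
    hessian_structure="kron",     # Kronecker-factored covariance
)
la.fit(train_loader)
```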

Thanks also for catching the typo. Would you mind opening a quick PR?

ruili-pml commented 4 months ago

Thank you for pointing me to the example! I'll check it to see if I can do the same for my network.

I opened the PR.

ruili-pml commented 4 months ago

Just to double-check: after I turn off the gradients for the part of the network that I want to treat deterministically, I can then just use the library as usual for fitting the Hessian and prior precision, making predictions, etc., right?
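
Concretely, something like this is what I have in mind (sketch only; `model.backbone`, `train_loader`, and `x_test` are placeholders):

```python
from laplace import Laplace

for p in model.backbone.parameters():     # the part I treat deterministically
    p.requires_grad_(False)

la = Laplace(model, "classification",
             subset_of_weights="all", hessian_structure="kron")
la.fit(train_loader)                                        # fit the Hessian
la.optimize_prior_precision(method="marglik")               # fit the prior precision
pred = la(x_test, pred_type="glm", link_approx="probit")    # predictive class probabilities
```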

Also, out of curiosity, if this is the case, why isn't the diag and full subnetwork Laplace implemented this way? From my understanding of the current implementation, the Jacobian of the output w.r.t. all weights is computed first, and then the subnetwork part is extracted from the full Jacobian. In that case, computing the Jacobian for the deterministic weights is unnecessary, isn't it?
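
To make sure I'm reading it right, my mental model of the current SubnetLaplace path is roughly this (pseudocode of my understanding, not the actual code):

```python
# Jacobians w.r.t. ALL P parameters, shape (batch, outputs, P) ...
Js, f = backend.jacobians(X)
# ... of which only the subnetwork columns are kept afterwards.
Js_subnet = Js[:, :, subnetwork_indices]   # shape (batch, outputs, P_subnet)
```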

wiseodd commented 4 months ago

Yes, that's correct; that should work. I already used this for my recent paper:

This "switch off grads to do subnetwork Laplace" behavior was added very recently, specifically to support parameter-efficient finetuning of foundation models: https://github.com/aleximmer/Laplace/pull/144.

The SubnetLaplace interface, meanwhile, hasn't been touched in a long time and thus might not be up to the latest standard. We might sunset support for it in the future.

ruili-pml commented 4 months ago

Great, thanks a lot!