google-deepmind / kfac-jax

Second Order Optimization and Curvature Estimation with K-FAC in JAX.
Apache License 2.0
249 stars 23 forks

Incorrect pytree recognition by KFAC optimizer #273

Open Uernd opened 1 month ago

Uernd commented 1 month ago

First of all, thanks to the K-FAC team for their contribution!

While using the KFAC optimizer to optimize an ANN, I noticed that it seems to have trouble understanding the structure of the parameter tree when a parameter is used more than once in constructing the neural network.

If the original ANN is denoted f(params, inputs), then simply using a modified ANN F(params, inputs) = f(params, inputs) + f(params, inputs) makes the program throw an error. I have tried functools.partial to fix the parameters, but then the program seems to get stuck. If I use vmap, some of the parameters are labelled as 'orphan', and in my experiments this affects the optimization process.
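A minimal sketch of the reuse pattern described above (the network body and parameter names here are illustrative, not the reporter's actual code). Plain JAX evaluates the reused-parameter function without complaint; the reported failure only appears once kfac_jax traces the graph to register layers:

```python
import jax.numpy as jnp

# Hypothetical single-layer network f(params, x); names are illustrative only.
def f(params, x):
    return jnp.tanh(x @ params["w"] + params["b"])

# Modified network F that reuses the same parameters twice in the graph --
# the pattern the K-FAC graph matcher reportedly fails on.
def F(params, x):
    return f(params, x) + f(params, x)

params = {"w": jnp.ones((3, 2)), "b": jnp.zeros(2)}
x = jnp.ones((4, 3))

# Plain JAX handles the parameter reuse fine.
out = F(params, x)
print(out.shape)  # (4, 2)
```

Passing F in place of f as the value function for kfac_jax's optimizer is where the error is reported to occur.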

I wonder if there are already methods to avoid these issues? Would you consider updating the optimizer to fix this bug?

Thanks again for the well-designed optimizer!

james-martens commented 2 weeks ago

The K-FAC optimizer doesn't currently support a parameter being used more than once in the graph. This doesn't rule out RNNs and transformers, since they usually use each parameter only once in the graph, just with an operation that has a time dimension. As of a week or two ago, the behavior when finding such a parameter is to automatically register it as "generic", which falls back to a crude curvature approximation. If you use the TNT feature in the code, you can get generically-registered layers to do something more useful.
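The RNN distinction above can be sketched in plain JAX (the network and parameter names are illustrative assumptions, not kfac-jax API): when the weight is applied inside jax.lax.scan, it enters the traced graph once as a constant of the scan body, over a time axis, rather than appearing at multiple call sites:

```python
import jax
import jax.numpy as jnp

# Illustrative vanilla RNN: the weights w_x and w_h are each referenced
# once, inside the scan body, so they appear once in the traced graph --
# unlike F(params, x) = f(params, x) + f(params, x), where the same
# parameters show up at two distinct call sites.
def rnn(params, xs):
    def step(h, x):
        h = jnp.tanh(x @ params["w_x"] + h @ params["w_h"])
        return h, h
    h0 = jnp.zeros(params["w_h"].shape[0])
    _, hs = jax.lax.scan(step, h0, xs)
    return hs  # hidden state at every time step

params = {"w_x": jnp.ones((3, 4)), "w_h": 0.5 * jnp.eye(4)}
xs = jnp.ones((5, 3))  # 5 time steps, input dimension 3
out = rnn(params, xs)
print(out.shape)  # (5, 4)
```

This is why time-unrolled architectures remain compatible with K-FAC's graph matching while explicit weight sharing across separate call sites is not.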