Adds a tutorial for BackPACK's retain_graph option. It shows how to distribute the GGN diagonal computation of an auto-encoder architecture over multiple backward passes to reduce peak memory.
This use case recently came up in a discussion with @wiseodd
on Laplace approximations for auto-encoders (or any large
output neural network with square loss).
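The trick relies on the GGN diagonal for a square loss being additive over output elements, so it can be accumulated across several backward passes instead of one memory-heavy pass. Below is a minimal plain-PyTorch sketch of that idea (not BackPACK's actual tutorial code); the toy model, shapes, and loop structure are all illustrative assumptions.

```python
import torch

torch.manual_seed(0)

# Illustrative stand-in for an auto-encoder with a "large" output
# dimension and square loss; all sizes here are made up.
model = torch.nn.Sequential(
    torch.nn.Linear(6, 3), torch.nn.Tanh(), torch.nn.Linear(3, 6)
)
x = torch.rand(4, 6)
params = list(model.parameters())

# For l(f) = sum_i (f_i - y_i)^2, the GGN diagonal is
# diag(G) = 2 * sum_i (d f_i / d theta)^2, i.e. a sum over output
# elements. This additivity is what allows splitting the computation
# over multiple backward passes to lower peak memory.
diag_ggn = [torch.zeros_like(p) for p in params]

outputs = model(x).flatten()
for idx, f_i in enumerate(outputs):
    # Keep the graph alive for every backward pass except the last,
    # analogous to BackPACK's retain_graph option.
    retain = idx < outputs.numel() - 1
    grads = torch.autograd.grad(f_i, params, retain_graph=retain)
    for acc, g in zip(diag_ggn, grads):
        acc.add_(2 * g**2)
```

Each pass only needs the gradient of a single output element, so the peak memory footprint is governed by one backward pass rather than the full Jacobian.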