f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
https://backpack.pt/
MIT License

OOM eventually when using create_graph=True with BatchL2Grad #238

Open crowsonkb opened 2 years ago

crowsonkb commented 2 years ago

I was trying to use my second-order optimizer ESGD-M with BatchL2Grad in order to collect information on within-batch gradient variance to estimate stochastic noise (think OpenAI's gradient noise scale paper), and I kept OOMing after maybe six epochs of MNIST training. ESGD-M does a Hessian-vector product internally (not using BackPACK, just plain autograd), so it needs the user to specify create_graph=True. I assume that when I use it with BackPACK, something is leaking references to past computational graphs; normally these graphs are garbage collected without issue.
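
Roughly, the combination looks like this (a minimal sketch with a toy model and random data, not my actual training script; the HVP vector is just random for illustration):

import torch
from torch import nn
from backpack import backpack, extend
from backpack.extensions import BatchL2Grad

# Toy model and loss, extended so BackPACK can hook into them.
model = extend(nn.Sequential(nn.Flatten(), nn.Linear(784, 10)))
loss_fn = extend(nn.CrossEntropyLoss())
params = list(model.parameters())

x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
loss = loss_fn(model(x), y)

# Backward pass inside the BackPACK context so per-sample gradient norms
# are computed; create_graph=True keeps the graph around for the later HVP.
with backpack(BatchL2Grad()):
    loss.backward(create_graph=True)

batch_l2 = [p.batch_l2 for p in params]  # per-sample squared gradient norms

# Hessian-vector product with plain autograd, reusing the retained graph.
vec = [torch.randn_like(p) for p in params]
hvp = torch.autograd.grad(
    [p.grad for p in params], params, grad_outputs=vec
)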

Thank you, Katherine Crowson

crowsonkb commented 2 years ago

Apparently if I tell ESGD-M to do a Hessian-vector product every step, instead of every ten steps for compute efficiency, I don't OOM anymore. Normally the graphs made with create_graph=True are freed on their own on the steps where ESGD-M doesn't do an HVP, so is BackPACK hanging onto them somewhere?
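
To make that concrete, here is a rough self-contained sketch of the pattern (toy model, random data, and a hard-coded interval; not the actual ESGD-M code):

import torch
from torch import nn
from backpack import backpack, extend
from backpack.extensions import BatchL2Grad

model = extend(nn.Sequential(nn.Flatten(), nn.Linear(784, 10)))
loss_fn = extend(nn.CrossEntropyLoss())
params = list(model.parameters())
hvp_interval = 10  # with hvp_interval = 1 the OOM goes away

for step in range(100):
    x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
    model.zero_grad(set_to_none=True)  # drop old .grad tensors and the graphs they reference
    loss = loss_fn(model(x), y)
    with backpack(BatchL2Grad()):
        loss.backward(create_graph=True)  # graph kept alive for a possible HVP

    if step % hvp_interval == 0:
        # Hessian-vector product with a random vector, via plain autograd.
        vec = [torch.randn_like(p) for p in params]
        hvp = torch.autograd.grad(
            [p.grad for p in params], params, grad_outputs=vec
        )
    # On the skipped steps nothing on my side keeps the graph alive past the
    # next zero_grad/backward, so I'd expect it to be garbage collected.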

f-dangel commented 2 years ago

Hi,

thanks for your report. From your description, I would expect BackPACK's memory cleanup to be triggered during the backward pass.

Maybe you can try to explicitly disable BackPACK's hooks during the HVP using

from backpack import disable

with disable():
    ...  # HVP
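
For example (using the hypothetical names from the sketches above, where params holds the model's parameters and p.grad was populated by a backward pass with create_graph=True):

import torch
from backpack import disable

vec = [torch.randn_like(p) for p in params]
with disable():
    # BackPACK's hooks are switched off for everything inside this context.
    hvp = torch.autograd.grad(
        [p.grad for p in params], params, grad_outputs=vec
    )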

Otherwise, it would be great to reproduce this issue in a minimal working example (MWE) so we can track down the memory leak.

Best, Felix