Closed HoagyC closed 10 months ago
Memory management in PyTorch is complex, so it's hard to tell whether this is a genuine issue or just some tensor hanging around that hadn't been cleaned up yet. https://github.com/ai-safety-foundation/sparse_autoencoder/pull/66 may help show more of what is going on.
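One way to tell a real leak from tensors the caching allocator is merely holding onto is to log allocated vs. reserved CUDA memory between cycles. This is a minimal sketch (the `log_cuda_memory` helper and the call site are hypothetical, not part of the repo):

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print allocated vs. reserved CUDA memory.

    A rising 'allocated' figure suggests live tensors are still referenced
    (a genuine leak); a high 'reserved' figure with flat 'allocated' just
    means PyTorch's caching allocator is holding freed blocks for reuse.
    """
    if not torch.cuda.is_available():
        print(f"{tag}: no CUDA device")
        return
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: {alloc:.1f} MiB allocated, {reserved:.1f} MiB reserved")

# Hypothetical call sites bracketing the suspect step:
log_cuda_memory("after first generate-train cycle")
```

`torch.cuda.memory_summary()` gives a more detailed per-pool breakdown if the two numbers alone aren't conclusive.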
Closing for now as it seems fine in my tests.
When training on an RTX 4090 with 1M saved activations, I get an out-of-memory error, but only while generating the second set of activations. Intuitively, this shouldn't cost any memory beyond the first generate-train cycle, so there must be some lingering memory use, though this might be unavoidable.
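A common cause of an OOM that appears only on the second cycle is that the first cycle's activation buffer is still referenced when the second set is allocated, briefly doubling peak memory. A sketch of a refill pattern that avoids this by reusing the buffer (names like `refill_activation_store` and the small sizes are illustrative assumptions, not the repo's actual API):

```python
import gc
from typing import Optional

import torch

def refill_activation_store(store: Optional[torch.Tensor], n: int, d: int,
                            device: str) -> torch.Tensor:
    # Reuse the previous buffer when shapes match; allocating a fresh
    # buffer while the old one is still referenced would double peak
    # memory, surfacing as an OOM only on the second cycle.
    if store is None or store.shape != (n, d):
        store = torch.empty(n, d, device=device)
    store.normal_()  # stand-in for running the model to produce activations
    return store

device = "cuda" if torch.cuda.is_available() else "cpu"
store = None
for cycle in range(2):
    store = refill_activation_store(store, 1024, 64, device)
    # ... train the autoencoder on `store` here ...
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached blocks back to the driver
```

If the real pipeline can't reuse the buffer, explicitly `del`-ing the old activations (and anything else that captured them, e.g. a dataloader or optimizer state) before generating the next set should have the same effect.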