Closed HoagyC closed 10 months ago
Memory management in PyTorch is complex, so it's hard to tell whether this is a genuine issue or just some tensor hanging around that hadn't been cleaned up yet. https://github.com/ai-safety-foundation/sparse_autoencoder/pull/66 may help show more of what is going on.
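One way to tell a real leak from tensors the caching allocator is merely holding onto is to log allocated vs. reserved CUDA memory between cycles. This is a minimal sketch (the `log_cuda_memory` helper and the call site are hypothetical, not part of the repo):

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print allocated vs. reserved CUDA memory.

    A rising 'allocated' figure suggests live tensors are still referenced
    (a genuine leak); a high 'reserved' figure with flat 'allocated' just
    means PyTorch's caching allocator is holding freed blocks for reuse.
    """
    if not torch.cuda.is_available():
        print(f"{tag}: no CUDA device")
        return
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: {alloc:.1f} MiB allocated, {reserved:.1f} MiB reserved")

# Hypothetical call sites bracketing the suspect step:
log_cuda_memory("after first generate-train cycle")
```

`torch.cuda.memory_summary()` gives a more detailed per-pool breakdown if the two numbers alone aren't conclusive.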
Closing for now as it seems fine in my tests.
When training on an RTX 4090 with 1M saved activations, I get an out-of-memory error, but only while generating the second set of activations. Intuitively, this shouldn't cost any memory beyond the first generate-train cycle, so there must be some lingering memory use, though this might be unavoidable.
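A common cause of an OOM that appears only on the second cycle is that the first cycle's activation buffer is still referenced when the second set is allocated, briefly doubling peak memory. A sketch of a refill pattern that avoids this by reusing the buffer (names like `refill_activation_store` and the small sizes are illustrative assumptions, not the repo's actual API):

```python
import gc
from typing import Optional

import torch

def refill_activation_store(store: Optional[torch.Tensor], n: int, d: int,
                            device: str) -> torch.Tensor:
    # Reuse the previous buffer when shapes match; allocating a fresh
    # buffer while the old one is still referenced would double peak
    # memory, surfacing as an OOM only on the second cycle.
    if store is None or store.shape != (n, d):
        store = torch.empty(n, d, device=device)
    store.normal_()  # stand-in for running the model to produce activations
    return store

device = "cuda" if torch.cuda.is_available() else "cpu"
store = None
for cycle in range(2):
    store = refill_activation_store(store, 1024, 64, device)
    # ... train the autoencoder on `store` here ...
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached blocks back to the driver
```

If the real pipeline can't reuse the buffer, explicitly `del`-ing the old activations (and anything else that captured them, e.g. a dataloader or optimizer state) before generating the next set should have the same effect.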