ArthurConmy closed this issue 11 months ago
Sadly, I'm seeing this run out of GPU memory even on V100s for every setup I've tried so far.
Are you running out of memory during training or during validation?
The demos use the Anthropic Towards Monosemanticity defaults, which include a very large batch size (8k). Coupled with the fact that we're training on a larger residual stream (d_model=512), the setup is fairly resource-hungry, so I'm not surprised a single V100 (especially a 16GB one) can't handle it. We could add a note suggesting that users reduce the batch size, or ship a smaller default, though in that case we'd need to run a new sweep to find a sensible default for the l1 coefficient.
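For anyone hitting this in the meantime, here is a minimal sketch of the kind of change we'd suggest. All names below are hypothetical, not this repo's actual config API, and the memory estimate only counts the SAE's own activations (the activation store, optimizer state, and base-model forward pass add much more on top). The point is just that activation memory scales linearly with batch size, so cutting it from 8k is the first lever, with the caveat above about re-sweeping the l1 coefficient.

```python
# Hypothetical config sketch: names are illustrative, not the repo's API.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    d_model: int = 512             # residual stream width the demos train on
    batch_size: int = 8192         # Towards Monosemanticity default; likely OOM culprit
    l1_coefficient: float = 5e-4   # would need a fresh sweep if batch size changes


def activation_bytes(cfg: TrainConfig, expansion: int = 16, fp_bytes: int = 4) -> int:
    """Rough per-step memory for the SAE's input + hidden activations (float32)."""
    d_hidden = cfg.d_model * expansion
    return cfg.batch_size * (cfg.d_model + d_hidden) * fp_bytes


cfg = TrainConfig()
print(f"default batch: ~{activation_bytes(cfg) / 2**20:.0f} MiB of SAE activations/step")

cfg.batch_size = 2048  # 4x smaller; remember to re-sweep l1_coefficient
print(f"reduced batch: ~{activation_bytes(cfg) / 2**20:.0f} MiB of SAE activations/step")
```

The linear scaling means halving the batch size halves this term directly; the harder part is that the l1 coefficient that worked at 8k won't necessarily give the same sparsity at 2k, hence the sweep.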