EleutherAI / sae

Sparse autoencoders
MIT License

Can SAE support the training of larger models? #17

Closed ypw-lbj closed 1 month ago

ypw-lbj commented 1 month ago

Hello,

Regarding support for training on larger models, such as those with 72 billion parameters: since a single GPU's VRAM isn't sufficient to hold a model of that size, are there any development plans to address this?

Alternatively, could you suggest a potential solution that I could attempt to implement myself?

Thank you.

norabelrose commented 1 month ago

I'd recommend using 4-bit quantization and GPUs with 80 GB of VRAM, like A100s. We could also try to add FSDP, but for some reason the last time I tried adding it to sae it caused an increase, not a decrease, in memory usage, so I'm not sure what's going on there.
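
For reference, here is a minimal sketch of the 4-bit loading step using Hugging Face transformers with bitsandbytes. The checkpoint name is only an example, and whether the sae trainer accepts a model loaded this way (rather than loading it itself) is an assumption, not something confirmed above:

```python
# Sketch: load a large model in 4-bit so activations can be gathered for SAE
# training on a single 80 GB GPU. Not the sae library's own loading path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2-72B"  # hypothetical 72B checkpoint, used only as an example

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across the available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

With the base model quantized, the bulk of the remaining memory goes to activations and the SAE itself, which are much smaller than the full-precision weights.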