abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License
1.05k stars 77 forks

GPU VRAM Usage during training #58

Open KevinD777 opened 8 months ago

KevinD777 commented 8 months ago

Hi,

Thanks for your great work! I have some questions regarding the GPU usage when training with LLaMa 2:

  1. What is the peak usage of the VRAM when training the Unlimiformer using the long-range training methods in both 8k and 16k settings?
  2. Since the complexity is linear during training, training at 16k should use roughly double the VRAM of 8k, if I understand correctly. So if I wanted to train Unlimiformer at 80k, would it use 10 times the VRAM of 8k?
  3. I saw in a previous issue that Unlimiformer can currently only be trained on a single GPU, so the training length is limited by the memory of a single GPU, say 80 GB for an A100. So I am curious: is 16k the maximum possible training length for now?

Thanks!

abertsch72 commented 7 months ago

Thanks for your interest!

  1. Looking back at some old run data, I'm seeing ~45 GB of GPU memory for BART-base with 16k max length (using retrieval training). I don't have numbers handy for the 8k case right now, but I'd guess a little less than halfway between that and the cost of finetuning BART without Unlimiformer.
  2. Roughly, yes-- there's some fixed cost for storing the model weights themselves, but most of the memory required comes from the input + computational graph. So it would be slightly less than 10x more expensive, but that's the right ballpark.
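The "fixed cost plus linear term" reasoning above can be sketched with a back-of-the-envelope model. The ~45 GB at 16k figure comes from the reply; the 5 GB fixed cost for weights and optimizer states is a hypothetical placeholder, not a measured value:

```python
# Rough linear model of training VRAM:
#   VRAM ≈ fixed cost (weights + optimizer) + per-token activation cost.
# 45 GB at 16k tokens is the measured point from this thread;
# FIXED_GB = 5.0 is an assumed, illustrative value.

FIXED_GB = 5.0
MEASURED_GB_AT_16K = 45.0

# Solve for the per-token cost from the one measured point.
per_token_gb = (MEASURED_GB_AT_16K - FIXED_GB) / 16_000

def estimate_vram_gb(input_length: int) -> float:
    """Linearly extrapolate training VRAM for a given input length."""
    return FIXED_GB + per_token_gb * input_length

for length in (8_000, 16_000, 80_000):
    print(f"{length:>6} tokens -> ~{estimate_vram_gb(length):.0f} GB")
```

Under these assumed numbers, 80k comes out around 8x the cost of 8k rather than a full 10x, because the fixed cost doesn't scale with length.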
  3. This depends on the model size and your GPU size-- in the paper we were using BART-base and a 48-GB GPU, so we were limited to ~16k.