Precise details on GPUs and Memory needed for Inference

bytedance / SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

https://bytedance.github.io/SALMONN/

Apache License 2.0


Precise details on GPUs and Memory needed for Inference #51

Closed. SaraAlthubaiti closed this issue 1 month ago

SaraAlthubaiti commented 2 months ago

Hi,

Could you please provide more precise details on how many GPUs and how much memory are needed to run inference? For training, I'm assuming from the README that it's a single A100-SXM-80GB. The same is mentioned for inference, but I don't think it needs all of that. More specific details would be great.

Thanks for your time and great work!

TCL606 commented 2 months ago

If you're using a 13B model without quantization, I think 80 GB of GPU memory is necessary to avoid OOM, especially if you want SALMONN to generate longer responses (e.g., 1000 tokens or more). However, if you only need shorter outputs (like short captions), I'd guess about 60 GB is enough, but I haven't tested this precisely.
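
For a rough sense of where the memory goes, here is a back-of-envelope sketch that adds the weight footprint to the key/value cache for the generated tokens. It is only an estimate under stated assumptions, not a measurement: the function name is hypothetical, the layer count (40) and hidden size (5120) are assumed values for a LLaMA-13B-class backbone, and the audio encoders, activations, beam search, and framework overhead are all ignored, so real usage will be higher.

```python
def estimate_inference_memory_gb(
    n_params: float = 13e9,     # parameter count of the language model
    bytes_per_param: int = 2,   # 2 for fp16/bf16, 4 for fp32
    n_layers: int = 40,         # ASSUMPTION: LLaMA-13B-class backbone
    hidden_size: int = 5120,    # ASSUMPTION: LLaMA-13B-class backbone
    seq_len: int = 1000,        # prompt + generated tokens held in the cache
    batch_size: int = 1,        # also multiply by beam count if beam search is used
    kv_bytes: int = 2,          # bytes per cached value (fp16)
) -> dict:
    """Rough lower bound on GPU memory: weights plus KV cache, in GB (1e9 bytes)."""
    weights = n_params * bytes_per_param
    # One key and one value vector of size hidden_size per layer per token.
    kv_cache = 2 * n_layers * hidden_size * seq_len * batch_size * kv_bytes
    gb = 1e9
    return {
        "weights_gb": weights / gb,
        "kv_cache_gb": kv_cache / gb,
        "total_gb": (weights + kv_cache) / gb,
    }


if __name__ == "__main__":
    for label, width in (("fp16", 2), ("fp32", 4)):
        est = estimate_inference_memory_gb(bytes_per_param=width)
        print(label, {k: round(v, 2) for k, v in est.items()})
```

Under these assumptions the weights dominate and the cache grows linearly with output length, so the precision the model is loaded in matters more than whether you generate 100 or 1000 tokens. To get an actual number for your setup, measuring peak usage during a real run (for example with `torch.cuda.max_memory_allocated()`) is more reliable than this estimate.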