Open shang-zhu opened 7 months ago
Thanks for your interest in our work!
The memory required depends on two things:
The model's file size depends on its parameter count. Llama2-7B-Chat requires about 30 GB of storage, Llama2-13B-Chat about 50 GB, and Llama2-70B-Chat about 150 GB.
As a general recipe, I'd guess (amount of memory for the model) + (2-3 GB per layer you'd like to apply Unlimiformer to) will get you pretty close to the amount needed, but this depends on how long your inputs are and whether you choose flat or trained indices.
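That back-of-the-envelope recipe can be written as a tiny helper. This is only a sketch of the estimate suggested above, not a measured requirement; the function name, the default of 2.5 GB per layer (the midpoint of the 2-3 GB range), and the layer count in the example are all assumptions for illustration:

```python
def estimate_gpu_memory_gb(model_size_gb: float,
                           num_unlimiformer_layers: int,
                           gb_per_layer: float = 2.5) -> float:
    """Rough GPU-memory estimate per the recipe in this thread:
    model weights plus ~2-3 GB for each layer Unlimiformer is applied to.
    Actual usage also depends on input length and flat vs. trained indices.
    """
    return model_size_gb + gb_per_layer * num_unlimiformer_layers

# Example: Llama2-13B-Chat (~50 GB of weights) with Unlimiformer
# applied to 4 layers (hypothetical configuration).
print(estimate_gpu_memory_gb(50, 4))  # → 60.0
```

Treat the result as a lower bound to sanity-check your hardware before launching a run, not as a guarantee.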
I would like to ask about inputs of around 100,000 tokens. With the Llama2-13B model, how long would inference take on an H100 GPU?
Hi, I successfully ran inference with Llama-2-7b and Unlimiformer, but ran into memory errors when I jumped to larger models. What are the minimum GPU memory requirements for running the 13B and 70B models? Thank you!