Closed: oobabooga closed this issue 7 months ago.
Hello,

I have two basic questions:

1) Do you have any data on how long it takes to quantize a 70B model using 24 GB of VRAM (assuming that's possible)?
2) Do you plan to release prequantized models on Hugging Face? Having llama-2-70b for comparison with other methods would be useful.

Hi @oobabooga, we have just released quantized models on Hugging Face (including Llama-2 70B). Check the README.md for details, and refer to the notebooks (for streaming or generation) for examples of how to use them. Hope this helps!

Thanks @Vahe1994, that's very helpful. I'll try to test the models later.
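For anyone landing here later: the notebooks mentioned above cover streaming and generation, but a minimal sketch of loading one of the released checkpoints with `transformers` might look like the block below. The model id is an assumption on my part (check the repository's README.md for the actual names), and the quantized layers need the `aqlm` package installed alongside `transformers`.

```python
# Sketch: loading one of the quantized checkpoints with Hugging Face
# transformers. The model id below is a placeholder / assumption; see the
# repository README.md for the real names of the released models.

MODEL_ID = "ISTA-DASLab/Llama-2-70b-AQLM-2Bit-1x16-hf"  # assumed id, verify

def load_model(model_id: str = MODEL_ID):
    """Load a quantized model and its tokenizer (downloads on first call)."""
    # Imports are deferred so the sketch can be read and inspected without
    # torch/transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the dtype stored in the checkpoint
        device_map="auto",    # shard layers across the available GPUs
    )
    return model, tokenizer

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Basic greedy generation, mirroring the simplest notebook usage."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Note that actually running `load_model()` on a 70B checkpoint requires enough GPU memory for the quantized weights; the point of the 2-bit quantization is that this is far less than the full-precision model would need.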