Closed: nicezic closed this issue 1 year ago
You've configured the model to load in 16-bit, but you have only about half the VRAM needed to load it in the first place. It's likely running on the CPU or offloading to system RAM, which would explain the slow generation.
When using the reference software, set load_in_8bit = True to have better odds of loading properly, or use prebuilt, user-ready software like https://github.com/oobabooga/text-generation-webui (this can load in 4-bit, which will fit fine on your GPU).
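To make the VRAM point concrete, here is a rough back-of-the-envelope sketch. The thread doesn't state the model size, so the 7-billion-parameter figure below is purely an assumption for illustration, and `weight_memory_gib` is a hypothetical helper, not part of any library. It counts the weights only; activations, the KV cache, and framework overhead come on top, so treat the numbers as lower bounds.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: a 7B-parameter model (the thread does not name the model).
ASSUMED_PARAMS = 7e9
# The 8 GB card mentioned in the question, treated as roughly 8 GiB.
VRAM_GIB = 8.0

for bits in (16, 8, 4):
    need = weight_memory_gib(ASSUMED_PARAMS, bits)
    verdict = "fits" if need < VRAM_GIB else "does not fit"
    print(f"{bits:>2}-bit: ~{need:.1f} GiB of weights, {verdict} in {VRAM_GIB:.0f} GiB")
```

Under that assumption, 16-bit weights alone exceed the card's memory, 8-bit leaves little headroom once overhead is counted, and 4-bit fits comfortably, which matches the advice above.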
I use an RTX 3070 Ti with 8 GB of VRAM and a 32-core Ryzen CPU.
Is it normal for it to take a long time (about 15 minutes) to generate an answer?
My params are ..
Is there a way to speed up generation?