Being on an NVIDIA T4, is it possible to use xformers, with exllamav2 as the loader, for a GPTQ 4-bit 32gs quant of a Mistral flavor of your choice? I have a feeling it would be blazingly fast, with minimal degradation and great context... but you've spent more time on this.