johanno opened this issue 6 months ago
I am pretty sure exllama only works for GPU models. Its own description says: "A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs".
Hi. Yes, I'm sorry about that, but exllama is a GPU-only binding.
Expected Behavior
Use the CPU when running Hugging Face models.
Current Behavior
Steps to Reproduce
1. Select Hugging Face.
2. Select WizardCoder-Python-7B-V1.0-GPTQ.
3. Select CPU in the Hugging Face settings.
Possible Solution
No idea.
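For illustration only: one possible direction is for the loader to detect at load time that exllama requires CUDA and fall back to a CPU-capable backend instead of failing. This is a hypothetical sketch, not the project's actual code; all names here (select_backend, "cpu-transformers") are made up.

```python
def select_backend(has_cuda: bool, model_format: str) -> str:
    """Pick an inference backend (hypothetical fallback logic).

    exllama only supports 4-bit GPTQ weights on CUDA GPUs, so it is
    only chosen when both conditions hold; everything else falls back
    to a generic CPU-capable loader.
    """
    if model_format == "gptq" and has_cuda:
        return "exllama"
    # GPTQ without a CUDA GPU (or any non-GPTQ model): use a CPU loader
    return "cpu-transformers"


# Example: a machine with no CUDA GPU loading a GPTQ model
backend = select_backend(has_cuda=False, model_format="gptq")
```

With logic like this, selecting CPU in the Hugging Face settings would route around exllama rather than attempting a GPU-only load.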
Context
I can't use the GPU, since 8 GB of VRAM isn't enough for most good models.