-
I'm developing an AI assistant for fiction writers. As the OpenAI API gets pretty expensive with all the inference tricks needed, I'm looking for a good local alternative for most of the inference, saving GPT-4 ju…
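A common way to structure this is a two-tier setup: bulk inference goes to a local OpenAI-compatible server (llama.cpp's server exposes a compatible `/v1/chat/completions` endpoint), and only the hardest prompts go to GPT-4. A minimal routing sketch; the endpoint URL, model names, and task-tier names below are illustrative assumptions, not anyone's real config:

```python
# Two-tier router: local server for cheap/bulk tasks, GPT-4 for the rest.
LOCAL_BASE_URL = "http://localhost:8080/v1"   # hypothetical local llama.cpp server
OPENAI_BASE_URL = "https://api.openai.com/v1"

def pick_endpoint(task: str) -> tuple[str, str]:
    """Return (base_url, model) for a given task tier."""
    # Tasks that need top-end reasoning stay on GPT-4; everything else
    # (summaries, rewrites, draft continuations) runs locally.
    if task in {"plot_outline", "continuity_check"}:
        return OPENAI_BASE_URL, "gpt-4"
    return LOCAL_BASE_URL, "local-model"

print(pick_endpoint("summarize_chapter"))
```

Because the local server speaks the same chat-completions protocol, the same client code can target either base URL, so switching tiers is just a config change.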
-
# Feature Description
Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do as an enhancement.
# Motivation
It sounds like it's …
-
root@C.8174303:~/KoboldAI$ ./play.sh --model models/Aurora-Nights-103B-v1.0-5.0bpw-h6-exl2 --model_backend "ExLlama V2" --model_parameters help
Colab Check: False, TPU: False
INFO | __main__::…
-
ExLlama has implemented highly optimized CUDA kernels. We should port those kernels to see just how efficient AWQ could be with them.
https://github.com/turboderp/exllama/blob/master/exllama_ext/exllama_…
-
## Expected Behavior
When I upload a document, I should be able to refer to it in the chat and prompt the AI to perform tasks with the contents.
## Current Behavior
I upload a document and the A…
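The expected behavior above (referring to an uploaded document in chat) is typically implemented by injecting the document's text into the prompt context before the user's question. A minimal sketch under that assumption; the function name, delimiters, and naive truncation are all illustrative, and a real assistant would chunk and retrieve instead:

```python
def build_prompt(document: str, question: str, max_doc_chars: int = 4000) -> str:
    """Prepend (possibly truncated) document text so the model can answer
    questions about it. Truncation is a stand-in for proper chunking."""
    doc = document[:max_doc_chars]
    return (
        "You are given the following document:\n"
        f"---\n{doc}\n---\n"
        "Answer using only the document above.\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("The quarterly report shows revenue grew 12%.",
                   "How much did revenue grow?"))
```

If the AI ignores the upload, the usual cause is that the document text never reaches the context window at all, which this kind of explicit injection makes easy to verify.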
-
When converting [nemolita-21b](https://huggingface.co/win10/nemolita-21b), which is a merged model, `convert.py` runs into this error:
```shell
Traceback (most recent call last):
File "/hom…
-
Hello everyone,
I'm trying to set up exllama on an Azure ML compute instance. I followed the instructions here: https://github.com/turboderp/exllama, but unfortunately I'm getting an error when trying to call…
-
### Feature request
Integration of new 4-bit kernels
https://github.com/IST-DASLab/marlin
### Motivation
Provide faster inference than AWQ/ExLlama for batch sizes up to 32.
### Your contribut…
-
I'm sorry, but I am unable to find relevant documentation on the Internet about how to load all modules on the GPU.
I got this error message from my code:
```
Found modules on cpu/disk. Using Exllama backend requires al…
-
I'm using ExLlama with the Oobabooga text-generation UI, with the model TheBloke_llama2_70b_chat_uncensored-GPTQ.
The model works great, but using ExLlama as a loader, the model talks to itself, gen…
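Self-chat like this usually means generation runs past the assistant's turn and continues as the user. A common mitigation is cutting the output at turn markers (stop strings); text-generation-webui also exposes a custom stopping strings setting for this. A minimal sketch, where the specific marker strings are assumptions that depend on the model's prompt template:

```python
def truncate_at_stop(text: str, stops=("USER:", "### Human:")) -> str:
    """Cut generated text at the first stop marker so the model cannot
    'answer for' the user. Marker strings must match the chat template
    the model was trained on; the ones here are only examples."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut].rstrip()

print(truncate_at_stop("Sure, here you go.\nUSER: thanks!"))
```

If the markers match the template, everything after the model's own turn is discarded, which stops the back-and-forth with itself.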