-
### Problem
Tabby API currently handles text only. Many vision models have been released, and the ExLlama developer already supports Qwen2-VL.
### Solution
Support vision through the OpenAI API, and hopefully in text completion as well.
…
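For context, vision support through an OpenAI-compatible chat endpoint would presumably accept the standard multimodal message format, where `content` is a list of text and `image_url` parts. This is only a sketch of such a payload; the model name and image URL are placeholders, not anything Tabby API currently exposes:

```python
import json

# Sketch of an OpenAI-style multimodal chat request that vision support
# would need to accept. Model name and image URL are placeholders.
def build_vision_payload(prompt: str, image_url: str, model: str = "qwen2-vl") -> dict:
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_payload("Describe this image.", "https://example.com/cat.png")
print(json.dumps(payload, indent=2))
```

An existing OpenAI client could then send such a request unchanged, which is the main appeal of matching the standard format.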
-
llama.cpp isn't fast enough, and GGML isn't the only quantization method for local use. Support something like [tabbyAPI](https://github.com/theroyallab/tabbyAPI) or, better yet, [aphrodite-engine](https://g…
-
Hi, thanks for the guidance project, I really like it!
I've been wondering for some time about the possibility of extending guidance to support more open-source projects, such as ExLlama (https://gi…
-
## Expected Behavior
Be able to use the exllama binding.
## Current Behavior
It is impossible to use the exllama binding.
I get the error:
Traceback (most recent call last):
File "E:\data_test\bindings_zo…
-
I ran the script below and tried to increase the context size of my model, and I got this error.
What do you think the possible issue is?
### Steps/code to reproduce the bug:
```python
…
-
### Feature request
Currently, the TGI exllama version cannot support act-order with sharded GPUs.
### Motivation
I tested commit 3b013cd53c7d413cf99ca04c7c28dd5c95117c0d of exllama with the following command:
``…
-
This whole project is my attempt to figure out an API for exllama, [something a lot of people are hacking on](https://github.com/turboderp/exllama/issues/13), but nothing's really there yet. I expe…
-
### Describe the bug
ExLlama v2 crashes when it starts loading onto the third GPU. No matter whether the order is 3090, 3090, A4000 or A4000, 3090, 3090, when I try to load the Mistral Large 2407 exl2 3.0bpw it …
-
When using the latest version, every model I try to load with exllama gives either an error or just bad results (nonsense).
It used to work great in prior versions.
## Steps to Reproduce
Using l…
-
I found your message about exllama with FastAPI: https://github.com/turboderp/exllama/issues/37
What am I supposed to do: use your exllama version with the FastAPI server you created, or can I run it with the …
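Whichever exllama build ends up behind such a server, a client doesn't need to care, as long as the server exposes an OpenAI-style completions route. A standard-library-only sketch of building such a request follows; the host, port, and `/v1/completions` path are assumptions based on the common convention, not anything confirmed by the project:

```python
import json
import urllib.request

# Sketch of a request to a hypothetical local OpenAI-compatible
# completions endpoint. Host, port, and path are assumptions.
def build_completion_request(prompt: str, host: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.7,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{host}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("Hello, world")
print(req.full_url, req.get_method())
```

Sending it would just be `urllib.request.urlopen(req)` once a server is actually listening on that address.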