-
How hard would it be to write an inference engine based on exllama that supported tensor parallel, using the existing building blocks?
Assume the quantized weight tensors would need to be split acr…
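A minimal sketch of the column-parallel idea behind the question: each device holds a vertical slice of a weight matrix, computes a partial projection, and the shards are concatenated (an all-gather in a real multi-GPU setup). This uses NumPy on CPU purely for illustration; exllama's quantized kernels and any real communication layer (e.g. NCCL) are not shown.

```python
import numpy as np

def column_parallel_matmul(x, w, n_shards):
    """Simulate column-parallel tensor parallelism in one process.

    x: activations of shape (batch, d_in)
    w: full weight matrix of shape (d_in, d_out)
    Each "device" owns one column block of w; the partial outputs are
    concatenated, standing in for the all-gather step of a real
    multi-GPU implementation.
    """
    shards = np.split(w, n_shards, axis=1)      # one column block per device
    partials = [x @ shard for shard in shards]  # local matmul on each device
    return np.concatenate(partials, axis=1)     # "all-gather" of the outputs

# Sanity check: the sharded result matches the unsharded matmul.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((8, 4))
assert np.allclose(column_parallel_matmul(x, w, 2), x @ w)
```

For quantized weights the same split applies, but each shard would also need its own slice of the quantization metadata (scales, zeros, group indices), which is the part the existing exllama building blocks don't currently expose per-shard.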
-
**Following the readme.md, I tried to run RAP for gsm8k using exllama, with the recommended instruction:**
`CUDA_VISIBLE_DEVICES=0,1 python examples/RAP/gsm8k/inference.py --base_lm exllama --exlla…
-
### OS
Windows
### GPU Library
CUDA 12.x
### Python version
3.12
### Describe the bug
Hi, thanks for the project. It supports EBNF grammars and JSON Schema; however, I am unable to use them.
I believe it is…
-
It's not clear from the documentation how to split VRAM over multiple GPUs with exllama.
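For reference, exllama's example scripts accept a `-gs`/`--gpu_split` argument listing how many GB of VRAM to allocate per GPU, and text-generation-webui exposes the same setting as `--gpu-split`. The model paths and GB figures below are placeholders; no test is included since these commands require GPUs and model weights.

```shell
# exllama itself: allow up to 16 GB of weights on GPU 0 and 24 GB on GPU 1
python test_benchmark_inference.py -d /path/to/model -gs 16,24

# text-generation-webui with the ExLlama loader, same split
python server.py --loader exllama --gpu-split 16,24
```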
-
I'm new to exllama, are there any tutorials on how to use this? I'm trying this with the llama-2 70b model.
-
We need to switch to exllama; everything I'm reading says exllama is better. At least for production we will need to switch. Speed is everything at the inference volume we expect. Note to try VLL…
-
ExLlama (https://github.com/turboderp/exllama)
It's currently the fastest and most memory-efficient executor of models that I'm aware of.
Is there an interest from the maintainers in adding this sup…
-
- https://github.com/turboderp/exllama
- https://github.com/oobabooga/text-generation-webui/blob/c7058afb402bd381d1983837b779c106217120b3/modules/exllama.py
-
----> 4 gptq_model = exllama_set_max_input_length(gptq_model, max_input_length=7504)
/usr/local/lib/python3.10/dist-packages/auto_gptq/utils/exllama_utils.py in exllama_set_max_input_length(model, …
-
/root/anaconda3/envs/chatglm3_v2/lib/python3.10/site-packages/awq/modules/linear/exllama.py:12: UserWarning: AutoAWQ could not load ExLlama kernels extension. Details: libcudart.so.12: cannot open sha…
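This warning means the dynamic linker cannot find the CUDA 12 runtime (`libcudart.so.12`), so AutoAWQ falls back from its ExLlama kernels. A common fix, assuming a CUDA 12.x toolkit is installed on the machine, is to put its `lib64` directory on the loader path; the install location below is an assumption, so adjust it to your system:

```shell
# Check whether the dynamic linker can currently find the CUDA 12 runtime
ldconfig -p | grep libcudart || true

# If a CUDA 12.x toolkit is installed but not on the path, expose it
# (the path below is an assumed default install location)
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
```

Alternatively, installing a PyTorch build whose bundled CUDA runtime matches the version the AWQ kernels were compiled against avoids the need to set `LD_LIBRARY_PATH` at all.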