-
### System Info
Ubuntu 20.04
NVIDIA A100
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 and 24.07
TensorRT-LLM v0.14.0 and v0.11.0
### Who can help?
@Tracin
### Information
- [x] The offici…
-
Hi,
I tried both the Qwen2-VL-7B BF16 and AWQ variants, and honestly I'm not seeing any speed improvement.
The AWQ checkpoint is ~6 GB, but after loading it in vLLM it eventually ends up occupying the same amount of VRAM (~22G…
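For reference (not part of the original report): vLLM preallocates most of the GPU for the KV cache by default (`gpu_memory_utilization` defaults to 0.9), so an ~6 GB AWQ checkpoint can still show ~22 GB of VRAM in use. A minimal sketch of capping that preallocation; the checkpoint name is an assumption:

```python
# Sketch: lower vLLM's preallocation to see the weights' actual footprint.
# The checkpoint name below is an assumption, not taken from the report.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct-AWQ",
    quantization="awq",
    gpu_memory_utilization=0.5,  # default is 0.9, which fills most of the card
)
outputs = llm.generate(
    ["Describe AWQ in one sentence."], SamplingParams(max_tokens=64)
)
print(outputs[0].outputs[0].text)
```

With the default setting, reported VRAM usage reflects the KV-cache preallocation, not the quantized weight size.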
-
### System Info
x86_64, Debian 11, L4 GPU
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supporte…
-
Hi, thanks for your work. However, I cannot find 'run_awq_llama.sh'; am I missing something?
-
I successfully quantized the mistralai/Mistral-Nemo-Instruct-2407 model to ONNX using the following command:
`python awq-quantized-model.py --model_path mistralai/Mistral-Nemo-Instruct-2407 --quant_p…
-
### Motivation
As we all know, lmdeploy runs fastest with AWQ W4A16; however, FP8 is now used in many places, so I wonder whether the developers have any plan to build an even faster W4A8-FP8 kernel in lmdepl…
-
### Describe the bug
I installed text-generation-webui and downloaded the model (TheBloke_Yarn-Mistral-7B-128k-AWQ), but I can't run it. I chose Transformers as the model loader. I tried installing autoawq b…
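As a side note (not from the issue): the plain Transformers loader cannot handle an AWQ checkpoint unless the AutoAWQ package is installed. A minimal sketch of loading the same checkpoint directly with AutoAWQ, outside the web UI:

```python
# Sketch: load the AWQ checkpoint with AutoAWQ (pip install autoawq),
# independent of text-generation-webui.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "TheBloke/Yarn-Mistral-7B-128k-AWQ"
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

tokens = tokenizer("Hello, my name is", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```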
-
Hi there,
I was struggling with how to run quantization in AutoAWQ as you mention on the home page. I was trying to quantize the 7B Qwen2-VL, but even using two A100s with 80 GB of VRAM each, I still get CUDA OOM…
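For context, the standard AutoAWQ flow from its README looks roughly like the sketch below; the paths are placeholders, and Qwen2-VL's vision tower may need model-specific handling this does not cover. During calibration, peak memory is driven by cached activations and sequence length, not only the weight size:

```python
# README-style AutoAWQ quantization sketch; paths are placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-VL-7B-Instruct"   # assumed source checkpoint
quant_path = "qwen2-vl-7b-instruct-awq"    # hypothetical output directory
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibration runs layer by layer over a default text calibration set,
# so OOM usually points at activation caching rather than raw weights.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```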
-
### Model Series
Qwen2.5
### What are the models used?
Qwen2.5-32B-Instruct-AWQ
### What is the scenario where the problem happened?
Inference with vLLM
### Is this a known issue?
- [X] I have …
-
https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples
How do I set `eval_func`?
https://github.com/intel/neural-compressor/blob/master/examples/3…
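For what it's worth, `eval_func` in neural-compressor is a callable that receives the model under evaluation and returns one scalar score (higher is better); the tuner compares it against the FP32 baseline. A minimal sketch, assuming neural-compressor 2.x and a toy PyTorch model:

```python
# Sketch: eval_func returns a single scalar that the tuner maximizes.
# The toy model and stand-in metric are illustrative only.
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

fp32_model = torch.nn.Sequential(
    torch.nn.Linear(16, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4)
)
_probe = torch.randn(8, 16)  # fixed input so the score is deterministic

def eval_func(model):
    # Replace with a real benchmark (task accuracy, negative perplexity, ...).
    with torch.no_grad():
        out = model(_probe)
    return float(out.abs().mean())  # stand-in scalar metric

conf = PostTrainingQuantConfig(approach="weight_only")  # RTN by default
q_model = quantization.fit(fp32_model, conf, eval_func=eval_func)
```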