-
When I try to evaluate the quantized AWQ models using the video evaluation script, I get a `FileNotFoundError`:
```
FileNotFoundError: No such file or directory: "/hfhub/hub/models--Efficient-La…
-
Just FYI: I think the `autoawq` library only supports up to a certain version of the `torch` library, according to this message, which I received after (1) installing `autoawq` and (2) …
-
When running `example.py`, I hit:
```
RuntimeError: CUDA error: no kernel image is available for execution on the device (at /home/ubuntu/nunchaku/src/kernels/awq/gemv_awq.cu:311)
```
I'm on a Lambda A100 GPU insta…
-
### Describe the issue
I am trying to enable AWQ support with the IPEX repo on CPU.
The IPEX 2.5.0 [release](https://github.com/intel/intel-extension-for-pytorch/releases) states that it has the supp…
-
### Checklist
- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.…
-
Hello
The module `awq_ext` can't be loaded:
![image](https://github.com/user-attachments/assets/449d25dd-064a-4220-8529-2d237d23e0d3)
I'm using *CUDA 11.8*; it works with `transformers`, `uns…
-
If I am not mistaken, the AWQ implementation in ammo uses a default alpha_step = 0.1 to search the parameter. However, the model quantized by ammo has a larger performance reduction than [AWQ](https://g…
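For context, the alpha search in AWQ is a simple grid search over a scaling exponent in [0, 1], so a coarse `alpha_step` can step over the best scale. A minimal sketch of the idea, assuming a generic per-candidate loss function (`loss_fn` and the helper name here are illustrative, not ammo's actual API):

```python
def search_alpha(loss_fn, alpha_step=0.1):
    """Grid-search the AWQ scaling exponent alpha over [0, 1].

    A smaller alpha_step (e.g. 0.05 or 0.01) gives a finer grid and
    may recover part of the accuracy gap described above, at the cost
    of proportionally more search-time forward passes.
    """
    n = int(round(1.0 / alpha_step))
    candidates = [i * alpha_step for i in range(n + 1)]
    # Keep the alpha whose quantization loss is smallest.
    return min(candidates, key=loss_fn)

# Toy quantization-error proxy minimized near alpha = 0.42; a 0.1 grid
# can only land on 0.4, illustrating the resolution limit of the search.
best = search_alpha(lambda a: (a - 0.42) ** 2, alpha_step=0.1)
```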
-
### System Info
Ubuntu 20.04
NVIDIA A100
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 and 24.07
TensorRT-LLM v0.14.0 and v0.11.0
### Who can help?
@Tracin
### Information
- [x] The offici…
-
Hi,
I tried both the qwen2-vl-7b bf16 and AWQ variants, and honestly I'm not seeing any speed improvement.
The AWQ model is ~6GB, yet after loading in vLLM it ends up taking the same amount of vRAM eventually (~22G…
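One likely explanation (an assumption about the setup, not confirmed from the report): vLLM preallocates a fixed fraction of total GPU memory for weights plus KV-cache blocks, controlled by `gpu_memory_utilization` (default 0.9), so the resident vRAM reflects that budget rather than the quantized model's size. A rough back-of-the-envelope:

```python
# vLLM reserves roughly gpu_memory_utilization * total VRAM up front;
# whatever the weights don't use is filled with KV-cache blocks.
total_vram_gb = 24.0            # assumed card size, for illustration only
gpu_memory_utilization = 0.9    # vLLM's default
reserved_gb = total_vram_gb * gpu_memory_utilization
# -> ~21.6 GB resident whether the weights are ~6 GB (AWQ) or ~15 GB (bf16)
```

Lowering `gpu_memory_utilization` when constructing the engine (e.g. `LLM(model=..., gpu_memory_utilization=0.5)`) shrinks the footprint, at the cost of KV-cache capacity and therefore batch size/throughput.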
-
### System Info
x86_64, Debian 11, L4 GPU
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supporte…