-
I have a Gemma 2 9B model that I quantized with AWQ 4-bit; the resulting model is 5.9 GB. I set kv_cache_free_gpu_mem_fraction to 0.01 and run Triton on a single A100, but Triton takes 10748 MiB of GPU memory. I expe…
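A back-of-the-envelope sketch of where the memory could be going. Note that in TensorRT-LLM, `kv_cache_free_gpu_mem_fraction` is applied to the GPU memory that is *free after* the engine is loaded, and the runtime also allocates activation/workspace buffers on top of the weights. All of the overhead numbers below are illustrative assumptions, not measured values:

```python
# Rough estimate of Triton/TensorRT-LLM GPU usage for an AWQ-4bit Gemma 2 9B
# engine on an A100 80GB. Overhead figures are assumptions for illustration.

MIB_PER_GIB = 1024

model_weights_mib = 5.9 * MIB_PER_GIB   # ~5.9 GB quantized checkpoint (from the issue)
cuda_context_mib = 600                  # CUDA context + library handles (assumption)
activation_ws_mib = 2500                # activation/workspace buffers (assumption)

a100_total_mib = 80 * MIB_PER_GIB
free_after_load = a100_total_mib - model_weights_mib - cuda_context_mib - activation_ws_mib

# The fraction is taken of *free* memory after load, not of total memory.
kv_cache_mib = 0.01 * free_after_load

total_mib = model_weights_mib + cuda_context_mib + activation_ws_mib + kv_cache_mib
print(f"estimated usage: {total_mib:.0f} MiB")
```

Under these assumptions the total lands near the observed ~10.7 GiB, i.e. most of the gap over the 5.9 GB of weights is fixed runtime overhead rather than KV cache.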
-
When I try to run the start script, I get this error message:
```
(h2ogpt) C:\Users\domin\Documents\aiGen\h2ogpt>python generate.py
Fontconfig error: Cannot load default config file: No such file: …
-
I used AWQ to build the CodeLlama-13b quantized .npz model file into TensorRT format, but encountered this error. My command was as follows:
python build.py --model_dir /app/models/CodeLlama-13b-hf/ \…
-
### Search before asking
- [X] I searched the [issues](https://github.com/ray-project/kuberay/issues) and found no similar issues.
### KubeRay Component
ray-operator, apiserver
### What happened …
-
**Describe the bug**
C# Version 0.5.0 broke DML models, such as microsoft--Phi-3-mini-4k-instruct-onnx directml-int4-awq-block-128.
The model loads, but the Generator's constructor throws an Access vi…
-
### 📚 The doc issue
The documentation mentions that enabling search-scale and batch-size can improve accuracy. What is the difference between enabling search-scale and the default (off)? From reading the code, my understanding is that search-scale uses a grid search similar to AWQ in the paper, while the default (off) path is SmoothQuant — or does it just skip the grid search, with the default scale = 0.…
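For reference, the AWQ-style grid search the question alludes to can be sketched as follows. This is a simplified illustration, not the library's actual implementation: it searches an exponent `alpha` over a grid, scales weight columns by per-channel activation magnitudes raised to `alpha`, and keeps the value that minimizes output error after fake quantization (`alpha = 0` reduces to no scaling, which is the analogue of the default path):

```python
import numpy as np

def fake_quant(w, n_bits=4):
    """Symmetric per-tensor round-to-nearest quantize/dequantize (simplified)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

def search_scale(w, x, grid=20, n_bits=4):
    """AWQ-style grid search over alpha in [0, 1].

    w: weight matrix (in_features, out_features)
    x: calibration activations (batch, in_features)
    Returns the alpha whose activation-aware scaling minimizes output MSE.
    """
    act_max = np.abs(x).max(axis=0) + 1e-8   # per-input-channel activation magnitude
    y_ref = x @ w                            # full-precision reference output
    best_alpha, best_err = 0.0, np.inf
    for i in range(grid + 1):
        alpha = i / grid
        s = act_max ** alpha                 # alpha=0 -> s=1 (no scaling)
        w_q = fake_quant(w * s[:, None], n_bits)
        y = (x / s) @ w_q                    # fold inverse scale into activations
        err = np.mean((y - y_ref) ** 2)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err

# Example with random calibration data (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
w = rng.normal(size=(16, 8))
alpha, err = search_scale(w, x)
print(f"best alpha={alpha:.2f}, output MSE={err:.4g}")
```

Under this reading, disabling search-scale simply skips the loop and uses a fixed scaling, so the accuracy difference comes entirely from whether `alpha` is searched per layer.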
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is this question answered in the FAQ? | Is there an existing ans…
-
**Describe the bug**
When I run the example from examples/python/awq-quantized-model.md, but switching out phi-3 for llama-3.2-3b, I get an error message stating that `AttributeError: 'NoneType' objec…
-
### System Info
CPU x86_64
GPU NVIDIA L20
TensorRT branch: v0.8.0
CUDA: NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.3
### Who can help?
@Tracin
### Information
- [X…