-
### Your current environment
vllm 0.5.4
### 🐛 Describe the bug
AutoAWQ's Marlin format must be used with no zero point, but vLLM has:
```python
def query_marlin_supported_quant_types(has_zp: bool,
…
```
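For reference, this is what "no zero point" looks like on the AutoAWQ side. A minimal sketch, assuming the usual AutoAWQ `quant_config` keys; only the config is shown, not the full quantization script:

```python
# AutoAWQ quant config for the Marlin kernel: symmetric quantization only,
# so zero_point must be False (the constraint described in this report).
quant_config = {
    "zero_point": False,   # Marlin does not support zero points
    "q_group_size": 128,
    "w_bit": 4,
    "version": "Marlin",
}
```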
-
I use AWQ to quantize Llama 2 70B-Chat by running:
```
CUDA_VISIBLE_DEVICES="1,2,3,4,5,6,7" python quantize_llama.py
```
The contents of quantize_llama.py:
```
from awq import AutoAWQForCausalLM
from tr…
```
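Since the script is cut off above, here is a minimal sketch of a typical AutoAWQ quantization script for this model, assuming the standard `AutoAWQForCausalLM` API and the default GEMM config; it is not the reporter's actual quantize_llama.py, and the paths are illustrative:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-70b-chat-hf"   # illustrative source checkpoint
quant_path = "llama-2-70b-chat-awq"             # illustrative output directory

# 4-bit AWQ with the default GEMM kernel and per-group scales
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model and tokenizer, run calibration-based quantization, and save
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```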
-
Hi TensorRT-LLM team, your work is incredible.
By following the README file for [multimodal models](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md), we were able to successfully run…
-
In reviewing the updated `docs` I noticed a few things that prompted some questions...
1) Neither AWQ/Int-4/`int32_float16` is mentioned in the "Quantize on model conversion" nor the "Quantize…
-
### Your current environment
...
### How would you like to use vllm
I have downloaded a model. Now on my 4 GPU instance I attempt to quantize it using AutoAWQ.
Whenever I run the script below I ge…
-
### System Info
```shell
Name: optimum
Version: 1.18.0.dev0
Name: transformers
Version: 4.36.0
Name: auto-gptq
Version: 0.6.0.dev0+cu118
CUDA Version: 11.8
Python 3.8.17
```
### Who can help…
-
I have downloaded a model. Now on my 4 GPU instance I attempt to quantize it using AutoAWQ.
Whenever I run the script below I get 0% GPU utilization.
Can anyone help explain why this might be happening?
…
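Not from the original report, but a small diagnostic can help narrow this down: confirm the process actually sees the four GPUs and check where the loaded model's weights live (the helper name below is illustrative).

```python
import torch

# 1) Verify CUDA is available and all four GPUs are visible to this process
print("cuda available:", torch.cuda.is_available())
print("visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")

# 2) After loading the model (e.g. with AutoAWQ), count parameters per device;
#    if everything reports 'cpu', the weights never reached the GPUs, which
#    would be consistent with sustained 0% GPU utilization.
def device_histogram(model):
    counts = {}
    for p in model.parameters():
        counts[str(p.device)] = counts.get(str(p.device), 0) + 1
    return counts
```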
-
AutoGPTQForCausalLM.from_quantized fails when loading the official 4-bit quantized model ([Llama2-Chinese-13b-Chat-4bit](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat-4bit/tree/main)) with: NameError: name 'autogptq_cuda_256' is not de…
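For context, a minimal sketch of the kind of load that hits this path, assuming the standard auto-gptq `from_quantized` API (the exact arguments used in the original report are not shown):

```python
from auto_gptq import AutoGPTQForCausalLM

# Loading a pre-quantized 4-bit GPTQ checkpoint. A NameError on
# 'autogptq_cuda_256' usually suggests the auto-gptq CUDA extension was not
# built or installed, so the loader references a module that does not exist.
model = AutoGPTQForCausalLM.from_quantized(
    "FlagAlpha/Llama2-Chinese-13b-Chat-4bit",
    device="cuda:0",
    use_safetensors=True,
)
```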
-
### System Info
I am using a Tesla T4 (16 GB)
### Reproduction
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
base_model_id = "mistralai/Mistral-7B-…
```
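Since the reproduction is cut off, here is a minimal sketch of the usual 4-bit `BitsAndBytesConfig` load that these imports point to; the model id is an assumption because the original line is truncated:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-v0.1"  # assumed; the id in the report is cut off

# NF4 4-bit quantization with fp16 compute -- a 7B model quantized this way
# fits comfortably in a 16 GB T4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```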
-
It would be really nice to have a Functionary version of Llama 3.1 70B/8B!