AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

MIT License · 4.05k stars · 416 forks

Issues
| # | Title | Author | Status | Comments |
|------|-------|--------|--------|----------|
| #703 | [FEATURE] Why del new_example["labels"] | RanchiZhao | opened 1 hour ago | 0 |
| #702 | Can't get my CUDA_VERSION after I set CUDA_VERSION environment variable | LinghuC2333 | opened 1 day ago | 0 |
| #701 | Fix upstream regression when there's no HPU device | HolyFalafel | closed 1 day ago | 0 |
| #700 | Add support for Gemma2 models. | markoarnauto | opened 3 days ago | 0 |
| #699 | Buffers in Marlin setting | yaldashbz | closed 1 day ago | 0 |
| #698 | How to gather all quantized weights after quantization with AutoGPTQ? | yaldashbz | closed 3 days ago | 0 |
| #697 | [FEATURE] pass in attention mask and input ids for calibration dataset on huggingface's GPTQconfig | RanchiZhao | opened 4 days ago | 1 |
| #696 | The inference speed is very slow after the model is quantized. | chenyunsai | opened 5 days ago | 1 |
| #694 | CUDA extension not installed | yaldashbz | opened 1 week ago | 3 |
| #693 | Short model cls names | Qubitium | closed 1 week ago | 0 |
| #692 | [BUG] do not install auto-gpt for 910B in aarch | luoan7248 | opened 1 week ago | 0 |
| #691 | Cleanup | Qubitium | closed 1 week ago | 0 |
| #689 | Supporting uint4 inference of pre-quantized models in HPU | HolyFalafel | closed 6 days ago | 4 |
| #688 | Req triton exllama | Qubitium | closed 2 weeks ago | 0 |
| #687 | V3 normalize models | Qubitium | closed 2 weeks ago | 0 |
| #686 | [BUG] Replace `"python"` with `sys.executable` in setup.py | AnirudhRahul | opened 2 weeks ago | 0 |
| #685 | [Issue] wheel package for CUDA 12.1 | sudhanshu746 | opened 3 weeks ago | 0 |
| #684 | [FEATURE] ChatGLM Support Added | Qubitium | closed 3 days ago | 1 |
| #683 | ADD ChatGLM model support | Qubitium | closed 2 weeks ago | 4 |
| #682 | add the support of the openbmb/minicpm | LDLINGLINGLING | opened 3 weeks ago | 5 |
| #681 | [BUG] | yuyu990116 | opened 3 weeks ago | 0 |
| #680 | [BUG] Not able to install on Ubuntu 22.04 (subprocess-exited-with-error) | mishraaditya595 | opened 1 month ago | 2 |
| #679 | How to get a dequantized model? | mxjmtxrm | opened 1 month ago | 0 |
| #678 | How to install auto-gptq in GCC 8.5.0 environment? | StephenSX66 | closed 4 weeks ago | 0 |
| #677 | [BUG] Quantitative model Yi-1.5-9b-16K does not produce text output. | maxin9966 | opened 1 month ago | 1 |
| #676 | added 5,6,7 bit quantization support | thoorpukarnakar | opened 1 month ago | 2 |
| #675 | [FEATURE] Added code support to 5,6,7 bits quantization can you please add me as contributor I will create a new pull request | thoorpukarnakar | opened 1 month ago | 4 |
| #674 | Question about data shape difference between quantization and forward | sleepwalker2017 | opened 1 month ago | 0 |
| #673 | How to select between different kernels? | sleepwalker2017 | opened 1 month ago | 0 |
| #672 | [FEATURE] Add marlin24 support | Qubitium | opened 1 month ago | 0 |
| #671 | [FEATURE] Models that support MOE do GPTQ | CallmeZhangChenchen | closed 1 month ago | 0 |
| #670 | [BUG] Following the quant_with_alpaca.py example but keep getting "You shouldn't move a model that is dispatched using accelerate hooks." and the model is never saved. | murtaza-nasir | opened 1 month ago | 2 |
| #669 | [BUG] Cannot install from source | victoryeo | opened 1 month ago | 0 |
| #668 | Fix transformers 4.38.0 seq_len | randoentity | opened 1 month ago | 0 |
| #667 | Target modules [] not found in the base model. Please check the target modules and try again. | RicardoHalak | opened 1 month ago | 0 |
| #666 | [BUG] ROCm installation and building broken | xangelix | opened 1 month ago | 0 |
| #665 | [BUG] ARM installation error | DavidePaglieri | opened 1 month ago | 0 |
| #664 | [FEATURE] ADD SUPPORT DeepSeek-V2 | Xu-Chen | closed 1 month ago | 1 |
| #663 | [Question] Differences in quantization logic compared to AWQ | wenhuach21 | opened 1 month ago | 0 |
| #662 | [FEATURE] Support BitBLAS Backend for QuantLinear | LeiWang1999 | opened 2 months ago | 6 |
| #661 | [BUG] safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer | chuangzhidan | closed 3 days ago | 1 |
| #660 | Support QBits kernel for CPU device | PenghuiCheng | opened 2 months ago | 4 |
| #659 | [BUG/DEPRECATION] Remove fused attention/mlp | Qubitium | closed 2 weeks ago | 2 |
| #658 | [DEPRECATION] Remove triton v1 | Qubitium | closed 2 weeks ago | 0 |
| #657 | Llama-3 8B Instruct quantized to 8 Bit spits out gibberish in transformers `model.generate()` but works fine in vLLM? | davidgxue | opened 2 months ago | 6 |
| #656 | [USABILITY] Warn users if quantization using insufficient nsamples | Qubitium | closed 2 weeks ago | 0 |
| #655 | [DEPRECATION] Discussion on Fused attention and QiGEN | Qubitium | opened 2 months ago | 5 |
| #654 | [BUG] Fix H100 crash/compat with Marlin | Qubitium | closed 4 days ago | 1 |
| #653 | [FEATURE] Backport vllm expanded Marlin kernel to autogptq. | Qubitium | opened 2 months ago | 1 |
| #652 | [PR Ready for Review] [FEATURE] Extend Support for Phi-3 | davidgxue | opened 2 months ago | 0 |