AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

MIT License · 4.05k stars · 416 forks

Issues
| # | Title | Author | Status | Comments |
|------|-------|--------|--------|----------|
| #703 | [FEATURE] Why del new_example["labels"] | RanchiZhao | opened 1 hour ago | 0 |
| #702 | Can't get my CUDA_VERSION after I set CUDA_VERSION environment variable | LinghuC2333 | opened 1 day ago | 0 |
| #701 | Fix upstream regression when there's no HPU device | HolyFalafel | closed 1 day ago | 0 |
| #700 | Add support for Gemma2 models. | markoarnauto | opened 3 days ago | 0 |
| #699 | Buffers in Marlin setting | yaldashbz | closed 1 day ago | 0 |
| #698 | How to gather all quantized weights after quantization with AutoGPTQ? | yaldashbz | closed 3 days ago | 0 |
| #697 | [FEATURE] pass in attention mask and input ids for calibration dataset on huggingface's GPTQconfig | RanchiZhao | opened 4 days ago | 1 |
| #696 | The inference speed is very slow after the model is quantized. | chenyunsai | opened 5 days ago | 1 |
| #694 | CUDA extension not installed | yaldashbz | opened 1 week ago | 3 |
| #693 | Short model cls names | Qubitium | closed 1 week ago | 0 |
| #692 | [BUG] do not install auto-gpt for 910B in aarch | luoan7248 | opened 1 week ago | 0 |
| #691 | Cleanup | Qubitium | closed 1 week ago | 0 |
| #689 | Supporting uint4 inference of pre-quantized models in HPU | HolyFalafel | closed 6 days ago | 4 |
| #688 | Req triton exllama | Qubitium | closed 2 weeks ago | 0 |
| #687 | V3 normalize models | Qubitium | closed 2 weeks ago | 0 |
| #686 | [BUG] Replace `"python"` with `sys.executable` in setup.py | AnirudhRahul | opened 2 weeks ago | 0 |
| #685 | [Issue] wheel package for CUDA 12.1 | sudhanshu746 | opened 3 weeks ago | 0 |
| #684 | [FEATURE] ChatGLM Support Added | Qubitium | closed 3 days ago | 1 |
| #683 | ADD ChatGLM model support | Qubitium | closed 2 weeks ago | 4 |
| #682 | add the support of the openbmb/minicpm | LDLINGLINGLING | opened 3 weeks ago | 5 |
| #681 | [BUG] | yuyu990116 | opened 3 weeks ago | 0 |
| #680 | [BUG] Not able to install on Ubuntu 22.04 (subprocess-exited-with-error) | mishraaditya595 | opened 1 month ago | 2 |
| #679 | How to get a dequantized model? | mxjmtxrm | opened 1 month ago | 0 |
| #678 | How to install auto-gptq in GCC 8.5.0 environment? | StephenSX66 | closed 4 weeks ago | 0 |
| #677 | [BUG] Quantitative model Yi-1.5-9b-16K does not produce text output. | maxin9966 | opened 1 month ago | 1 |
| #676 | added 5,6,7 bit quantization support | thoorpukarnakar | opened 1 month ago | 2 |
| #675 | [FEATURE] Added code support to 5,6,7 bits quantization can you please add me as contributor I will create a new pull request | thoorpukarnakar | opened 1 month ago | 4 |
| #674 | Question about data shape difference between quantization and forward | sleepwalker2017 | opened 1 month ago | 0 |
| #673 | How to select between different kernels? | sleepwalker2017 | opened 1 month ago | 0 |
| #672 | [FEATURE] Add marlin24 support | Qubitium | opened 1 month ago | 0 |
| #671 | [FEATURE] Models that support MOE do GPTQ | CallmeZhangChenchen | closed 1 month ago | 0 |
| #670 | [BUG] Following the quant_with_alpaca.py example but keep getting "You shouldn't move a model that is dispatched using accelerate hooks." and the model is never saved. | murtaza-nasir | opened 1 month ago | 2 |
| #669 | [BUG] Cannot install from source | victoryeo | opened 1 month ago | 0 |
| #668 | Fix transformers 4.38.0 seq_len | randoentity | opened 1 month ago | 0 |
| #667 | Target modules [] not found in the base model. Please check the target modules and try again. | RicardoHalak | opened 1 month ago | 0 |
| #666 | [BUG] ROCm installation and building broken | xangelix | opened 1 month ago | 0 |
| #665 | [BUG] ARM installation error | DavidePaglieri | opened 1 month ago | 0 |
| #664 | [FEATURE] ADD SUPPORT DeepSeek-V2 | Xu-Chen | closed 1 month ago | 1 |
| #663 | [Question] Differences in quantization logic compared to AWQ | wenhuach21 | opened 1 month ago | 0 |
| #662 | [FEATURE] Support BitBLAS Backend for QuantLinear | LeiWang1999 | opened 2 months ago | 6 |
| #661 | [BUG] safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer | chuangzhidan | closed 3 days ago | 1 |
| #660 | Support QBits kernel for CPU device | PenghuiCheng | opened 2 months ago | 4 |
| #659 | [BUG/DEPRECATION] Remove fused attention/mlp | Qubitium | closed 2 weeks ago | 2 |
| #658 | [DEPRECATION] Remove triton v1 | Qubitium | closed 2 weeks ago | 0 |
| #657 | Llama-3 8B Instruct quantized to 8 Bit spits out gibberish in transformers `model.generate()` but works fine in vLLM? | davidgxue | opened 2 months ago | 6 |
| #656 | [USABILITY] Warn users if quantization using insufficient nsamples | Qubitium | closed 2 weeks ago | 0 |
| #655 | [DEPRECATION] Discussion on Fused attention and QiGEN | Qubitium | opened 2 months ago | 5 |
| #654 | [BUG] Fix H100 crash/compat with Marlin | Qubitium | closed 4 days ago | 1 |
| #653 | [FEATURE] Backport vllm expanded Marlin kernel to autogptq. | Qubitium | opened 2 months ago | 1 |
| #652 | [PR Ready for Review] [FEATURE] Extend Support for Phi-3 | davidgxue | opened 2 months ago | 0 |