OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License · 619 stars · 48 forks
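Several recurring themes run through the issue tracker below: OmniQuant's learnable weight clipping (LWC) and learnable equivalent transformation (LET) (#85), and the distinction between fake and real quantization (#61, #80). For orientation, the following is a minimal sketch of LWC-style weight fake quantization, following the formulation in the OmniQuant paper. It assumes PyTorch, and every name in it (fake_quantize, gamma, beta) is illustrative rather than part of this repository's actual API.

```python
# Minimal sketch of learnable-weight-clipping (LWC) style fake quantization,
# in the spirit of the OmniQuant paper. NOT the repository's API:
# fake_quantize, gamma, and beta are illustrative names.
import torch

def fake_quantize(w: torch.Tensor, n_bits: int,
                  gamma: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """Per-output-channel asymmetric fake quantization.

    sigmoid(gamma) and sigmoid(beta) shrink each channel's max/min before
    the scale is computed -- the learnable clipping idea behind LWC.
    """
    qmax = 2 ** n_bits - 1
    w_max = torch.sigmoid(gamma) * w.amax(dim=1, keepdim=True)  # clipped upper bound
    w_min = torch.sigmoid(beta) * w.amin(dim=1, keepdim=True)   # clipped lower bound
    scale = (w_max - w_min).clamp(min=1e-5) / qmax
    zero_point = (-w_min / scale).round()
    q = torch.clamp((w / scale).round() + zero_point, 0, qmax)  # quantize
    return (q - zero_point) * scale                             # dequantize

# Example: 4-bit fake quantization of a random weight matrix.
w = torch.randn(8, 16)
gamma = torch.zeros(8, 1)  # sigmoid(0) = 0.5, i.e. clip to half the range
beta = torch.zeros(8, 1)
w_q = fake_quantize(w, n_bits=4, gamma=gamma, beta=beta)
print((w - w_q).abs().max())  # worst-case quantization error
```

In real quantization (the subject of #61 and #80), the rounded integers would be packed and consumed by dedicated low-bit kernels; fake quantization like the sketch above only simulates the rounding error for training and evaluation.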
Issues (newest first)
#85 question about let · by mxjmtxrm · opened 23 hours ago · 0 comments
#84 [Model Request] MiniCPM · by RanchiZhao · opened 6 days ago · 0 comments
#83 The llama-1-65b model seems unstable in this code · by Xingrun-Xing · opened 2 weeks ago · 2 comments
#82 Questions about quantization · by mxjmtxrm · closed 4 weeks ago · 0 comments
#81 Questions about quantization · by mxjmtxrm · opened 4 weeks ago · 0 comments
#80 How to accelerate the inference speed with real_quant · by j2kim99 · opened 1 month ago · 3 comments
#79 Which bug do you fix for auto_gptq · by BaohaoLiao · opened 1 month ago · 1 comment
#78 Some questions about the results of weight only quantification in the paper · by everloom · closed 2 months ago · 0 comments
#77 Questions regarding Infusing Omniquant into MLC · by BuildBackBuehler · opened 2 months ago · 3 comments
#76 OPT-30B · by Arthur-Ling · opened 2 months ago · 0 comments
#75 Llama-3-8B · by hsb1995 · opened 2 months ago · 4 comments
#74 Is activation get quantized on-the-fly? · by XA23i · closed 2 months ago · 5 comments
#73 Why is the compressed file one file instead of the pre trained weights, where there are many files for training the mode · by hsb1995 · opened 2 months ago · 1 comment
#72 TypeError: FalconRotaryEmbedding.forward() missing 1 required positional argument: position_ids · by luchangli03 · opened 2 months ago · 0 comments
#71 AttributeError: 'FalconAttention' object has no attribute 'maybe_rotary' · by luchangli03 · opened 2 months ago · 1 comment
#70 W4A4 in llama2-7b · by chenzx921020 · closed 2 months ago · 4 comments
#69 When reproducing evaluation results for Llama-2-13b w4a4, I got nan · by NewDriverLee · closed 1 week ago · 4 comments
#68 KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' · by zfstr · closed 2 months ago · 2 comments
#67 Other Task · by hsb1995 · opened 3 months ago · 1 comment
#66 seq_len is deprecated and unused in transformers>=4.38.0 · by Lokshaw-Chau · closed 2 months ago · 1 comment
#65 Checksums didn't match for dataset source files · by hsb1995 · closed 2 months ago · 7 comments
#64 RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). · by zkf331 · closed 4 months ago · 3 comments
#63 OPT Model Reproduction Discrepancies · by fantasysee · closed 4 months ago · 2 comments
#62 CUDA extension not installed · by Arthur-Ling · closed 2 months ago · 2 comments
#61 Difference between fake quant and real quant · by YihengBrianWu · closed 2 months ago · 1 comment
#60 reproduce evaluation results · by oujieww · opened 5 months ago · 9 comments
#59 How to properly evaluate W6A6 models using checkpoint from the mode zoo · by ChengZhang-98 · opened 5 months ago · 2 comments
#58 [WIP][quantize] add gptq post-quantization · by xingchensong · opened 5 months ago · 0 comments
#57 AutoGPTQ or AutoGPTQ-bugfix? · by Alvant · opened 5 months ago · 7 comments
#56 [quantizer] add Odyssey-style symmetric quantization · by xingchensong · closed 5 months ago · 2 comments
#55 License · by fakerybakery · closed 5 months ago · 2 comments
#54 [datautils] fix c4 dataset · by xingchensong · closed 5 months ago · 2 comments
#53 The ckpt of Quantized OPT model is not be found · by liuxy1103 · opened 5 months ago · 6 comments
#52 Quantize Llama-2-Chat Models with Weights and Activation-Quantization · by DRXD1000 · closed 5 months ago · 2 comments
#51 [Llama-2-7B-chat] ppl of w4a8 is nan · by xingchensong · closed 6 months ago · 4 comments
#50 How to use AutoGPTQ to achieve real quantization? · by AboveParadise · closed 4 months ago · 3 comments
#49 Bugfix/attention mask and implementation · by Alvant · closed 6 months ago · 1 comment
#48 [fix] 'QuantLlamaDecoderLayer' object has no attribute 'model_attn' · by xingchensong · closed 6 months ago · 1 comment
#47 [fix] attention_mask may appear None for newer versions of LLaMA · by xingchensong · closed 6 months ago · 1 comment
#46 attention_mask may appear None for newer versions of LLaMA? · by Alvant · closed 6 months ago · 3 comments
#45 [Model Request] upstage/SOLAR-10.7B-v1.0 · by joseph777111 · closed 3 months ago · 1 comment
#44 TypeError: QuantLlamaDecoderLayer.forward() got an unexpected keyword argument 'padding_mask' · by xianwujie · closed 6 months ago · 1 comment
#43 Fix GPU memory leak in training loop · by mutichung · closed 6 months ago · 1 comment
#42 Update omniquant.py · by brisker · closed 6 months ago · 4 comments
#41 general question about LLM kv-cache quantization · by brisker · closed 6 months ago · 1 comment
#40 [Model Request] Mixtral-8x7B-v0.1 · by joseph777111 · closed 6 months ago · 3 comments
#39 AttributeError: 'Attention' object has no attribute 'W_pack' · by yrf200112 · opened 7 months ago · 0 comments
#38 potential bug about matmul quantization process? · by brisker · closed 4 months ago · 1 comment
#37 Quantize LLAMA-2-7b-chat to W4A4 · by nmyuchen · opened 7 months ago · 4 comments
#36 Update omniquant.py · by brisker · closed 7 months ago · 1 comment