OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License · 619 stars · 48 forks
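Several recurring themes run through the issue tracker below: OmniQuant's learnable weight clipping (LWC) and learnable equivalent transformation (LET) (#85), and the distinction between fake and real quantization (#61, #80). For orientation, the following is a minimal sketch of LWC-style weight fake quantization, following the formulation in the OmniQuant paper. It assumes PyTorch, and every name in it (fake_quantize, gamma, beta) is illustrative rather than part of this repository's actual API.

```python
# Minimal sketch of learnable-weight-clipping (LWC) style fake quantization,
# in the spirit of the OmniQuant paper. NOT the repository's API:
# fake_quantize, gamma, and beta are illustrative names.
import torch

def fake_quantize(w: torch.Tensor, n_bits: int,
                  gamma: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """Per-output-channel asymmetric fake quantization.

    sigmoid(gamma) and sigmoid(beta) shrink each channel's max/min before
    the scale is computed -- the learnable clipping idea behind LWC.
    """
    qmax = 2 ** n_bits - 1
    w_max = torch.sigmoid(gamma) * w.amax(dim=1, keepdim=True)  # clipped upper bound
    w_min = torch.sigmoid(beta) * w.amin(dim=1, keepdim=True)   # clipped lower bound
    scale = (w_max - w_min).clamp(min=1e-5) / qmax
    zero_point = (-w_min / scale).round()
    q = torch.clamp((w / scale).round() + zero_point, 0, qmax)  # quantize
    return (q - zero_point) * scale                             # dequantize

# Example: 4-bit fake quantization of a random weight matrix.
w = torch.randn(8, 16)
gamma = torch.zeros(8, 1)  # sigmoid(0) = 0.5, i.e. clip to half the range
beta = torch.zeros(8, 1)
w_q = fake_quantize(w, n_bits=4, gamma=gamma, beta=beta)
print((w - w_q).abs().max())  # worst-case quantization error
```

In real quantization (the subject of #61 and #80), the rounded integers would be packed and consumed by dedicated low-bit kernels; fake quantization like the sketch above only simulates the rounding error for training and evaluation.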
Issues (newest first)
#85 question about let · by mxjmtxrm · opened 23 hours ago · 0 comments
#84 [Model Request] MiniCPM · by RanchiZhao · opened 6 days ago · 0 comments
#83 The llama-1-65b model seems unstable in this code · by Xingrun-Xing · opened 2 weeks ago · 2 comments
#82 Questions about quantization · by mxjmtxrm · closed 4 weeks ago · 0 comments
#81 Questions about quantization · by mxjmtxrm · opened 4 weeks ago · 0 comments
#80 How to accelerate the inference speed with real_quant · by j2kim99 · opened 1 month ago · 3 comments
#79 Which bug do you fix for auto_gptq · by BaohaoLiao · opened 1 month ago · 1 comment
#78 Some questions about the results of weight only quantification in the paper · by everloom · closed 2 months ago · 0 comments
#77 Questions regarding Infusing Omniquant into MLC · by BuildBackBuehler · opened 2 months ago · 3 comments
#76 OPT-30B · by Arthur-Ling · opened 2 months ago · 0 comments
#75 Llama-3-8B · by hsb1995 · opened 2 months ago · 4 comments
#74 Is activation get quantized on-the-fly? · by XA23i · closed 2 months ago · 5 comments
#73 Why is the compressed file one file instead of the pre trained weights, where there are many files for training the mode · by hsb1995 · opened 2 months ago · 1 comment
#72 TypeError: FalconRotaryEmbedding.forward() missing 1 required positional argument: position_ids · by luchangli03 · opened 2 months ago · 0 comments
#71 AttributeError: 'FalconAttention' object has no attribute 'maybe_rotary' · by luchangli03 · opened 2 months ago · 1 comment
#70 W4A4 in llama2-7b · by chenzx921020 · closed 2 months ago · 4 comments
#69 When reproducing evaluation results for Llama-2-13b w4a4, I got nan · by NewDriverLee · closed 1 week ago · 4 comments
#68 KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' · by zfstr · closed 2 months ago · 2 comments
#67 Other Task · by hsb1995 · opened 3 months ago · 1 comment
#66 seq_len is deprecated and unused in transformers>=4.38.0 · by Lokshaw-Chau · closed 2 months ago · 1 comment
#65 Checksums didn't match for dataset source files · by hsb1995 · closed 2 months ago · 7 comments
#64 RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). · by zkf331 · closed 4 months ago · 3 comments
#63 OPT Model Reproduction Discrepancies · by fantasysee · closed 4 months ago · 2 comments
#62 CUDA extension not installed · by Arthur-Ling · closed 2 months ago · 2 comments
#61 Difference between fake quant and real quant · by YihengBrianWu · closed 2 months ago · 1 comment
#60 reproduce evaluation results · by oujieww · opened 5 months ago · 9 comments
#59 How to properly evaluate W6A6 models using checkpoint from the mode zoo · by ChengZhang-98 · opened 5 months ago · 2 comments
#58 [WIP][quantize] add gptq post-quantization · by xingchensong · opened 5 months ago · 0 comments
#57 AutoGPTQ or AutoGPTQ-bugfix? · by Alvant · opened 5 months ago · 7 comments
#56 [quantizer] add Odyssey-style symmetric quantization · by xingchensong · closed 5 months ago · 2 comments
#55 License · by fakerybakery · closed 5 months ago · 2 comments
#54 [datautils] fix c4 dataset · by xingchensong · closed 5 months ago · 2 comments
#53 The ckpt of Quantized OPT model is not be found · by liuxy1103 · opened 5 months ago · 6 comments
#52 Quantize Llama-2-Chat Models with Weights and Activation-Quantization · by DRXD1000 · closed 5 months ago · 2 comments
#51 [Llama-2-7B-chat] ppl of w4a8 is nan · by xingchensong · closed 6 months ago · 4 comments
#50 How to use AutoGPTQ to achieve real quantization? · by AboveParadise · closed 4 months ago · 3 comments
#49 Bugfix/attention mask and implementation · by Alvant · closed 6 months ago · 1 comment
#48 [fix] 'QuantLlamaDecoderLayer' object has no attribute 'model_attn' · by xingchensong · closed 6 months ago · 1 comment
#47 [fix] attention_mask may appear None for newer versions of LLaMA · by xingchensong · closed 6 months ago · 1 comment
#46 attention_mask may appear None for newer versions of LLaMA? · by Alvant · closed 6 months ago · 3 comments
#45 [Model Request] upstage/SOLAR-10.7B-v1.0 · by joseph777111 · closed 3 months ago · 1 comment
#44 TypeError: QuantLlamaDecoderLayer.forward() got an unexpected keyword argument 'padding_mask' · by xianwujie · closed 6 months ago · 1 comment
#43 Fix GPU memory leak in training loop · by mutichung · closed 6 months ago · 1 comment
#42 Update omniquant.py · by brisker · closed 6 months ago · 4 comments
#41 general question about LLM kv-cache quantization · by brisker · closed 6 months ago · 1 comment
#40 [Model Request] Mixtral-8x7B-v0.1 · by joseph777111 · closed 6 months ago · 3 comments
#39 AttributeError: 'Attention' object has no attribute 'W_pack' · by yrf200112 · opened 7 months ago · 0 comments
#38 potential bug about matmul quantization process? · by brisker · closed 4 months ago · 1 comment
#37 Quantize LLAMA-2-7b-chat to W4A4 · by nmyuchen · opened 7 months ago · 4 comments
#36 Update omniquant.py · by brisker · closed 7 months ago · 1 comment