
# DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs

[![arXiv](https://img.shields.io/badge/DuQuant-2406.01721-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2406.01721) [![Website](https://img.shields.io/badge/🎀%20Project-Website-blue)](https://duquant.github.io) [![License](https://img.shields.io/badge/βš–οΈ%20Code%20License-MIT-yellow)](https://github.com/Hsu1023/DuQuant/blob/main/LICENSE)

Welcome to the official code repository for "DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs (NeurIPS 2024, Oral)".

πŸ” For more details, please refer to the project page: https://duquant.github.io/.

## πŸ“° News

## πŸ‘€ Introduction


## πŸ”§ Installation

```bash
conda create -n duquant python=3.10 -y
conda activate duquant
git clone https://github.com/Hsu1023/DuQuant.git
cd DuQuant
pip install --upgrade pip
pip install -r requirements.txt
```

βš™οΈ Usage

### 1. Preprocessing

```bash
python get_rot.py  # needs to be run only once for all models
python generate_act_scale_shift.py --model PATH_OF_MODEL  # needs to be run only once per model (Hugging Face hub ID or local path)
```

### 2. Quantization

The bash script for DuQuant can be found in `run.sh`. Choose the model to quantize by passing its path to the `--model` argument. To evaluate the DuQuant + LWC method, run the `run_lwc.sh` script. In addition, you can add `--save_dir` to save the quantized models and use `--resume` to reload saved models.
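A typical workflow might look like the following sketch. The entry-point script name (`main.py`) and the model/output paths here are placeholders for illustration; see `run.sh` for the exact invocation and full argument list.

```shell
# Quantize a model and save the result (hypothetical paths;
# the actual entry point and default flags are defined in run.sh)
python main.py --model meta-llama/Llama-2-7b-hf --save_dir ./quantized/llama2-7b

# Later, reload the saved quantized model instead of re-quantizing
python main.py --model meta-llama/Llama-2-7b-hf --resume ./quantized/llama2-7b
```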

Explanation of arguments:

### 3. Model Zoo

Currently, we support LLaMA series (LLaMA 1, 2 and 3), Vicuna series, and Mistral models.

| Models | 7B/8B | 13B | 30B | 65B/70B |
|---|---|---|---|---|
| LLaMA1 | βœ… | βœ… | βœ… | βœ… |
| LLaMA2 | βœ… | βœ… | --- | βœ… |
| LLaMA3 | βœ… | --- | --- | βœ… |
| Vicuna-v1.5 | βœ… | βœ… | --- | --- |
| Mistral | βœ… | --- | --- | --- |

## πŸ“œ Results

## πŸ“‚ Contact

For immediate queries or further information, please open an issue or contact xuhb20@mails.tsinghua.edu.cn or haokun.lin@cripac.ia.ac.cn.

πŸ™ Acknowledgement

This repo is built upon the following projects:

We thank the authors for their code.

πŸ“ Citation

We kindly request that you cite our work if you utilize the code or reference our findings in your research:


```bibtex
@article{lin2024duquant,
  title={DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs},
  author={Lin, Haokun and Xu, Haobo and Wu, Yichen and Cui, Jingzhi and Zhang, Yingtao and Mou, Linzhan and Song, Linqi and Sun, Zhenan and Wei, Ying},
  journal={arXiv preprint arXiv:2406.01721},
  year={2024}
}
```