URL

https://arxiv.org/abs/2310.08659
Affiliations
- Yixiao Li, N/A
- Yifan Yu, N/A
- Chen Liang, N/A
- Pengcheng He, N/A
- Nikos Karampatziakis, N/A
- Weizhu Chen, N/A
- Tuo Zhao, N/A
Abstract
- Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work, we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases, it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and the quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. The code is available at https://github.com/yxli2123/LoftQ.
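The abstract says LoftQ quantizes a weight matrix and chooses the LoRA initialization together, so that the quantized weight plus the low-rank term stays close to the full-precision weight. A minimal NumPy sketch of that idea follows, assuming the alternating scheme described in the LoftQ paper: quantize W minus the current low-rank term, then refit the low-rank term by truncated SVD of the quantization residual. `uniform_quantize` and `loftq_init` are hypothetical names, and the per-tensor uniform quantizer is a simplified stand-in for the quantizers used in the paper; this is not the code from the LoftQ repository.

```python
import numpy as np


def uniform_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Hypothetical stand-in quantizer: symmetric per-tensor uniform grid with
    2**bits levels spanning [-max|w|, max|w|]; returns dequantized values."""
    levels = 2 ** bits
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:
        return np.zeros_like(w)
    step = 2.0 * max_abs / (levels - 1)
    # Snap each entry to the nearest grid point (integer code), then dequantize.
    codes = np.clip(np.round((w + max_abs) / step), 0, levels - 1)
    return codes * step - max_abs


def loftq_init(w: np.ndarray, bits: int, rank: int, steps: int):
    """Alternate between quantizing W - A B^T and refitting A B^T to W - Q.

    Returns (Q, A, B) with W ~= Q + A @ B.T; A is (m, rank), B is (n, rank).
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    m, n = w.shape
    a = np.zeros((m, rank))
    b = np.zeros((n, rank))
    for _ in range(steps):
        # 1) Quantize the part of W that the low-rank term does not explain.
        q = uniform_quantize(w - a @ b.T, bits)
        # 2) Best rank-r approximation of the residual W - Q (Eckart-Young),
        #    split between A and B via the square roots of the singular values.
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        root_s = np.sqrt(s[:rank])
        a = u[:, :rank] * root_s
        b = vt[:rank, :].T * root_s
    return q, a, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 48))
    # Zero LoRA init: the starting model is just the quantized weight Q.
    q0 = uniform_quantize(w, bits=2)
    print("||W - Q||         =", np.linalg.norm(w - q0))
    q, a, b = loftq_init(w, bits=2, rank=8, steps=5)
    print("||W - Q - A B^T|| =", np.linalg.norm(w - q - a @ b.T))
```

With A and B starting at zero, the first iteration reproduces the plain quantized weight, and the SVD step then adds the best rank-r correction of W - Q in Frobenius norm, which is the quantized/full-precision discrepancy that a zero LoRA initialization leaves in place. Later iterations re-quantize with that correction already subtracted.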
Translation (by gpt-4o-mini)
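Quantization is an indispensable technique for serving large language models (LLMs), and it has recently also been applied to LoRA fine-tuning. This work focuses on the scenario in which quantization and LoRA fine-tuning are applied together to a pre-trained model. In such cases, a consistent gap in downstream-task performance is commonly observed between full fine-tuning and the quantization plus LoRA fine-tuning approach. To address this, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a new quantization framework that simultaneously quantizes an LLM and finds an appropriate low-rank initialization for LoRA fine-tuning. This initialization reduces the discrepancy between the quantized model and the full-precision model and substantially improves generalization on downstream tasks. We evaluated our method on natural language understanding, question answering, summarization, and natural language generation tasks. The experiments show that our method is highly effective and outperforms existing quantization methods, particularly under the challenging 2-bit and 2/4-bit mixed-precision settings. The code is available at https://github.com/yxli2123/LoftQ.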
Summary (by gpt-4o-mini)
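Proposes LoftQ, a new quantization framework that applies quantization and LoRA fine-tuning to LLMs together. This reduces the mismatch between the quantized model and the full-precision model and improves generalization on downstream tasks. On tasks such as natural language understanding and question answering, it outperforms existing methods, especially under challenging conditions.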

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models, Yixiao Li+, N/A, arXiv'23 #1407