CASIA-IVA-Lab / FLAP

[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models
https://arxiv.org/abs/2312.11983
Apache License 2.0


FLAP

Introduction

Fluctuation-based Adaptive Structured Pruning for Large Language Models [arXiv]
Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang
Institute of Automation, Chinese Academy of Sciences

Why FLAP:

Supported LLMs:

Table of Contents

Quick Start

Installation

Installation instructions can be found in INSTALL.md.

Minimal Example

bash script/llama_7b.sh $GPU_ID

This script compresses the LLaMA-7B model with FLAP, pruning ~20% of its parameters. The pre-trained model and the dataset are downloaded automatically, so you do not need to fetch them manually. The first run takes extra time because of these downloads.

Configuration Instruction

Pruning

Pruning LLaMA-7B with ~20% of its parameters removed:

python main.py \
    --model decapoda-research/llama-7b-hf \
    --prune_method flap \
    --pruning_ratio 0.2 \
    --remove_heads -1 \
    --metrics WIFV \
    --structure AL-AM \
    --nsamples 1024 \
    --save_model "llm_weights/flap_p0.2_WIFV_ALAM_llama_7b/" \
    --eval
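
To run the same command at several pruning ratios, the argument list can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper `build_flap_command` is hypothetical, the flag names and default values are copied from the command above, and the extra ratios (0.3, 0.5) and the save-directory naming pattern are assumptions extrapolated from the 0.2 example.

```python
import shlex


def build_flap_command(pruning_ratio, model="decapoda-research/llama-7b-hf",
                       metrics="WIFV", structure="AL-AM", nsamples=1024,
                       remove_heads=-1, evaluate=True):
    """Return the argv list for one main.py pruning run (hypothetical helper)."""
    # Assumed naming pattern, mirroring "flap_p0.2_WIFV_ALAM_llama_7b" above.
    tag = f"flap_p{pruning_ratio}_{metrics}_{structure.replace('-', '')}_llama_7b"
    cmd = [
        "python", "main.py",
        "--model", model,
        "--prune_method", "flap",
        "--pruning_ratio", str(pruning_ratio),
        "--remove_heads", str(remove_heads),
        "--metrics", metrics,
        "--structure", structure,
        "--nsamples", str(nsamples),
        "--save_model", f"llm_weights/{tag}/",
    ]
    if evaluate:
        cmd.append("--eval")
    return cmd


if __name__ == "__main__":
    # Print one shell-ready command per ratio; the ratios are illustrative.
    for ratio in (0.2, 0.3, 0.5):
        print(shlex.join(build_flap_command(ratio)))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` from the repository root.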

Arguments:

After pruning and post-training, we follow lm-evaluation-harness for evaluation.

Language Modeling Evaluation

A brief summary of language modeling performance for the LLaMA family:


Zero-shot Evaluation

A brief summary of zero-shot performance for LLaMA-7B:


More results can be found in the paper.

Acknowledgement

Citation

If you find this project useful, please cite:

@misc{an2023fluctuationbased,
      title={Fluctuation-based Adaptive Structured Pruning for Large Language Models}, 
      author={Yongqi An and Xu Zhao and Tao Yu and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2312.11983},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}