ModelTC / llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
https://arxiv.org/abs/2405.06001
Apache License 2.0

Why doesn't the model size change after compressing Llama-2-7b-hf with Wanda? #30

Closed ChengShuting closed 2 months ago

ChengShuting commented 3 months ago

Config:

```yaml
base:
    seed: &seed 42
model:
    type: Llama
    path: /data/Llama-2-7b-hf
    torch_dtype: auto
calib:
    name: pileval
    download: True
    path: calib data path
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: [transformed]
    name: [wikitext2]
    download: True
    path: eval data path
    bs: 1
    seq_len: 2048
sparse:
    method: Wanda
    weight:
        sparsity: 0.5
    sparsity_out: True
save:
    save_fp: False
    save_trans: True
    save_path: ./save3
```

Original model size: (image)
Size after compression: (image)
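(For reference, one way to check whether the checkpoint saved to `./save3` is actually sparsified, and how large it is on disk, is a short script like the sketch below. It assumes `save_trans` writes a standard Hugging Face-style directory of `.safetensors`/`.bin` shards; adjust the path and file patterns to what is actually produced.)

```python
import glob
import os

import torch
from safetensors.torch import load_file

SAVE_DIR = "./save3"  # save_path from the config above (adjust as needed)

total_params, zero_params, size_bytes = 0, 0, 0
files = glob.glob(os.path.join(SAVE_DIR, "*.safetensors")) + \
        glob.glob(os.path.join(SAVE_DIR, "*.bin"))
for f in files:
    size_bytes += os.path.getsize(f)
    state = load_file(f) if f.endswith(".safetensors") else torch.load(f, map_location="cpu")
    for w in state.values():
        total_params += w.numel()
        zero_params += (w == 0).sum().item()

print(f"on-disk size : {size_bytes / 2**30:.2f} GiB")
print(f"zero fraction: {zero_params / total_params:.2%}")  # roughly 50% if sparsity 0.5 was applied
```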

Harahan commented 3 months ago

Wanda is an unstructured pruning strategy, which currently brings only theoretical storage savings: the pruned weights are set to zero but are still stored in the dense tensors, so the checkpoint size does not change. You can try the quantization algorithms with save_lightllm and utilize our Lightllm for inference.
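To illustrate why the file size stays the same: with unstructured pruning the zeroed weights are still materialized in the dense fp16 tensor, so the serialized checkpoint occupies exactly as many bytes as before. A minimal PyTorch sketch, independent of llmc (the magnitude-based threshold here is only a stand-in for Wanda's activation-aware criterion):

```python
import io

import torch

def serialized_size(t: torch.Tensor) -> int:
    """Bytes taken by the tensor when saved with torch.save."""
    buf = io.BytesIO()
    torch.save(t, buf)
    return buf.getbuffer().nbytes

w = torch.randn(4096, 4096, dtype=torch.float16)
dense_size = serialized_size(w)

# Unstructured 50% pruning: zero out the half of the weights with the
# smallest magnitude (Wanda additionally weights this by activation norms).
threshold = w.abs().float().flatten().kthvalue(w.numel() // 2).values
pruned = torch.where(w.abs() <= threshold, torch.zeros_like(w), w)

print(serialized_size(pruned) == dense_size)   # True: each zero still occupies 2 bytes
print((pruned == 0).float().mean().item())     # ~0.5

# Realizing the savings requires a sparse storage format or
# hardware-supported structured sparsity (e.g. 2:4), not just zeroed entries.
```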