Vahe1994 / AQLM

Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852)
Apache License 2.0

How model_seqlen affects quantization quality #46

Closed · VirtualRoyalty closed this 6 months ago

VirtualRoyalty commented 8 months ago

Hi! Thanks for such a useful tool! I have a question about model_seqlen:

As I can see, the default value in main.py is 4096. What if I use a smaller value, e.g. 1024, when quantizing a MoE Mixtral model? Will it affect the quality of the quantized model, or the quality on contexts longer than 1024 tokens? Will it significantly speed up the quantization process?

Thanks in advance!

    parser.add_argument(
        "--model_seqlen",
        type=int,
        default=4096,
        help="Model seqlen and calibration data context length.",
    )
BlackSamorez commented 8 months ago

Hi! It is recommended to use the seq_len that the model you're quantizing was trained with (4096 for Llama-2, 8192 for Mistral/Mixtral). To reduce the number of calibration samples and speed up computation, you should decrease --nsamples instead. However, it doesn't have that large an impact on quantization time anyway.
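
For reference, an invocation following this advice might look like the sketch below. The positional model and calibration-dataset arguments are illustrative placeholders (not taken from this thread), and the value passed to --nsamples is just an example; --model_seqlen and --nsamples are the flags discussed above.

    # Hypothetical sketch: quantize a Mixtral-style model with the seqlen it
    # was trained on (8192) while reducing the calibration set size for speed.
    # MODEL_PATH and DATASET are placeholders; all other main.py flags are omitted.
    python main.py $MODEL_PATH $DATASET \
        --model_seqlen=8192 \
        --nsamples=512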

VirtualRoyalty commented 8 months ago

@BlackSamorez Thanks for the answer!

I am trying to quantize a finetuned version of Mixtral, and my training set had no samples that long (8192 tokens).

Should I then decrease max_epochs and finetune_max_epochs instead (in order to speed up the process)?

Godofnothing commented 8 months ago

@VirtualRoyalty you may try it and see how shorter sequences affect the quality. When I was tuning Mixtral, I used 7k instead of 8k to fit into memory, and that seemed to work fine. However, 1k is much shorter than 8k, so I cannot say a priori whether it matters much.

VirtualRoyalty commented 8 months ago

@Godofnothing Thanks, good point!

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.