ModelTC / llmc

This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
https://arxiv.org/abs/2405.06001
Apache License 2.0

failed to save quantized model #97

Open LiMa-cas opened 3 hours ago

LiMa-cas commented 3 hours ago
save:
    save_trans: True
    save_lightllm: False
    save_fake: False
    save_path: /extra_data/mali36/llmc/models/

When I use the above config, I get a 16 GB model; when I use the following config, I get a 29 GB model. But the AWQ model is only 5 GB. Could you help me?

    save_trans: False
    save_lightllm: True
    save_fake: False
    save_path: /extra_data/mali36/llmc/models/
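As a rough sanity check (assuming the checkpoint is stored in fp16 and guessing a parameter count of roughly 8B, which is not stated in the thread), checkpoint size scales with bits per weight, which is consistent with a ~16 GB full-precision save versus a ~5 GB 4-bit AWQ file once quantization metadata is added:

```python
def checkpoint_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate serialized checkpoint size in GB (ignores metadata and overhead)."""
    return num_params * bits_per_weight / 8 / 1e9

# Hypothetical 8B-parameter model (illustrative only, not confirmed by the issue):
params = 8e9
print(checkpoint_size_gb(params, 16))  # fp16, full precision -> 16.0 GB
print(checkpoint_size_gb(params, 4))   # 4-bit weights -> 4.0 GB, plus scales/zero-points
```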
gushiqiao commented 2 hours ago

The model size saved by save_trans should match the original model size, since the transformed weights are still stored at the original precision. We are still troubleshooting save_lightllm; for now, you can try save_vllm: True and use the vLLM engine for inference. You can refer to this document: https://llmc-en.readthedocs.io/en/latest/backend/vllm.html
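A minimal save section for that workaround might look like the following. This is a sketch based on the fields shown above plus the save_vllm flag mentioned here; the exact schema may differ by llmc version:

```yaml
save:
    save_trans: False
    save_vllm: True      # export in a format the vLLM engine can load
    save_fake: False
    save_path: /extra_data/mali36/llmc/models/
```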