support use_cpu_to_save_cuda_mem_for_catcher for vlm quantization

ModelTC / llmc

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

https://arxiv.org/abs/2405.06001

Apache License 2.0

326 stars 34 forks

support use_cpu_to_save_cuda_mem_for_catcher for vlm quantization #224

Closed: helloyongyang closed this 1 day ago