OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

Model File Formats: .pth, .bin vs. GGUF #20

Open sebvannistel opened 11 months ago

sebvannistel commented 11 months ago

Hello,

I've been exploring the OmniQuant repository and am impressed with the quantization techniques it provides for Large Language Models (LLMs). I noticed that the pre-trained models are available on Hugging Face in the .pth and .bin file formats.

I was wondering why these models are not also released in the GGUF format, which is generally considered more efficient for handling large models. Is there a specific reason for this choice of file formats? Am I missing something here?
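For what it's worth, the two families of formats are easy to tell apart on disk: PyTorch checkpoints saved with `torch.save` (PyTorch 1.6+) are zip archives and start with the `PK\x03\x04` magic, while GGUF files begin with the 4-byte magic `GGUF`. A minimal stdlib-only sketch (the helper name `sniff_checkpoint_format` is hypothetical, just for illustration):

```python
def sniff_checkpoint_format(path):
    """Guess a checkpoint's container format from its leading magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"GGUF":
        return "gguf"            # llama.cpp GGUF container
    if magic == b"PK\x03\x04":
        return "pytorch-zip"     # zip-based .pth/.bin from torch.save (>=1.6)
    return "unknown"             # e.g. legacy pickle-based checkpoints
```

This only identifies the container, of course; converting a .pth/.bin checkpoint to GGUF requires a dedicated conversion script (as in llama.cpp), not just a rename.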

I'm sure there is a reason for this; I'm probably just missing something.