Read config.json and enable exllama loading if the model has a `quantization_config` with a `quant_method` of `gptq`. Note that this implementation is limited and only supports `model.safetensors`. That said, it supports loading popular GPTQ-quantized models without renaming or symlinking the model file.
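The detection step described above could be sketched as follows. This is a minimal illustration, not the actual implementation; the helper name `should_enable_exllama` is hypothetical, though the `quantization_config`/`quant_method` keys match the convention used in Hugging Face-style `config.json` files for GPTQ models.

```python
import json

def should_enable_exllama(config_path: str) -> bool:
    """Hypothetical helper: return True when config.json declares a
    GPTQ quantization_config, signalling that exllama loading applies."""
    with open(config_path) as f:
        config = json.load(f)
    quant = config.get("quantization_config")
    # Only enable exllama for GPTQ-quantized models.
    return bool(quant) and quant.get("quant_method") == "gptq"
```

A loader would call this before choosing a code path, falling back to the standard loader (and its usual file-name expectations) when it returns `False`.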