Read config.json and enable exllama loading if the model has a `quantization_config` with a `quant_method` of `gptq`. Note that this implementation is limited and only supports `model.safetensors`. That said, it supports loading popular GPTQ-quantized models without renaming or symlinking the model file.
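The detection step described above could be sketched as follows. This is a minimal illustration, not the actual implementation; the helper name `should_enable_exllama` is hypothetical, though the `quantization_config`/`quant_method` keys match the convention used in Hugging Face-style `config.json` files for GPTQ models.

```python
import json

def should_enable_exllama(config_path: str) -> bool:
    """Hypothetical helper: return True when config.json declares a
    GPTQ quantization_config, signalling that exllama loading applies."""
    with open(config_path) as f:
        config = json.load(f)
    quant = config.get("quantization_config")
    # Only enable exllama for GPTQ-quantized models.
    return bool(quant) and quant.get("quant_method") == "gptq"
```

A loader would call this before choosing a code path, falling back to the standard loader (and its usual file-name expectations) when it returns `False`.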