Hello,
When quantizing a Llama model, we first convert the weights downloaded from Meta using the Hugging Face converter, and then apply Hugging Face-compatible AWQ quantization.
Is there an AMD-specific quantization tool that removes the dependency on Hugging Face?
Are you trying to convert a PyTorch model to an ONNX model? If so, then yes, today we use the Hugging Face converter. I will check on possible alternatives for the future and post an update.
Thanks, Ashima