MingLin-home opened this issue 8 months ago
Hi! We haven't encountered this problem before. Could you please post your config.json for both models? And in which linear layer does the shape mismatch occur?
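In case it helps to narrow this down, something like the rough sketch below can list the layers whose checkpoint shapes disagree with the model definition. It is not AutoSmoothQuant-specific: the checkpoint path and model id are placeholders, and it assumes the exported state_dict keeps the original parameter names.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("meta-llama/Llama-2-70b-hf")
with torch.device("meta"):
    # Build the reference architecture without materializing any weights.
    model = AutoModelForCausalLM.from_config(config)

# Placeholder path for the exported quantized checkpoint.
ckpt = torch.load("quantized-llama-2-70b/pytorch_model.bin", map_location="cpu")

for name, param in model.named_parameters():
    if name in ckpt and tuple(ckpt[name].shape) != tuple(param.shape):
        print(name, "checkpoint", tuple(ckpt[name].shape), "vs model", tuple(param.shape))
```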
Sorry for the late reply. My config.json:
{
"qkv": "per-tensor",
"out": "per-token",
"fc1": "per-tensor",
"fc2": "per-token"
}
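For reference, my understanding is that these keys set the activation quantization granularity for each linear: "per-tensor" uses one scale for the whole activation tensor, while "per-token" uses one scale per token row. A minimal int8 sketch of the difference (illustrative only, not the actual AutoSmoothQuant kernels):

```python
import torch

def quantize_per_tensor(x: torch.Tensor):
    # One int8 scale shared by the whole activation tensor.
    scale = x.abs().max() / 127.0
    return torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8), scale

def quantize_per_token(x: torch.Tensor):
    # One int8 scale per token (per row), kept as a column for broadcasting.
    scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    return torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8), scale

x = torch.randn(4, 16)                 # (tokens, hidden)
_, s_tensor = quantize_per_tensor(x)   # scalar scale
_, s_token = quantize_per_token(x)     # scale of shape (4, 1)
```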
BTW, I am now able to load the exported Llama-2-70B model into vllm-w8a8, so I suspect this issue only exists in the AutoSmoothQuant repo.
@MingLin-home Hi, have you solved this issue? I'm hitting the same problem.
Hello! Thanks for the nice work!
I want to quantize Llama-2-70B. I was able to export the quantized model without any error. However, when I test the model:
I encounter this error:
BTW, I was able to convert and load the Llama-2-7B model without any error. Any idea how to fix this? It looks related to grouped-query attention.
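A quick way to see why only 70B breaks: with grouped-query attention, the k_proj/v_proj output dimension is much smaller than q_proj's, while in 7B all three are equal. A rough check against the stock HF configs (model ids assumed):

```python
from transformers import AutoConfig

for name in ("meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-70b-hf"):
    cfg = AutoConfig.from_pretrained(name)
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    q_out = cfg.num_attention_heads * head_dim    # q_proj output dim
    kv_out = cfg.num_key_value_heads * head_dim   # k_proj / v_proj output dim
    print(name, "q_proj:", q_out, "k/v_proj:", kv_out)

# Llama-2-7b-hf : q_proj 4096, k/v_proj 4096  (MHA, all equal)
# Llama-2-70b-hf: q_proj 8192, k/v_proj 1024  (GQA, 8 KV heads)
# Any qkv fusion that assumes equal shapes would fail only on the 70B model.
```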
Many thanks ahead!