ethnzhng closed this issue 5 months ago
Llama 2 7B with TP size 4 does not satisfy the int4_awq constraint when awq_block_size is 128. You can set --awq_block_size 64 when quantizing the checkpoint. The same applies to the other failing tests. We might not be able to run 7B with TP 8 due to this limitation.
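For example, a minimal sketch of the adjusted command, assuming the TensorRT-LLM examples/quantization/quantize.py script and placeholder checkpoint paths (only the added --awq_block_size flag differs from the failing invocation sketched under Reproduction below):

```bash
# Workaround sketch: pass a smaller AWQ block size so that per-rank weight
# dimensions stay divisible by it. Llama 2 7B's FFN intermediate size is 11008,
# so at tp_size 4 each rank's down-projection shard has 11008 / 4 = 2752 input
# channels; 2752 is divisible by 64 but not by 128 (presumably the dimension
# that trips the int4_awq check).
python examples/quantization/quantize.py \
    --model_dir ./llama-2-7b-hf \
    --qformat int4_awq \
    --awq_block_size 64 \
    --tp_size 4 \
    --output_dir ./llama-2-7b-int4-awq-tp4
```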
System Info
Who can help?
@Tracin
Information
Reproduction
Run `quantize.py` using `int4_awq` with `tp_size` 4 or 8 for Llama 2 7B, or with `tp_size` 8 for Llama 2 13B, e.g. the command sketched below.
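A minimal sketch of the kind of command that hits the error, assuming the TensorRT-LLM examples/quantization/quantize.py script, a local Hugging Face checkpoint directory, and the default awq_block_size of 128:

```bash
# Reproduction sketch: int4_awq quantization with tensor parallelism 4 and the
# default awq_block_size (128). model_dir and output_dir are placeholder paths.
python examples/quantization/quantize.py \
    --model_dir ./llama-2-7b-hf \
    --dtype float16 \
    --qformat int4_awq \
    --tp_size 4 \
    --output_dir ./llama-2-7b-int4-awq-tp4
```

For Llama 2 7B with tp_size 8, or Llama 2 13B with tp_size 8, only the --tp_size and --model_dir values change.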
Expected behavior
Quantization is successful
Actual behavior
Quantization fails for:
- Llama 2 7B: tp=4, tp=8
- Llama 2 13B: tp=8
Additional notes
Llama 3 8B can be quantized without error under the same conditions (int4_awq and tp=8).
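For what it's worth, divisibility of the per-rank FFN width likely explains the difference. A quick check, assuming the standard intermediate sizes (Llama 2 7B: 11008, Llama 2 13B: 13824, Llama 3 8B: 14336):

```bash
# Per-rank FFN width at tp_size 8 and its remainder modulo the default
# awq_block_size of 128; only Llama 3 8B (14336) divides evenly.
for size in 11008 13824 14336; do
  echo "intermediate=$size  per_rank=$((size / 8))  mod_128=$(( (size / 8) % 128 ))"
done
```

This matches the observed behavior: the Llama 2 sizes leave a remainder at tp=8, while Llama 3 8B does not.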