kamalkraj closed this issue 6 months ago
@kamalkraj I found that this works if you use the release/0.5.0 branch instead of the main branch.
The issue seems to be that this line https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.5.0/tensorrt_llm/models/quantized/ammo.py#L84, which saves the state dict using torch directly when int4_awq is used, was removed in the main branch.
I tried adding this line back to the main branch and it worked again for me.
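For context, the restored line boils down to a single-file save of the quantized state dict. A minimal sketch of that behavior, with pickle standing in for torch.save so it is self-contained, and a hypothetical checkpoint name; the real call lives in tensorrt_llm/models/quantized/ammo.py:

```python
import os
import pickle
import tempfile

def save_quant_checkpoint(state_dict, export_path):
    """Save the quantized state dict as one checkpoint file.

    Mimics the torch.save(...) line described above; the actual code
    in ammo.py uses torch directly for int4_awq.
    """
    with open(export_path, "wb") as f:
        pickle.dump(state_dict, f)

# Dummy state dict standing in for int4_awq quantized weights.
state_dict = {"layer0.weight": [1, 2, 3], "layer0.scale": [0.1]}

# Hypothetical file name; the point is that a single file is written,
# not a directory of per-tensor files.
path = os.path.join(tempfile.mkdtemp(), "llama_tp1_rank0.pt")
save_quant_checkpoint(state_dict, path)

assert os.path.isfile(path)
with open(path, "rb") as f:
    restored = pickle.load(f)
assert restored == state_dict
```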
Thanks @eycheung
The issue is fixed in the latest main branch. Closing this bug.
The instructions for quantization seem incorrect - https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#awq
Using the below command results in a folder rather than a single file.
Using the next command results in an error.
Update: after changing quant_ckpt_path, the command runs successfully without any error, but running summarization produces a score of 0.
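One possible explanation of the symptoms above is quant_ckpt_path pointing at the export folder (or a stale path) rather than the single checkpoint file, which the build may tolerate while producing a broken engine. A hedged sketch of a pre-build sanity check; the helper and the file name are hypothetical, only the quant_ckpt_path flag comes from the llama example:

```python
import os
import tempfile

def check_quant_ckpt_path(quant_ckpt_path):
    """Sanity-check quant_ckpt_path before an engine build.

    Hypothetical helper, not part of TensorRT-LLM: flags the two
    failure modes seen above (folder instead of file, missing file).
    """
    if os.path.isdir(quant_ckpt_path):
        # Quantization exported a folder; point the flag at the
        # checkpoint file inside it instead.
        return "directory"
    if not os.path.isfile(quant_ckpt_path):
        return "missing"
    return "ok"

# Demonstrate with a throwaway workspace and a hypothetical file name.
workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "llama_tp1_rank0.npz")
open(ckpt, "w").close()

assert check_quant_ckpt_path(workdir) == "directory"
assert check_quant_ckpt_path(ckpt) == "ok"
assert check_quant_ckpt_path(ckpt + ".bak") == "missing"
```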