Open dongxuemin666 opened 4 months ago
Hi, could you share your command please?
I have the same problem!
TensorRT-LLM v0.8.0, using "INT8 KV cache + per-channel weight-only" for llama-7B
I know the problem. You should check which device your model is running on, CPU or GPU.
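The device check above can be sketched as follows. This is a minimal illustration (assuming PyTorch, not TensorRT-LLM-specific code): the "addmm_impl_cpu_" not implemented for 'Half' error typically appears when a half-precision (fp16) model executes a matmul on CPU, so the fix is to move the model and inputs to the GPU, or fall back to fp32 on CPU.

```python
import torch

# Hypothetical stand-in for the quantized model: fp16 weights, currently on CPU.
model = torch.nn.Linear(4, 3).half()
x = torch.randn(2, 4, dtype=torch.float16)

# Check which device the model actually lives on (CPU here).
device = next(model.parameters()).device
print("model device:", device)

if torch.cuda.is_available():
    # Run fp16 on the GPU, where half-precision matmuls are implemented.
    model, x = model.to("cuda"), x.to("cuda")
else:
    # On CPU-only machines, fall back to fp32 to avoid the 'Half' error.
    model, x = model.float(), x.float()

y = model(x)
print(y.shape)  # torch.Size([2, 3])
```

In the llava/TensorRT-LLM case the same idea applies: make sure the checkpoint is loaded onto the GPU before the quantization script runs its fp16 forward passes.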
System Info
Linux
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
1. Use SmoothQuant to quantize the llava model
2. Use "INT8 KV cache + per-channel weight-only" to quantize the llava model
Expected behavior
Run through successfully
actual behavior
Error encountered like below:
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
additional notes
Are these methods not supported for llava?