NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

SmoothQuant test of llava error #1206

Open dongxuemin666 opened 4 months ago

dongxuemin666 commented 4 months ago

System Info

Linux

Who can help?

No response

Information

Tasks

Reproduction

1. Use SmoothQuant to quantize the llava model.
2. Use "INT8 KV cache + per-channel weight-only" to quantize the llava model.

Expected behavior

Run through successfully

actual behavior

The following error is encountered: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
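
For context, this is a generic PyTorch error rather than a TensorRT-LLM-specific one: PyTorch ships no fp16 (Half) addmm kernel for the CPU backend, so any half-precision Linear forward executed on the CPU raises it. A minimal standalone repro, independent of TensorRT-LLM:

```python
import torch

# PyTorch has no Half (fp16) addmm kernel for CPU, so a Linear forward
# on CPU fp16 tensors raises:
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
layer = torch.nn.Linear(8, 8).half()
x = torch.randn(1, 8, dtype=torch.float16)
layer(x)  # raises RuntimeError on CPU
```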

additional notes

Are these methods not supported for llava?

Tracin commented 3 months ago

Hi, could you share your command, please?

felixslu commented 3 months ago

I have the same problem!

TensorRT-LLM v0.8.0, using "INT8 KV cache + per-channel weight-only" for llama-7B.

felixslu commented 3 months ago

I figured out the problem: you should check which device your model is running on, CPU or GPU.
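
That matches the trace above: the fp16 checkpoint is being run on the CPU during calibration. A minimal sketch of the device check, with a plain nn.Linear standing in for the model being quantized:

```python
import torch

layer = torch.nn.Linear(8, 8).half()
x = torch.randn(1, 8, dtype=torch.float16)

if torch.cuda.is_available():
    # fp16 addmm kernels exist on CUDA, so moving both the module and
    # its inputs to the GPU makes the forward pass succeed.
    layer, x = layer.cuda(), x.cuda()
else:
    # Without a GPU, fall back to fp32, which does have a CPU kernel.
    layer, x = layer.float(), x.float()

print(layer(x).shape)  # torch.Size([1, 8])
```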