Open liguodongiot opened 1 month ago
Per-tensor quantization of the KV cache can cause large error. Do you have CMMLU results without the KV cache quantized?
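To make the concern concrete, here is a minimal NumPy sketch (not TRT-LLM code; the tensor shapes and outlier pattern are assumptions for illustration) showing why a single per-tensor int8 scale loses much more precision than a finer-grained per-channel scheme when a few channels carry outliers:

```python
# Illustrative sketch (not TRT-LLM code): per-tensor int8 quantization of a
# KV cache slice with outlier channels vs. a per-channel scheme.
import numpy as np

rng = np.random.default_rng(0)

# Fake KV cache slice: [tokens, head_dim]. Most channels are small; a few
# "outlier" channels (assumed here for illustration) have large magnitude.
kv = rng.normal(0.0, 0.1, size=(128, 64))
kv[:, :4] *= 50.0  # outlier channels dominate the per-tensor max

def quant_dequant(x, scale):
    # Symmetric int8 fake-quantization: quantize, clip, dequantize.
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

# Per-tensor: one scale for the whole tensor, set by the global max.
s_tensor = np.abs(kv).max() / 127.0
err_tensor = np.mean((kv - quant_dequant(kv, s_tensor)) ** 2)

# Per-channel: one scale per channel, so small channels keep resolution.
s_channel = np.abs(kv).max(axis=0, keepdims=True) / 127.0
err_channel = np.mean((kv - quant_dequant(kv, s_channel)) ** 2)

print(f"per-tensor MSE:  {err_tensor:.3e}")
print(f"per-channel MSE: {err_channel:.3e}")
```

On this toy example the per-channel error is far smaller, because the per-tensor scale is forced wide by the outlier channels and the small-magnitude channels round mostly to zero.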
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Hi, I found that TRT-LLM KV cache quantization leads to serious model accuracy loss, while vLLM and LMDeploy show only minor loss.