Open 256256mjw opened 3 months ago

Why is there no --int8_kv_cache option in convert_checkpoint.py when I want to build an internlm2-chat-20b model with an int8 KV cache? The script is located at /TensorRT-LLM/examples/internlm2/convert_checkpoint.py.
internlm2 support was implemented in https://github.com/NVIDIA/TensorRT-LLM/pull/1392. Unlike the original internlm example, that implementation did not enable this feature.
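For reference, other example converters in TensorRT-LLM (e.g. examples/llama/convert_checkpoint.py) expose this as a plain argparse flag. Below is a minimal sketch of that pattern; the flag name and help text mirror the LLaMA example, and the internlm2 invocation in the comment is hypothetical, since the internlm2 script does not currently accept the flag:

```python
# Sketch of the --int8_kv_cache flag as exposed by other TensorRT-LLM
# example converters; the internlm2 example omits it.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--model_dir', type=str, required=True)
parser.add_argument('--output_dir', type=str, required=True)
parser.add_argument('--dtype', type=str, default='float16')
parser.add_argument(
    '--int8_kv_cache',
    default=False,
    action='store_true',
    help='Quantize the KV cache to int8 instead of storing it in --dtype; '
         'requires a calibration pass to compute KV scaling factors.')
args = parser.parse_args()

# Hypothetical invocation, if the internlm2 example supported the flag:
#   python convert_checkpoint.py --model_dir ./internlm2-chat-20b \
#       --output_dir ./ckpt --dtype float16 --int8_kv_cache
```

Note that exposing the flag alone would not be enough: the converter would also need the calibration step that computes the per-layer KV scaling factors, which is presumably the part the internlm2 example does not wire up.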
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 15 days.