Closed: wxsms closed this issue 6 months ago.
No, unfortunately exporting to a TRT-LLM checkpoint is not supported yet, but our quantization works. If you need the support urgently, I recommend you check the Python code of `modelopt.torch.export` after the pip installation and modify that package to support it yourself. Also, feel free to share the diff (patch) here so we can help incorporate it into the next release.
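A quick way to find the installed files to patch (plain Python; nothing modelopt-specific is assumed beyond the package name):

```python
# Print the location of the installed modelopt.torch.export package;
# __file__ points at its __init__.py, and sibling modules such as
# tensorrt_llm_utils.py live in the same directory.
import modelopt.torch.export

print(modelopt.torch.export.__file__)
```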
Thanks. May I ask, are there any plans for this at the moment?
There might be a plan for the coming releases.
I managed to finish the checkpoint export by simply adding `"unknown:Starcoder2ForCausalLM": "GPTForCausalLM"` to `MODEL_NAME_TO_HF_ARCH_MAP` in `tensorrt_llm_utils.py`, where the `unknown:` prefix is added by `tensorrt_llm/quantization/quantize_by_modelopt.py`. The engine built from that checkpoint works. Thanks for your guidance.
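The same effect can also be had without editing the installed file, by patching the map at runtime before export. This is a sketch based only on the module and dict names mentioned in this thread; both may move between modelopt releases:

```python
# Runtime equivalent of the one-line edit described above.
from modelopt.torch.export import tensorrt_llm_utils

# quantize_by_modelopt.py prefixes architectures it does not recognize with
# "unknown:", so the key must carry that prefix; Starcoder2 is GPT-like,
# hence the mapping to the GPT exporter.
tensorrt_llm_utils.MODEL_NAME_TO_HF_ARCH_MAP["unknown:Starcoder2ForCausalLM"] = "GPTForCausalLM"
```

This has to run before the export call so that the map lookup sees the new entry.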
Thanks @wxsms, I will add it to the next release.
I've tried the latest version of modelopt with a Starcoder2 model to perform an FP8 quantization:

which failed with:

pip list:

hardware: 2x RTX 4090
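For reference, a typical modelopt FP8 PTQ and export flow on an HF causal LM looks roughly like the sketch below. The model ID, calibration text, dtype, and export directory are illustrative assumptions, not the actual failing script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint

# Illustrative checkpoint; any Starcoder2 causal LM should behave the same.
model_id = "bigcode/starcoder2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def forward_loop(m):
    # Toy single-batch calibration; real runs feed a proper calibration set.
    inputs = tokenizer("def hello_world():", return_tensors="pt").to(m.device)
    m(**inputs)

# FP8 post-training quantization with modelopt's default FP8 config.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

# Export to a TRT-LLM checkpoint; without the map entry discussed above,
# this is the step where an unrecognized architecture would fail.
export_tensorrt_llm_checkpoint(
    model,
    decoder_type="gpt",  # Starcoder2 is mapped to the GPT exporter
    dtype=torch.float16,
    export_dir="starcoder2-fp8-ckpt",
)
```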