Open Lokiiiiii opened 9 months ago
Where can I get this code?
Hi @Lokiiiiii
Thanks for the comments and the proposal. TRT-LLM evolves quickly, adding features and optimization techniques in almost every release, so the current focus is on providing the best performance. We may consider this feature compatibility in the future. In the meantime, you could try the model cache to see whether it meets your needs. TRT-LLM is also investing in faster compilation.
Thanks.
Tao
Introduction
I'm proposing a caching strategy for TRT-LLM to streamline the process of re-compiling engines after fine-tuning. This strategy aims to significantly reduce build times and improve overall efficiency. I invite the community to validate and provide feedback on the following approach. I will contribute a PR based on the feedback I get.
Requirements for Improved Caching
Proposed Solution
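As a rough illustration of the idea, here is a minimal pure-Python sketch of a cache that rebuilds only when the network structure changes and otherwise swaps in new weights. Everything in it except the name `refit_if_cached()` is an assumption made for illustration: `Network`, `Engine`, `EngineCache`, `structure_key`, and `slow_build` are hypothetical stand-ins, not TRT-LLM or TensorRT APIs, and the "refit" here is just a dictionary copy.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Network:
    """Hypothetical stand-in for a constructed network."""
    layers: List[Tuple[str, Tuple[int, ...]]]  # (layer name, weight shape)
    weights: Dict[str, List[float]]            # layer name -> flat weights


@dataclass
class Engine:
    """Hypothetical stand-in for a built engine."""
    key: str
    weights: Dict[str, List[float]]


def structure_key(network: Network) -> str:
    """Hash layer names and shapes only, so fine-tuned weights map to the same key."""
    payload = json.dumps(network.layers, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EngineCache:
    """Hypothetical cache sitting between network construction and engine building."""

    def __init__(self, build_fn: Callable[[Network], Engine]) -> None:
        self._build_fn = build_fn
        self._engines: Dict[str, Engine] = {}
        self.builds = 0
        self.refits = 0

    def refit_if_cached(self, network: Network) -> Engine:
        key = structure_key(network)
        cached: Optional[Engine] = self._engines.get(key)
        if cached is not None:
            # Cache hit: same structure, so only the weights are replaced.
            cached.weights = dict(network.weights)
            self.refits += 1
            return cached
        # Cache miss: fall back to the full (slow) build and remember the result.
        engine = self._build_fn(network)
        self._engines[key] = engine
        self.builds += 1
        return engine


def slow_build(network: Network) -> Engine:
    """Placeholder for the expensive engine-building phase."""
    return Engine(structure_key(network), dict(network.weights))


# A base model and a fine-tuned model with identical structure.
base = Network([("attn.qkv", (2, 2))], {"attn.qkv": [0.1, 0.2, 0.3, 0.4]})
tuned = Network([("attn.qkv", (2, 2))], {"attn.qkv": [0.5, 0.6, 0.7, 0.8]})

cache = EngineCache(slow_build)
first = cache.refit_if_cached(base)    # full build
second = cache.refit_if_cached(tuned)  # reuses the cached engine, new weights
```

In this sketch the second call skips `slow_build` entirely because the structure hash matches. With real TensorRT engines, refitting is only possible if the engine was built as refittable, so an actual implementation would need to account for that at build time.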
refit_if_cached() function between network construction and engine building phases.
Anticipated Benefits
Points for Community Discussion
I look forward to the community's insights and suggestions to refine and enhance this proposal.