llm-inference is a platform for publishing and managing LLM inference services. It provides a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
Fix the max input token size used for warmup. With DeepSpeed, "max_tokens" may be set at "initializer/max_tokens", which can conflict with "max_input_words"; prefer "max_tokens" when both exist.
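The precedence rule above can be sketched as a small resolver. This is a hypothetical helper, not the project's actual API: it assumes the deployment config is a nested dict where the DeepSpeed initializer section holds "max_tokens", and falls back to the top-level "max_input_words" only when "max_tokens" is absent.

```python
def resolve_warmup_max_tokens(config):
    """Resolve the max input token size for warmup.

    Hypothetical sketch: with DeepSpeed, "max_tokens" is assumed to live
    under "initializer/max_tokens" and takes precedence over the
    top-level "max_input_words" when both are present.
    """
    initializer = config.get("initializer") or {}
    max_tokens = initializer.get("max_tokens")
    if max_tokens is not None:
        return max_tokens
    # Fall back to the legacy word-based limit if "max_tokens" is unset.
    return config.get("max_input_words")
```

For example, a config that sets both `initializer/max_tokens` and `max_input_words` resolves to the `max_tokens` value, while a config with only `max_input_words` keeps its old behavior.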