rbgo404 opened 2 months ago
Please share the configuration on the TensorRT-LLM end. What parameter modifications are required in the model's config.pbtxt?
Hey @rbgo404 You can deploy the TensorRT-based LLM model by following the steps here: https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html#using-local-gpus-for-a-q-a-chatbot
This notebook interacts with the model deployed behind the llm-inference-server container, which should start up if you follow the steps above.
Let me know if you have any questions once you go through these steps!
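Once the container is running, a quick sanity check is to ping Triton's gRPC endpoint with `tritonclient`. This is a minimal sketch; the host `localhost` and default gRPC port 8001 are assumptions, so adjust them to match your deployment:

```python
# Minimal sketch: confirm the llm-inference-server (Triton) is reachable.
# Assumes Triton's default gRPC port 8001 on localhost.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
print("server ready:", client.is_server_ready())
# List loaded models to confirm the TensorRT-LLM engine was picked up
# (the model name depends on your repository layout).
print(client.get_model_repository_index())
```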
Hi, I followed the instructions but still have a problem starting llm-inference-server. I'm currently using a Tesla M60 and llama-2-13b-chat.
I have gone through the notebooks but wasn't able to stream the tokens from TensorRT-LLM. Here's the issue: ![image](https://github.com/NVIDIA/GenerativeAIExamples/assets/150957746/b8d50a66-5acc-4fa0-a206-a36b6f8eb418)
Code used:
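For comparison, a minimal sketch of streaming tokens over Triton's gRPC streaming API might look like the following. The model name `ensemble` and the tensor names (`text_input`, `max_tokens`, `stream`, `text_output`) are assumptions based on the tensorrtllm_backend ensemble and may not match your config.pbtxt:

```python
# Minimal sketch: stream tokens from a TensorRT-LLM model served by Triton.
# Model and tensor names are assumptions; check your config.pbtxt.
from functools import partial
import queue

import numpy as np
import tritonclient.grpc as grpcclient

def callback(results, result, error):
    # Each streamed response carries one decoded chunk (or an error).
    results.put(error if error else result)

results = queue.Queue()
client = grpcclient.InferenceServerClient(url="localhost:8001")

text = grpcclient.InferInput("text_input", [1, 1], "BYTES")
text.set_data_from_numpy(np.array([["What is TensorRT-LLM?"]], dtype=object))
max_tokens = grpcclient.InferInput("max_tokens", [1, 1], "INT32")
max_tokens.set_data_from_numpy(np.array([[128]], dtype=np.int32))
stream_flag = grpcclient.InferInput("stream", [1, 1], "BOOL")
stream_flag.set_data_from_numpy(np.array([[True]], dtype=bool))

client.start_stream(callback=partial(callback, results))
client.async_stream_infer("ensemble", [text, max_tokens, stream_flag])
client.stop_stream()  # closes the stream after pending responses are handled

while not results.empty():
    item = results.get()
    if isinstance(item, Exception):
        raise item
    print(item.as_numpy("text_output"))
```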