When will DeepSeek models be supported?

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

https://nvidia.github.io/TensorRT-LLM

Apache License 2.0


When will DeepSeek models be supported? #856

Open shatealaboxiaowang opened 9 months ago

shatealaboxiaowang commented 9 months ago

I recently tried to build the Magicoder-DS-6.7B model (fine-tuned from DeepSeek Coder). The build worked, but the output was problematic.

Request:

curl -X POST localhost:8035/v2/models/ensemble/generate -d '{"text_input": "import numpy", "max_tokens": 20, "bad_words": "", "stop_words":""}'

Output:

{"cum_log_probs":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"<｜begin▁of▁sentence｜>import numpy py py py py py py py py py py py py py py py py py py py py"}

This output is clearly unreasonable. Have you encountered a similar situation? How can it be solved?
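For anyone trying to reproduce this, the curl call can be sketched in Python as follows. This is a minimal sketch, assuming a Triton server on localhost:8035 serving the TensorRT-LLM `ensemble` model exactly as in the report; the helper names `build_generate_request`, `extract_text`, and `generate` are hypothetical. Stripping the `<｜begin▁of▁sentence｜>` marker only tidies the printed text; it does not address the repeated `py` tokens.

```python
import json
import urllib.request

# Assumption: same endpoint as the curl command in the report (Triton server
# on localhost:8035 serving the TensorRT-LLM "ensemble" model).
GENERATE_URL = "http://localhost:8035/v2/models/ensemble/generate"

# DeepSeek BOS marker, copied from the text_output shown in the report.
DEEPSEEK_BOS = "<｜begin▁of▁sentence｜>"


def build_generate_request(prompt: str, max_tokens: int = 20) -> urllib.request.Request:
    """Build a POST request with the same JSON body as the curl command."""
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body: str) -> str:
    """Return text_output from a generate response, minus a leading BOS marker."""
    text = json.loads(response_body)["text_output"]
    if text.startswith(DEEPSEEK_BOS):
        text = text[len(DEEPSEEK_BOS):]
    return text


def generate(prompt: str, max_tokens: int = 20) -> str:
    """Send the request and return the cleaned text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, max_tokens)) as resp:
        return extract_text(resp.read().decode("utf-8"))
```

With the server running, `generate("import numpy")` sends the same prompt and parameters as the curl command, so the two can be compared side by side.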

byshiue commented 9 months ago

Please share the end-to-end steps to reproduce your issue.

shatealaboxiaowang commented 9 months ago

Thanks, I have fixed it.

viningz commented 9 months ago

Could you share how you solved this garbled-output problem? With DeepSeek I get garbled output on both vLLM and TensorRT-LLM.

thanhtung901 commented 8 months ago

How did you fix it? I have the same issue.

bowencarry commented 7 months ago

My friend, I have the same problem with DeepSeek models: the whole model conversion process goes through without errors, but when the model is loaded and generates text, it keeps repeating a single token. How did you solve it?
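The symptom described in this thread (one token repeated until `max_tokens` is reached) can be flagged automatically when smoke-testing a converted engine. This is a minimal sketch, assuming the generated text is already available as a string; splitting on whitespace is only a rough stand-in for the model's real tokens, and the function names and the default threshold of 5 repeats are hypothetical choices. It detects the problem; it does not fix it.

```python
from typing import Tuple


def trailing_repeat(text: str) -> Tuple[str, int]:
    """Return the last whitespace-separated word of ``text`` and how many
    times it occurs consecutively at the end (a rough proxy for tokens)."""
    words = text.split()
    if not words:
        return "", 0
    last = words[-1]
    count = 0
    # Walk backwards until a different word appears.
    for word in reversed(words):
        if word != last:
            break
        count += 1
    return last, count


def looks_degenerate(text: str, min_repeats: int = 5) -> bool:
    """Heuristic: flag output whose tail is one word repeated min_repeats+ times."""
    _, count = trailing_repeat(text)
    return count >= min_repeats
```

For example, `looks_degenerate("import numpy py py py py py py")` returns `True`, while a normal completion such as `"import numpy as np"` returns `False`.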