Open thanhtung901 opened 5 months ago
We tried to run 1.3b-base on TensorRT LLM with fp16 enabled, but got incorrect completion output.
Can you guide me?
- Install TensorRT-LLM or build it from source.
- Clone the TensorRT-LLM project and go to examples/llama.
- Follow the instructions in examples/llama/README.md.
- Replace the model name in the commands with deepseek-coder.

We have not yet resolved the issue with the wrong outputs in fp16. Any feedback about inference results is welcome.
Hi @chenxu2048, have you resolved the problem with deepseek?
No, we chose vLLM in the end. The same error occurred in TensorRT 8.6, TensorRT 9.0, and TensorRT-LLM, but we had no way to debug it.
@chenxu2048 OK, thanks for your reply. We have no choice but to wait for TensorRT-LLM.
@activezhao Maybe you can try bf16 instead of fp16.
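For context on why bf16 is worth trying: fp16 has only 5 exponent bits (max finite value around 65504), so large intermediate activations can overflow to inf, while bf16 keeps fp32's 8 exponent bits and stays finite at a much larger range. A stdlib-only sketch of the difference (the bf16 round-trip below is a truncating approximation for illustration, not how TensorRT-LLM implements it):

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision (fp16)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_bf16(x):
    """Approximate bfloat16 by truncating an fp32 value's low 16 mantissa bits."""
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

print(to_fp16(60000.0))      # fp16 max is ~65504, so this still fits
print(to_bf16(1e8))          # far beyond fp16 range, but finite in bf16

try:
    struct.pack('<e', 1e8)   # the same value overflows fp16 outright
except OverflowError as err:
    print("fp16 overflow:", err)
```

Whether this is the actual failure mode here is unconfirmed; it is just the usual reason bf16 rescues models that produce garbage in fp16.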
@chenxu2048 In fact, I have tried that, but it still doesn't work. Have you tried bf16?
python convert_checkpoint.py --model_dir /data/deepseek-coder-6.7b-base/ \
--output_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
--dtype bfloat16 \
--tp_size 2 \
--workers 2
trtllm-build --checkpoint_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
--output_dir /data/trt-engines-deepseek-coder-6.7b-base/2-gpu/ \
--gemm_plugin bfloat16 \
--gpt_attention_plugin bfloat16 \
--max_batch_size 64
But the results are still abnormal:
{"task_id": "HumanEval/0", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/1", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/2", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/3", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/4", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/5", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/6", "completion": "\n\n\n\n\n\n", "language": "python"}
No, we didn't.
Has anyone tried running the deepseek_coder model using tensorrt_llm?