deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself
https://coder.deepseek.com/
MIT License

inference with tensorrt_llm #102

Open · thanhtung901 opened 5 months ago

thanhtung901 commented 5 months ago

Has anyone tried running the deepseek-coder model with TensorRT-LLM?

chenxu2048 commented 5 months ago

We tried to run 1.3b-base on TensorRT-LLM with fp16 enabled, but got incorrect completion output.

thanhtung901 commented 5 months ago

Can you guide me?

chenxu2048 commented 5 months ago
  1. Install TensorRT-LLM or build it from source.
  2. Clone the TensorRT-LLM project and go to examples/llama.
  3. Follow the instructions in examples/llama/README.md.
  4. Replace the model name in the commands with deepseek-coder; see the sketch below.

We have not yet resolved the issue with the wrong outputs in fp16. Any feedback about inference results is welcome.
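
For illustration, the fp16 workflow from examples/llama/README.md would look roughly like this for 1.3b-base (paths are placeholders; flag names may differ across TensorRT-LLM versions):

# Convert the Hugging Face checkpoint to TensorRT-LLM format (fp16, where we saw the bad output).
python convert_checkpoint.py --model_dir ./deepseek-coder-1.3b-base/ \
                             --output_dir ./trt-deepseek-coder-1.3b-base \
                             --dtype float16

# Build the inference engine from the converted checkpoint.
trtllm-build --checkpoint_dir ./trt-deepseek-coder-1.3b-base \
             --output_dir ./trt-engines-deepseek-coder-1.3b-base/1-gpu/ \
             --gemm_plugin float16 \
             --gpt_attention_plugin float16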

activezhao commented 3 months ago
> We have not yet resolved the issue with the wrong outputs in fp16. Any feedback about inference results is welcome.

Hi @chenxu2048 Have you resolved the problem with DeepSeek?

chenxu2048 commented 3 months ago

> Hi @chenxu2048 Have you resolved the problem with DeepSeek?

No, we finally chose vLLM. The same error occurred in TensorRT 8.6, TensorRT 9.0, and TensorRT-LLM, and we had no way to debug it.
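
For reference, a minimal sketch of that vLLM fallback, serving the model through vLLM's OpenAI-compatible server (entrypoint and flags assumed from vLLM's documentation; adjust to your version):

# Hypothetical fallback: serve deepseek-coder with vLLM instead of TensorRT-LLM.
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/deepseek-coder-1.3b-base \
    --dtype bfloat16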

activezhao commented 3 months ago

> No, we finally chose vLLM. The same error occurred in TensorRT 8.6, TensorRT 9.0, and TensorRT-LLM, and we had no way to debug it.

@chenxu2048 OK, thanks for your reply. We have no choice but to wait for TensorRT-LLM.

chenxu2048 commented 3 months ago

> @chenxu2048 OK, thanks for your reply. We have no choice but to wait for TensorRT-LLM.

@activezhao Maybe you can try bf16 instead of fp16.

activezhao commented 3 months ago

> @activezhao Maybe you can try bf16 instead of fp16.

@chenxu2048 In fact, I have tried, but it still does not work. Have you tried bf16?

python convert_checkpoint.py --model_dir /data/deepseek-coder-6.7b-base/ \
                            --output_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
                            --dtype bfloat16 \
                            --tp_size 2 \
                            --workers 2

trtllm-build --checkpoint_dir /data/trt-deepseek-coder-6.7b-base-tp2 \
            --output_dir /data/trt-engines-deepseek-coder-6.7b-base/2-gpu/  \
            --gemm_plugin bfloat16 \
            --gpt_attention_plugin bfloat16 \
            --max_batch_size 64 

But the result is still abnormal:

{"task_id": "HumanEval/0", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/1", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/2", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/3", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/4", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/5", "completion": "\n\n\n\n\n\n", "language": "python"}
{"task_id": "HumanEval/6", "completion": "\n\n\n\n\n\n", "language": "python"}
chenxu2048 commented 3 months ago

> @chenxu2048 In fact, I have tried, but it still does not work. Have you tried bf16?

No, we didn't.