NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Can I build an engine for a Llama model with dtype bfloat16 and int8 weights? #730

Open ztxz16 opened 8 months ago

ztxz16 commented 8 months ago

Thanks for this excellent project! I can generate a bfloat16 model or an int8 weight-only model, but when I tried the following command:

```shell
python ./examples/llama/build.py --model_dir ./Mixtral-8x7B-Instruct-v0.1/ \
    --use_inflight_batching \
    --enable_context_fmha \
    --use_gemm_plugin \
    --world_size 2 \
    --tp_size 2 \
    --dtype bfloat16 \
    --use_gpt_attention_plugin bfloat16 \
    --use_gemm_plugin bfloat16 \
    --use_weight_only \
    --output_dir engine
```

I get the following error:

```
[12/23/2023-16:05:41] [TRT-LLM] [I] Loading weights from HF LLaMA...
Traceback (most recent call last):
  File "/workspace/github/TensorRT-LLM/./examples/llama/build.py", line 906, in <module>
    build(0, args)
  File "/workspace/github/TensorRT-LLM/./examples/llama/build.py", line 850, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/workspace/github/TensorRT-LLM/./examples/llama/build.py", line 690, in build_rank_engine
    load_from_hf_llama(tensorrt_llm_llama,
  File "/workspace/github/TensorRT-LLM/examples/llama/weight.py", line 352, in load_from_hf_llama
    print(torch.tensor(v))
TypeError: can't convert np.ndarray of type numpy.void. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
```
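For context, `numpy.void` is the element type NumPy reports for structured or otherwise opaque dtypes, and it is exactly the type that `torch.tensor()` refuses to convert in the traceback above. Here is a minimal NumPy-only sketch of the situation; the structured dtype and the `raw` field name are hypothetical, chosen only to produce a void dtype the way an unrecognized bfloat16 weight buffer would, and whether the byte-reinterpretation shown matches the actual fix in `weight.py` is something the maintainers would need to confirm:

```python
import numpy as np

# A structured dtype: its elements are numpy.void scalars, which is
# the type torch.tensor() rejects in the traceback above.
v = np.zeros(4, dtype=np.dtype([("raw", np.uint16)]))
print(type(v[0]))      # numpy.void scalar
print(v.dtype.kind)    # 'V' marks a void (structured/opaque) dtype

# One common way around this is to reinterpret the raw bytes as a
# supported plain dtype before handing the array to torch:
as_u16 = v.view(np.uint16)
print(as_u16.dtype)    # uint16, which torch can convert
```

With the bytes viewed as a plain integer dtype, `torch.from_numpy` accepts the array, and the tensor can then be reinterpreted as the intended floating-point type on the torch side.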

How can I solve this problem? Thanks!

juney-nvidia commented 8 months ago

@ztxz16

Hi,

Can you share the branch and GitHub commit you are using when you hit this issue?

Thanks,
June

ztxz16 commented 8 months ago

> Hi,
>
> can you share the branch and GitHub commit you are using when you hit this issue?

The main branch, commit id: a75618df24e97ecf92b8899ca3c229c4b8097dda

jdemouth-nvidia commented 8 months ago

The engineers working on Mixtral are on holiday this week - they'll be back in January. Is it a blocking issue?