NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Not able to find build.py file in examples\llama directory #1254

Closed · maazaahmed closed this 1 week ago

maazaahmed commented 6 months ago

System Info

- Processor: 13th Gen Intel(R) Core(TM) i9-13900KF, 3.00 GHz
- Installed RAM: 32.0 GB (31.8 GB usable)
- System type: 64-bit operating system, x64-based processor
- GPU: NVIDIA RTX 4070

Who can help?

[Screenshot: build instructions from the trt-llm-rag-windows README]

As per the instructions in https://github.com/NVIDIA/trt-llm-rag-windows (screenshot of the README attached above), to build the TRT engine for LLaMA we need to run build.py with the model .pt file passed as an argument, but I am unable to locate build.py in the llama directory.

Information

Tasks

Reproduction

  1. Clone the repo.
  2. Download the .pt file and the LLaMA model.
  3. Go into the examples/llama directory and try to find build.py (see the shell transcript below).
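
For concreteness, a minimal shell transcript of the repro; the clone URL is the repository above, and the failing `ls` is the symptom being reported:

```bash
# Clone the repo and look for build.py in the llama example directory
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM/examples/llama
ls build.py   # => "No such file or directory" on recent main/0.8 checkouts
```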

Expected behavior

build.py should be present in examples/llama.

Actual behavior

build.py is missing, so the engine cannot be built.

Additional notes

Not sure if I am the only one missing build.py, or whether there is another way to build the engine.

### Tasks
- [ ] https://github.com/NVIDIA/TensorRT-LLM/issues/1045
jonny2027 commented 6 months ago

@maazaahmed I believe the build instructions for main/0.8 have changed. Check the README for the updated steps:

examples/llama/README.md

maazaahmed commented 6 months ago

Build the LLaMA 7B model using a single GPU and FP16.

```bash
# Convert the Hugging Face checkpoint to a TensorRT-LLM checkpoint
python convert_checkpoint.py --model_dir ./tmp/llama/7B/ \
    --output_dir ./tllm_checkpoint_1gpu_fp16 \
    --dtype float16

# Build the TensorRT engine from the converted checkpoint
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_fp16 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/1-gpu \
    --gemm_plugin float16
```

@jonny2027 I am looking for 13B, and I can see this works for both 13B and 7B, but does the '--model_dir' argument refer to the LLaMA weights?

And do we first need to get the checkpoints from Hugging Face, so that convert_checkpoint.py can point to them when it runs?
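
For later readers, a hedged sketch of how those two questions usually resolve: --model_dir points at the downloaded Hugging Face weights. The repo id meta-llama/Llama-2-13b-hf and the local paths below are illustrative assumptions, not taken from this thread:

```bash
# Assumed example: fetch the 13B Hugging Face checkpoint first
# (requires huggingface_hub and accepting the license on the model page)
huggingface-cli download meta-llama/Llama-2-13b-hf --local-dir ./llama-2-13b-hf

# --model_dir is the Hugging Face weights directory downloaded above
python convert_checkpoint.py --model_dir ./llama-2-13b-hf \
    --output_dir ./tllm_checkpoint_1gpu_fp16_13b \
    --dtype float16

trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_fp16_13b \
    --output_dir ./trt_engines/llama-13b/fp16/1-gpu \
    --gemm_plugin float16
```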

Lucashien commented 3 weeks ago

Did you find build.py? I have the same problem running Llama 3 with TensorRT-LLM.

Thanks

byshiue commented 1 week ago

build.py is located at tensorrt_llm/commands/build.py.
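
A quick way to confirm this, assuming a pip-installed tensorrt_llm (the trtllm-build console script quoted earlier in this thread appears to be the intended entry point for that module):

```bash
# Locate build.py inside the installed package (path from the comment above)
python -c "import tensorrt_llm.commands.build as b; print(b.__file__)"

# The trtllm-build CLI wraps it, so there is no need to call build.py directly
trtllm-build --help
```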