@maazaahmed I believe the build instructions for main/0.8 have changed. Check examples/llama/README.md for the updated build steps:
python convert_checkpoint.py --model_dir ./tmp/llama/7B/ \
    --output_dir ./tllm_checkpoint_1gpu_fp16 \
    --dtype float16

trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_fp16 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/1-gpu \
    --gemm_plugin float16
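Once the engine is built, you can sanity-check it with the example runner (paths match the commands above; run from examples/llama, and note that run.py's exact flags may differ slightly between releases):

python3 ../run.py --max_output_len 50 \
    --tokenizer_dir ./tmp/llama/7B/ \
    --engine_dir ./tmp/llama/7B/trt_engines/fp16/1-gpu/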
@jonny2027 I am looking for 13B, and I can see this works for both 13B and 7B, but does the '--model_dir' argument refer to the LLaMA weights?
And do we first need to download the checkpoints from Hugging Face, so we can point convert_checkpoint.py at them?
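For context, this is roughly the download step I had in mind (the model id is illustrative, and the official Meta repos are gated, so you need to authenticate with an access token first):

huggingface-cli login
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-13b-hf ./tmp/llama/13B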
Did you find build.py? I have the same problem running Llama 3 with TensorRT-LLM.
Thanks
build.py is located at tensorrt_llm/commands/build.py.
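If I understand the packaging correctly, that file backs the trtllm-build console script installed with the pip wheel, so you do not need to invoke it by path:

trtllm-build --help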
System Info
Processor: 13th Gen Intel(R) Core(TM) i9-13900KF, 3.00 GHz
Installed RAM: 32.0 GB (31.8 GB usable)
System type: 64-bit operating system, x64-based processor
GPU: NVIDIA RTX 4070
Who can help?
As per the instructions in https://github.com/NVIDIA/trt-llm-rag-windows (screenshot of the README attached), to build a TRT engine for LLaMA we need to run the build.py file, passing model.pt as an argument, but I am unable to locate build.py in the llama directory.
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
build.py should be present as described in the README.
actual behavior
Not able to build the TRT engine because build.py is missing.
additional notes
Not sure if I am the only one missing the build.py file, or whether there is another way to build the engine.
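For anyone checking their own install: assuming the tensorrt_llm Python package is importable, this one-liner prints where (and whether) build.py exists:

python -c "import tensorrt_llm.commands.build as m; print(m.__file__)"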