TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
I am converting Mixtral-8x7B with tensor parallelism, using the conversion script from the llama folder:
python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
    --output_dir ./tllm_checkpoint_mixtral_2gpu \
    --dtype float16 \
    --world_size 2 \
    --tp_size 2
An error appears about the --world_size argument:
convert_checkpoint.py: error: unrecognized arguments: --world_size 2
Should I remove this argument?
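If so, this is the command I would run instead, with --world_size dropped. This is only my guess, assuming that --tp_size 2 alone is enough for the script to infer a world size of 2:

```shell
# Assumed fix: drop --world_size and let --tp_size determine the world size
python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
    --output_dir ./tllm_checkpoint_mixtral_2gpu \
    --dtype float16 \
    --tp_size 2
```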
Many thanks for your help