NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

An error occurred in MPI_Init_thread when running sqlcoder #934

Closed by 2496289471 7 months ago

2496289471 commented 7 months ago

Can sqlcoder2 (https://huggingface.co/defog/sqlcoder2), which is based on StarCoder, be run directly? I ran it following the StarCoder example, but got the following error:

python3 ../run.py --engine_dir sqlcoder_outputs_tp1 --tokenizer_dir /home/admin/code/sqlcoder2 --input_text "how many students in the school?" --max_output_len 200 --no_add_special_tokens

[TensorRT-LLM][WARNING] Parameter version cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'version' not found
[TensorRT-LLM][WARNING] Parameter pipeline_parallel cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'pipeline_parallel' not found
[TensorRT-LLM][WARNING] Parameter mlp_hidden_size cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'mlp_hidden_size' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
An error occurred in MPI_Init_thread
on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
and potentially your MPI job)
[localhost.localdomain:204388] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
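Before the MPI failure, the runtime warns that several parameters (`version`, `pipeline_parallel`, `mlp_hidden_size`) could not be read from the engine's JSON config. When comparing runs or attaching logs to an issue, it can help to list those keys at a glance. The sketch below is a hypothetical helper, not part of TensorRT-LLM; it assumes only that the warnings follow the `key '<name>' not found` format shown in the log above, and collects the missing key names in order of first appearance.

```python
import re

# Matches warnings of the form seen in the log above, e.g.
# [TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'version' not found
MISSING_KEY = re.compile(r"\[json\.exception\.out_of_range\.403\] key '([^']+)' not found")


def missing_config_keys(log_text: str) -> list[str]:
    """Return the config keys reported as missing, deduplicated, in order of first appearance."""
    seen: list[str] = []
    for key in MISSING_KEY.findall(log_text):
        if key not in seen:
            seen.append(key)
    return seen


if __name__ == "__main__":
    log = """\
[TensorRT-LLM][WARNING] Parameter version cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'version' not found
[TensorRT-LLM][WARNING] Parameter pipeline_parallel cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'pipeline_parallel' not found
[TensorRT-LLM][WARNING] Parameter mlp_hidden_size cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'mlp_hidden_size' not found
"""
    print(missing_config_keys(log))
    # prints ['version', 'pipeline_parallel', 'mlp_hidden_size']
```

The helper only reports what the log says; it does not explain why the keys are absent or why `MPI_Init_thread` failed, so the MPI error still needs to be investigated separately.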

byshiue commented 7 months ago

Please follow the issue template when filing an issue. Thank you for your cooperation.

2496289471 commented 7 months ago


Thank you. I have opened a new issue: #995.

byshiue commented 7 months ago

Closing this one.