An error occurred in MPI_Init_thread when running sqlcoder

2496289471 commented 10 months ago

System Info

nvidia A100 80G centos7 x86_64

Who can help?

@ncomly-nvidia @kaiyux @juney-nvidia

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

python hf_gpt_convert.py --model starcoder -i ./sqlrcoder -o ./c-model/sqlcoder --tensor-parallelism 1 --storage-type float16

python3 build.py \ --model_dir ./c-model/sqlcoder/1-gpu \ --remove_input_padding \ --use_gpt_attention_plugin \ --enable_context_fmha \ --use_gemm_plugin \ --parallel_build \ --output_dir sqlcoder_outputs_tp1 \

python ../run.py --engine_dir sqlcoder_outputs_tp1 --tokenizer_dir ./sqlcoder --input_text "input text" --max_output_len 200 --no_add_special_tokens

Expected behavior

output sql

actual behavior

1706490601024

additional notes

Whether it supports the sqlcoder series model, vllm can run sqlcoder directly as starcoder. I'm wondering if this error is related to the model itself?

byshiue commented 10 months ago

The error happens at MPI initialization. Do you use the docker image used in document? If not, might you take a try?

nv-guomingz commented 1 week ago

HI @2496289471 do u still have further issue or question now? If not, we'll close it soon.

NVIDIA / TensorRT-LLM