EASTERNTIGER opened this issue 3 months ago
Same problem; it seems that this argument has been removed. #2008
Hi, @OptimusV5 @EASTERNTIGER Could you try removing this line: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_ipc_utils.py#L42
@EASTERNTIGER @OptimusV5 This is a known bug that has been fixed on the main branch and in v0.12; you can verify the fix with the main branch now or wait for the v0.12 release.
@EASTERNTIGER @OptimusV5 It appears to be fixed here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/runtime/ipcUtils.cpp#L47. Please update your code and verify.
Hi, I tried to convert a T5 model to TensorRT on a machine with 4 GPUs. In the `python convert_checkpoint.py` step, I set tp_size=4 and pp_size=1, and the TensorRT model was built successfully. However, when I run `mpirun --allow-run-as-root -np 4 python3 run.py`, I get the errors below.
When I instead set tp_size=1 and pp_size=1 in the `python convert_checkpoint.py` step, `python3 run.py` works fine. How can I fix this? It seems related to the GPU configuration, but I don't know what to change. I also found a similar issue, but when I added `--use_custom_all_reduce disable` to `trtllm-build`, it reported unrecognized arguments.
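For reference, the full tp_size=4 flow being discussed looks roughly like the sketch below. The model paths, output directories, and most flags here are illustrative assumptions, not verified against any particular TensorRT-LLM release; check `--help` on your installed version before copying them.

```shell
# Hypothetical paths and flags -- adjust to your model and TensorRT-LLM version.

# 1) Convert the T5 checkpoint with 4-way tensor parallelism, no pipeline parallelism:
python3 convert_checkpoint.py \
    --model_dir ./t5-base \
    --output_dir ./t5-ckpt-tp4 \
    --tp_size 4 --pp_size 1

# 2) Build the engine. Note: --use_custom_all_reduce has been removed in newer
#    releases, which is why passing it now yields "unrecognized arguments".
trtllm-build --checkpoint_dir ./t5-ckpt-tp4 --output_dir ./t5-engine-tp4

# 3) Launch with one MPI rank per GPU: -np must match tp_size * pp_size (4 * 1 = 4).
mpirun --allow-run-as-root -np 4 python3 run.py --engine_dir ./t5-engine-tp4
```

The key constraint is that the number of MPI ranks must equal the world size implied at conversion time (tp_size × pp_size), so a tp_size=4 engine cannot be run single-process the way the tp_size=1 build can.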