zhangts20 opened 1 day ago
Please pass `--dataset` to the commands and verify your engine with gptManagerBenchmark first.
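For reference, a hedged sketch of that flow. The `prepare_dataset.py` helper in `benchmarks/cpp` and its flags are recalled from memory and may differ between TensorRT-LLM versions; the engine path is taken from this issue:

```bash
# Generate a small synthetic dataset (helper and flags assumed; adjust to your version).
python benchmarks/cpp/prepare_dataset.py \
    --tokenizer /path/to/llama2-7b-hf \
    --output /tmp/synthetic_dataset.json \
    token-norm-dist \
    --num-requests 100 \
    --input-mean 128 --input-stdev 0 \
    --output-mean 128 --output-stdev 0

# Verify the engine with gptManagerBenchmark before moving on to disaggServerBenchmark.
# Binary path assumed from a default build_wheel.py --benchmarks build.
./cpp/build/benchmarks/gptManagerBenchmark \
    --engine_dir /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1 \
    --dataset /tmp/synthetic_dataset.json
```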
I have passed `--dataset` to disaggServerBenchmark, and both the llama2-7b-tp1 and llama2-7b-tp2 engines work with gptManagerBenchmark.
Could you comment out

```cpp
// Installs handlers for SIGABRT and SIGSEGV so that a crash in any rank aborts
// the whole MPI job (and, when forwardAbortToParent is set, also kills the
// parent process).
for (int sig : {SIGABRT, SIGSEGV})
{
    __sighandler_t previousHandler = nullptr;
    if (forwardAbortToParent)
    {
        previousHandler = std::signal(sig,
            [](int signal)
            {
#ifndef _WIN32
                pid_t parentProcessId = getppid();
                kill(parentProcessId, SIGKILL);
#endif
                MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
            });
    }
    else
    {
        previousHandler = std::signal(sig, [](int signal) { MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE); });
    }
    TLLM_CHECK_WITH_INFO(previousHandler != SIG_ERR, "Signal handler setup failed");
}
```

in `cpp/tensorrt_llm/common/mpiUtils.cpp`, then try to compile and run again?
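For reference, a minimal sketch of the edit-and-rebuild loop, reusing the build_wheel.py invocation given in the reproduction steps at the end of this issue:

```bash
# After commenting out the signal-handler block in
# cpp/tensorrt_llm/common/mpiUtils.cpp, rebuild with the same command used
# in the reproduction steps, then re-run disaggServerBenchmark.
python scripts/build_wheel.py --trt_root=/usr/local/tensorrt --clean \
    --cuda_architectures='90-real' --benchmarks
```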
Which container do you use?
Thanks, I will try it. I installed tensorrt_llm from source in my own container; the environment info is as mentioned above.
Maybe you can try the Docker image built with the instructions at https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html#building-a-tensorrt-llm-docker-image
We have only tested disaggServerBenchmark in a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.
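A hedged sketch of that route, assuming the make targets from the linked build-from-source instructions (exact targets may differ between releases):

```bash
# Build the TensorRT-LLM development image (based on nvcr.io/nvidia/pytorch:24.10-py3
# for this release) and start an interactive container from it.
make -C docker build
make -C docker run
```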
I rebuilt tensorrt_llm, and now the error looks like this (there is an error about permissions):
[40a8c9673b05:1447853] Read -1, expected 33554432, errno = 14
[40a8c9673b05:1447850] *** Process received signal ***
[40a8c9673b05:1447850] Signal: Segmentation fault (11)
[40a8c9673b05:1447850] Signal code: Invalid permissions (2)
[40a8c9673b05:1447850] Failing at address: 0x9c5c12400
[40a8c9673b05:1447850] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f39b8619520]
[40a8c9673b05:1447850] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7f39b877d7cd]
[40a8c9673b05:1447850] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7f39600ec244]
[40a8c9673b05:1447850] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7f3960048556]
[40a8c9673b05:1447850] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7f3960046811]
[40a8c9673b05:1447850] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7f39600f0ae5]
[40a8c9673b05:1447850] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7e24)[0x7f39600f0e24]
[40a8c9673b05:1447850] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7f39b8a3f714]
[40a8c9673b05:1447850] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7f39b8a4c38d]
[40a8c9673b05:1447850] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_mprobe+0x52d)[0x7f39600432fd]
[40a8c9673b05:1447850] [10] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Mprobe+0xd7)[0x7f39b8b440e7]
[40a8c9673b05:1447850] [11] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm6mprobeEiiPP14ompi_message_tP20ompi_status_public_t+0x2a)[0x7f39be09be6a]
[40a8c9673b05:1447850] [12] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl19leaderRecvReqThreadEv+0x133)[0x7f39c03c4e23]
[40a8c9673b05:1447850] [13] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7f39bbee7930]
[40a8c9673b05:1447850] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f39b866bac3]
[40a8c9673b05:1447850] [15] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f39b86fd850]
[40a8c9673b05:1447850] *** End of error message ***
[40a8c9673b05:1447854] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1447851] *** Process received signal ***
[40a8c9673b05:1447851] Signal: Segmentation fault (11)
[40a8c9673b05:1447851] Signal code: Invalid permissions (2)
[40a8c9673b05:1447851] Failing at address: 0x9a2d12600
[40a8c9673b05:1447851] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fb0f5419520]
[40a8c9673b05:1447851] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7fb0f557d7cd]
[40a8c9673b05:1447851] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7fb09c30a244]
[40a8c9673b05:1447851] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7fb09c165556]
[40a8c9673b05:1447851] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7fb09c163811]
[40a8c9673b05:1447851] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fb09c30eae5]
[40a8c9673b05:1447851] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7e24)[0x7fb09c30ee24]
[40a8c9673b05:1447851] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fb0f583f714]
[40a8c9673b05:1447851] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7fb0f584c38d]
[40a8c9673b05:1447851] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_mprobe+0x52d)[0x7fb09c1602fd]
[40a8c9673b05:1447851] [10] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Mprobe+0xd7)[0x7fb0f59440e7]
[40a8c9673b05:1447851] [11] [40a8c9673b05:1447855] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1447852] *** Process received signal ***
[40a8c9673b05:1447852] Signal: Segmentation fault (11)
[40a8c9673b05:1447852] Signal code: Invalid permissions (2)
[40a8c9673b05:1447852] Failing at address: 0x9a2d12600
[40a8c9673b05:1447852] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc1fee19520]
[40a8c9673b05:1447852] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7fc1fef7d7cd]
[40a8c9673b05:1447852] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7fc1a598e244]
[40a8c9673b05:1447852] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7fc1a53e8556]
[40a8c9673b05:1447852] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7fc1a53e6811]
[40a8c9673b05:1447852] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fc1a5992ae5]
[40a8c9673b05:1447852] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7db1)[0x7fc1a5992db1]
[40a8c9673b05:1447852] [ 7] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm6mprobeEiiPP14ompi_message_tP20ompi_status_public_t+0x2a)[0x7fb0fae9be6a]
[40a8c9673b05:1447851] [12] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fc1ff23f714]
[40a8c9673b05:1447852] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7fc1ff24c38d]
[40a8c9673b05:1447852] [ 9] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_wait+0x24b)[0x7fc1ff3192db]
[40a8c9673b05:1447852] [10] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_bcast_intra_generic+0x5ea)[0x7fc1ff36d40a]
[40a8c9673b05:1447852] [11] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_bcast_intra_pipeline+0xd1)[0x7fc1ff36e6c1]
[40a8c9673b05:1447852] [12] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0x40)[0x7fc1a534b640]
[40a8c9673b05:1447852] [13] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Bcast+0x121)[0x7fc1ff32d881]
[40a8c9673b05:1447852] [14] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl19leaderRecvReqThreadEv+0x133)[0x7fb0fd1c4e23]
[40a8c9673b05:1447851] [13] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm5bcastEPvmNS0_7MpiTypeEi+0x47)[0x7fc20489d7b7]
[40a8c9673b05:1447852] [15] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl16getNewReqWithIdsEiSt8optionalIfE+0x68b)[0x7fc206bb787b]
[40a8c9673b05:1447852] [16] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7fb0f8ce7930]
[40a8c9673b05:1447851] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fb0f546bac3]
[40a8c9673b05:1447851] [15] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl16fetchNewRequestsEiSt8optionalIfE+0x59)[0x7fc206bc5949]
[40a8c9673b05:1447852] [17] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7fb0f54fd850]
[40a8c9673b05:1447851] *** End of error message ***
/xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl13executionLoopEv+0x3bd)[0x7fc206bc7f5d]
[40a8c9673b05:1447852] [18] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7fc2026e7930]
[40a8c9673b05:1447852] [19] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fc1fee6bac3]
[40a8c9673b05:1447852] [20] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7fc1feefd850]
[40a8c9673b05:1447852] *** End of error message ***
Maybe you can't start the trtllm executor in orchestrator mode in your container environment. Could you run `executorExampleAdvanced` in `examples/cpp/executor` with orchestrator mode? If your MPI is based on UCX, please set the env var `UCX_MEMTYPE_CACHE=n`. Please make sure your MPI is CUDA-aware. I highly recommend using a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.
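A quick way to act on the last two points (the `ompi_info` query is a standard Open MPI check, not something specific to TensorRT-LLM):

```bash
# Disable the UCX memtype cache if your MPI stack is UCX-based.
export UCX_MEMTYPE_CACHE=n

# Confirm Open MPI was built with CUDA-aware support; the value should be "true".
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```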
Thanks, I have executed executorExampleAdvanced successfully.
./build/executorExampleAdvanced --engine_dir /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1 --input_tokens_csv_file ./inputTokens.csv --use_orchestrator_mode --worker_executable_path ../../../cpp/build/tensorrt_llm/executor_worker/executorWorker
The output log:
[TensorRT-LLM][INFO] Engine version 0.16.0.dev2024112600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Engine version 0.16.0.dev2024112600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 2048
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 2048
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 2048
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (2048) * 32
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 8192
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 2047 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: min(maxSequenceLen, maxNumTokens).
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] Loaded engine size: 12869 MiB
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1112.01 MiB for execution context memory.
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 12853 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 346.17 MB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1.16 GB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 79.11 GiB, available: 26.40 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 761
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 32
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 23.78 GiB for max tokens in paged KV cache (48704).
[TensorRT-LLM][INFO] Enable MPI KV cache transport.
[TensorRT-LLM][INFO] Executor instance created by worker
[TensorRT-LLM][INFO] Reading input tokens from ./inputTokens.csv
[TensorRT-LLM][INFO] Number of requests: 3
[TensorRT-LLM][INFO] Creating request with 6 input tokens
[TensorRT-LLM][INFO] Creating request with 4 input tokens
[TensorRT-LLM][INFO] Creating request with 10 input tokens
[TensorRT-LLM][INFO] Got 20 tokens for seqIdx 0 for requestId 3
[TensorRT-LLM][INFO] Request id 3 is completed.
[TensorRT-LLM][INFO] Got 14 tokens for seqIdx 0 for requestId 2
[TensorRT-LLM][INFO] Request id 2 is completed.
[TensorRT-LLM][INFO] Got 16 tokens for seqIdx 0 for requestId 1
[TensorRT-LLM][INFO] Request id 1 is completed.
[TensorRT-LLM][INFO] Writing output tokens to outputTokens.csv
[TensorRT-LLM][INFO] Exiting.
[TensorRT-LLM][INFO] Orchestrator sendReq thread exiting
[TensorRT-LLM][INFO] Orchestrator recv thread exiting
[TensorRT-LLM][INFO] Leader recvReq thread exiting
[TensorRT-LLM][INFO] Leader sendThread exiting
[TensorRT-LLM][INFO] Refreshed the MPI local session
Having trouble using nvcr.io/nvidia/pytorch:24.10-py3-based containers?
System Info
Who can help?
@ncomly-nvidia
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
python scripts/build_wheel.py --trt_root=/usr/local/tensorrt --clean --cuda_architectures='90-real' --benchmarks
mpirun -n 7 disaggServerBenchmark --context_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2 --generation_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2
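The -n 7 rank count follows from one rank per engine GPU plus one for the benchmark process (an assumption about how disaggServerBenchmark allocates ranks, not stated explicitly in this issue):

```bash
# context engines:    llama2-7b-tp1 (1) + llama2-7b-tp2 (2) = 3 ranks
# generation engines: llama2-7b-tp1 (1) + llama2-7b-tp2 (2) = 3 ranks
# benchmark/orchestrator process:                             1 rank
# total:                                                      7 ranks -> mpirun -n 7
```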
Expected behavior
success
actual behavior
additional notes
Thanks for your attention!