NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

MPI Abort Error when using disaggServerBenchmark #2518

Open zhangts20 opened 1 day ago

zhangts20 commented 1 day ago

System Info

Who can help?

@ncomly-nvidia

Information

Tasks

Reproduction

  1. Build tensorrt_llm: python scripts/build_wheel.py --trt_root=/usr/local/tensorrt --clean --cuda_architectures='90-real' --benchmarks
  2. Follow https://github.com/NVIDIA/TensorRT-LLM/tree/main/benchmarks/cpp#4launch-c-disaggserverbenchmark, building a llama2-7b-tp1 engine and a llama2-7b-tp2 engine with the default build args
  3. mpirun -n 7 disaggServerBenchmark --context_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2 --generation_engine_dirs /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2
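
For reference, the -n 7 appears to correspond to one orchestrator rank plus the 1+2 context-engine ranks and the 1+2 generation-engine ranks. A minimal sketch of the same invocation with the --dataset flag requested later in the thread (the dataset path is a placeholder, not part of the original report):

```bash
# Sketch only: engine paths are taken from the report above; dataset.json is a
# hypothetical preprocessed dataset prepared per the benchmarks/cpp README.
ENGINES=/data/models/llm/trtllm_0.16.0.dev2024112600
mpirun -n 7 disaggServerBenchmark \
    --context_engine_dirs ${ENGINES}/llama2-7b-tp1,${ENGINES}/llama2-7b-tp2 \
    --generation_engine_dirs ${ENGINES}/llama2-7b-tp1,${ENGINES}/llama2-7b-tp2 \
    --dataset /path/to/dataset.json
```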

Expected behavior

success

actual behavior

[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Will Launch benchmark with 2 context engines and 2 generation engines. Context Engines:/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2, ; Generation Engines:/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1,/data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp2, ;
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initialized MPI
[40a8c9673b05:1334630] Read -1, expected 16777216, errno = 14
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[40a8c9673b05:1334629] Read -1, expected 33554432, errno = 14
[40a8c9673b05:1334631] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1334621] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[40a8c9673b05:1334621] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

additional notes

Thanks for your attention!

chuangz0 commented 1 day ago

Please pass --dataset to the commands and verify your engines with gptManagerBenchmark first.

zhangts20 commented 1 day ago

> Please pass --dataset to the commands and verify your engines with gptManagerBenchmark first.

I did pass --dataset to disaggServerBenchmark, and both llama2-7b-tp1 and llama2-7b-tp2 work fine with gptManagerBenchmark.
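
For reference, a verification run of that kind might look roughly like this (a sketch: binary location and dataset path are placeholders, not taken from the report):

```bash
# Hypothetical verification of the tp1 engine; dataset.json is a placeholder
# prepared per the benchmarks/cpp README. The tp2 engine would be checked the
# same way under `mpirun -n 2`.
./benchmarks/gptManagerBenchmark \
    --engine_dir /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1 \
    --dataset /path/to/dataset.json
```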

chuangz0 commented 1 day ago

Could you comment out the following block

```cpp
        for (int sig : {SIGABRT, SIGSEGV})
        {
            __sighandler_t previousHandler = nullptr;
            if (forwardAbortToParent)
            {
                previousHandler = std::signal(sig,
                    [](int signal)
                    {
#ifndef _WIN32
                        pid_t parentProcessId = getppid();
                        kill(parentProcessId, SIGKILL);
#endif
                        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
                    });
            }
            else
            {
                previousHandler = std::signal(sig, [](int signal) { MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE); });
            }
            TLLM_CHECK_WITH_INFO(previousHandler != SIG_ERR, "Signal handler setup failed");
        }
```

in `cpp/tensorrt_llm/common/mpiUtils.cpp`, then recompile and run? That way the underlying fault should be reported directly instead of being converted into an MPI_Abort.

Which container are you using?
zhangts20 commented 21 hours ago

> Could you comment out […] in `cpp/tensorrt_llm/common/mpiUtils.cpp`, then recompile and run? Which container are you using?

Thanks, I will try it. I installed tensorrt_llm from source in my own container, and the environment info is as mentioned above.

chuangz0 commented 20 hours ago

Maybe you can try the Docker image built with the instructions at https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html#building-a-tensorrt-llm-docker-image. We have only tested disaggServerBenchmark in a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.
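
A rough sketch of launching that base image (assumption: these are generic docker run flags for GPU access and in-container MPI shared memory, not commands taken from the TensorRT-LLM docs):

```bash
# Hedged example: start the recommended base image with GPU access, host IPC
# and ptrace capability for MPI shared-memory transports, and the model
# directory from the report mounted inside the container.
docker run --rm -it --gpus all \
    --ipc=host --cap-add=SYS_PTRACE \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /data/models:/data/models \
    nvcr.io/nvidia/pytorch:24.10-py3 bash
```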

zhangts20 commented 19 hours ago

> Could you comment out […] in `cpp/tensorrt_llm/common/mpiUtils.cpp`, then recompile and run? Which container are you using?

I rebuilt tensorrt_llm, and now the error looks like this (there is an error about invalid permissions):

[40a8c9673b05:1447853] Read -1, expected 33554432, errno = 14
[40a8c9673b05:1447850] *** Process received signal ***
[40a8c9673b05:1447850] Signal: Segmentation fault (11)
[40a8c9673b05:1447850] Signal code: Invalid permissions (2)
[40a8c9673b05:1447850] Failing at address: 0x9c5c12400
[40a8c9673b05:1447850] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f39b8619520]
[40a8c9673b05:1447850] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7f39b877d7cd]
[40a8c9673b05:1447850] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7f39600ec244]
[40a8c9673b05:1447850] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7f3960048556]
[40a8c9673b05:1447850] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7f3960046811]
[40a8c9673b05:1447850] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7f39600f0ae5]
[40a8c9673b05:1447850] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7e24)[0x7f39600f0e24]
[40a8c9673b05:1447850] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7f39b8a3f714]
[40a8c9673b05:1447850] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7f39b8a4c38d]
[40a8c9673b05:1447850] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_mprobe+0x52d)[0x7f39600432fd]
[40a8c9673b05:1447850] [10] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Mprobe+0xd7)[0x7f39b8b440e7]
[40a8c9673b05:1447850] [11] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm6mprobeEiiPP14ompi_message_tP20ompi_status_public_t+0x2a)[0x7f39be09be6a]
[40a8c9673b05:1447850] [12] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl19leaderRecvReqThreadEv+0x133)[0x7f39c03c4e23]
[40a8c9673b05:1447850] [13] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7f39bbee7930]
[40a8c9673b05:1447850] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f39b866bac3]
[40a8c9673b05:1447850] [15] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f39b86fd850]
[40a8c9673b05:1447850] *** End of error message ***
[40a8c9673b05:1447854] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1447851] *** Process received signal ***
[40a8c9673b05:1447851] Signal: Segmentation fault (11)
[40a8c9673b05:1447851] Signal code: Invalid permissions (2)
[40a8c9673b05:1447851] Failing at address: 0x9a2d12600
[40a8c9673b05:1447851] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fb0f5419520]
[40a8c9673b05:1447851] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7fb0f557d7cd]
[40a8c9673b05:1447851] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7fb09c30a244]
[40a8c9673b05:1447851] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7fb09c165556]
[40a8c9673b05:1447851] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7fb09c163811]
[40a8c9673b05:1447851] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fb09c30eae5]
[40a8c9673b05:1447851] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7e24)[0x7fb09c30ee24]
[40a8c9673b05:1447851] [ 7] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fb0f583f714]
[40a8c9673b05:1447851] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7fb0f584c38d]
[40a8c9673b05:1447851] [ 9] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_mprobe+0x52d)[0x7fb09c1602fd]
[40a8c9673b05:1447851] [10] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Mprobe+0xd7)[0x7fb0f59440e7]
[40a8c9673b05:1447851] [11] [40a8c9673b05:1447855] Read -1, expected 16777216, errno = 14
[40a8c9673b05:1447852] *** Process received signal ***
[40a8c9673b05:1447852] Signal: Segmentation fault (11)
[40a8c9673b05:1447852] Signal code: Invalid permissions (2)
[40a8c9673b05:1447852] Failing at address: 0x9a2d12600
[40a8c9673b05:1447852] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc1fee19520]
[40a8c9673b05:1447852] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7fc1fef7d7cd]
[40a8c9673b05:1447852] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7fc1a598e244]
[40a8c9673b05:1447852] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1b6)[0x7fc1a53e8556]
[40a8c9673b05:1447852] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x201)[0x7fc1a53e6811]
[40a8c9673b05:1447852] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x95)[0x7fc1a5992ae5]
[40a8c9673b05:1447852] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x7db1)[0x7fc1a5992db1]
[40a8c9673b05:1447852] [ 7] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm6mprobeEiiPP14ompi_message_tP20ompi_status_public_t+0x2a)[0x7fb0fae9be6a]
[40a8c9673b05:1447851] [12] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_progress+0x34)[0x7fc1ff23f714]
[40a8c9673b05:1447852] [ 8] /lib/x86_64-linux-gnu/libopen-pal.so.40(ompi_sync_wait_mt+0xbd)[0x7fc1ff24c38d]
[40a8c9673b05:1447852] [ 9] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_request_default_wait+0x24b)[0x7fc1ff3192db]
[40a8c9673b05:1447852] [10] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_bcast_intra_generic+0x5ea)[0x7fc1ff36d40a]
[40a8c9673b05:1447852] [11] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_bcast_intra_pipeline+0xd1)[0x7fc1ff36e6c1]
[40a8c9673b05:1447852] [12] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0x40)[0x7fc1a534b640]
[40a8c9673b05:1447852] [13] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Bcast+0x121)[0x7fc1ff32d881]
[40a8c9673b05:1447852] [14] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl19leaderRecvReqThreadEv+0x133)[0x7fb0fd1c4e23]
[40a8c9673b05:1447851] [13] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZNK12tensorrt_llm3mpi7MpiComm5bcastEPvmNS0_7MpiTypeEi+0x47)[0x7fc20489d7b7]
[40a8c9673b05:1447852] [15] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl16getNewReqWithIdsEiSt8optionalIfE+0x68b)[0x7fc206bb787b]
[40a8c9673b05:1447852] [16] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7fb0f8ce7930]
[40a8c9673b05:1447851] [14] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fb0f546bac3]
[40a8c9673b05:1447851] [15] /xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl16fetchNewRequestsEiSt8optionalIfE+0x59)[0x7fc206bc5949]
[40a8c9673b05:1447852] [17] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7fb0f54fd850]
[40a8c9673b05:1447851] *** End of error message ***
/xxx/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm8executor8Executor4Impl13executionLoopEv+0x3bd)[0x7fc206bc7f5d]
[40a8c9673b05:1447852] [18] /xxx/TensorRT-LLM/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/libtensorrt_llm_nvrtc_wrapper.so(+0x32e7930)[0x7fc2026e7930]
[40a8c9673b05:1447852] [19] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fc1fee6bac3]
[40a8c9673b05:1447852] [20] /lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7fc1feefd850]
[40a8c9673b05:1447852] *** End of error message ***

chuangz0 commented 19 hours ago

Maybe you cannot start the trtllm executor in orchestrator mode in your container environment. Could you run executorExampleAdvanced in examples/cpp/executor with orchestrator mode? If your MPI is based on UCX, please set the env var UCX_MEMTYPE_CACHE=n. Please make sure your MPI is CUDA-aware. I highly recommend using a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.
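
A hedged way to check those two points with Open MPI (assuming ompi_info is available inside the container):

```bash
# Report whether this Open MPI build was compiled with CUDA-aware support.
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

# If the MPI transport goes through UCX, disable the memory-type cache as
# suggested above.
export UCX_MEMTYPE_CACHE=n
```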

zhangts20 commented 17 hours ago

> Maybe you cannot start the trtllm executor in orchestrator mode in your container environment. Could you run executorExampleAdvanced in examples/cpp/executor with orchestrator mode? If your MPI is based on UCX, please set the env var UCX_MEMTYPE_CACHE=n. Please make sure your MPI is CUDA-aware. I highly recommend using a Docker image based on nvcr.io/nvidia/pytorch:24.10-py3.

Thanks, I have executed executorExampleAdvanced successfully.

./build/executorExampleAdvanced --engine_dir /data/models/llm/trtllm_0.16.0.dev2024112600/llama2-7b-tp1 --input_tokens_csv_file ./inputTokens.csv --use_orchestrator_mode --worker_executable_path ../../../cpp/build/tensorrt_llm/executor_worker/executorWorker

The output log:

[TensorRT-LLM][INFO] Engine version 0.16.0.dev2024112600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
[TensorRT-LLM][INFO] Initialized MPI
[TensorRT-LLM][INFO] Engine version 0.16.0.dev2024112600 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 2048
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 2048
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 2048
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (2048) * 32
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 8192
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 2047 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: min(maxSequenceLen, maxNumTokens).
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] Loaded engine size: 12869 MiB
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1112.01 MiB for execution context memory.
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 12853 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 346.17 MB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1.16 GB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 79.11 GiB, available: 26.40 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 761
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 32
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 23.78 GiB for max tokens in paged KV cache (48704).
[TensorRT-LLM][INFO] Enable MPI KV cache transport.
[TensorRT-LLM][INFO] Executor instance created by worker
[TensorRT-LLM][INFO] Reading input tokens from ./inputTokens.csv
[TensorRT-LLM][INFO] Number of requests: 3
[TensorRT-LLM][INFO] Creating request with 6 input tokens
[TensorRT-LLM][INFO] Creating request with 4 input tokens
[TensorRT-LLM][INFO] Creating request with 10 input tokens
[TensorRT-LLM][INFO] Got 20 tokens for seqIdx 0 for requestId 3
[TensorRT-LLM][INFO] Request id 3 is completed.
[TensorRT-LLM][INFO] Got 14 tokens for seqIdx 0 for requestId 2
[TensorRT-LLM][INFO] Request id 2 is completed.
[TensorRT-LLM][INFO] Got 16 tokens for seqIdx 0 for requestId 1
[TensorRT-LLM][INFO] Request id 1 is completed.
[TensorRT-LLM][INFO] Writing output tokens to outputTokens.csv
[TensorRT-LLM][INFO] Exiting.
[TensorRT-LLM][INFO] Orchestrator sendReq thread exiting
[TensorRT-LLM][INFO] Orchestrator recv thread exiting
[TensorRT-LLM][INFO] Leader recvReq thread exiting
[TensorRT-LLM][INFO] Leader sendThread exiting
[TensorRT-LLM][INFO] Refreshed the MPI local session

chuangz0 commented 14 hours ago

Are you having trouble using containers based on nvcr.io/nvidia/pytorch:24.10-py3?