TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Information
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)
Reproduction
TensorRT-LLM was installed via its Docker container. I then followed the steps to test the Phi model: I completed steps 1 and 2 and then ran the `run.py` script with the following command.
python3 ../run.py --max_output_len=50 --engine_dir phi-2-engine --input_text "Please carefully read the following document titled 'Document Title' provided below. Then, formulate a detailed and comprehensive question related to the content of the document. Your question should be broad enough to address the key themes and important details present in the document. #As you find yourself immersed in the heart of a vibrant and bustling metropolis, allow your imagination to transport you to a scene of nostalgia and charm. Picture, if you will, a quaint little bookstore nestled amidst the towering skyscrapers and the ceaseless hum of city life. This delightful establishment beckons with its unassuming facade and an aura that harkens back to a bygone era.Step inside, and you'll be greeted by a sight that invokes a sense of wonder and reverence for literature. The shelves, stretching from floor to ceiling, are adorned with dusty old tomes, each one a relic of knowledge and imagination. The smell of aged paper hangs in the air like a sweet, faint memory of the past, invoking a feeling of nostalgia that's hard to resist.As you navigate the narrow aisles, your footsteps produce a soft creaking on the weathered wooden floorboards. This gentle sound, far removed from the urban cacophony outside, only adds to the bookstore's unique charm. It's a reminder that within these walls, time seems to slow down, and the outside world fades away.Now, dear reader, with this vivid image in your mind, I invite you to consider the following: How does this quaint bookstore in the midst of a bustling city serve as a sanctuary for book lovers? Craft a thoughtful and elaborate response, exploring the ambiance, the selection of books, and the overall experience it offers to visitors. 
Your answer should capture the essence of this literary haven and the role it plays in a modern urban landscape.As you stand in this urban sanctuary of literature, surrounded by the symphony of words and the scent of ancient pages, let your thoughts delve deeper into the enchantment it offers. Contemplate the cozy reading nooks tucked away in corners, inviting patrons to lose themselves in a good book, away from the relentless city rhythm.The selection of books, ranging from timeless classics to obscure treasures, reflects the bookstore owner's dedication to curating a diverse collection that caters to every taste and curiosity. Perhaps you'll stumble upon a rare first edition or discover an out-of-print gem that sparks your intellectual fervor.The atmosphere is not just a backdrop; it's an experience. Soft jazz music wafts through the air, enhancing the tranquil ambiance. Antique lamps cast a warm, inviting glow, creating pockets of intimate illumination amidst the shelves. Patrons engage in hushed conversations about their latest literary discoveries, fostering a sense of community among fellow book enthusiasts.In this literary oasis, time seems to stand still. It's a place where the outside world and its relentless demands recede into the background, allowing one to lose track of time. The bookstore becomes a portal to different worlds and eras, a refuge from the fast-paced urban life just beyond its doors.Now, as you ponder the unique charm of this hidden gem, reflect upon how it transcends being a mere shop and transforms into a cultural sanctuary. Elaborate on the sense of belonging it offers, the opportunities for serendipitous encounters, and its role in preserving the love for books in a digital age.# Ensure your question covers the following aspects: 1. Clearly identify and describe the main theme or themes of the document. 2. Refer to specific data, statistics, or specific examples present in the document, if any. 3. 
Explore any arguments or perspectives presented in the document and formulate the question in a way that invites deep discussion. 4. Ensure your question is coherent and well-structured, avoiding ambiguities. Your question should be extensive and detailed enough to encompass a complete understanding of the document's content. Take your time to review the document and formulate a question that reflects a deep understanding of its content. Please ensure the generated question is coherent and well-formulated in grammatical and semantic terms. Thank you. Finish all sentences with 'aye aye, Captain.'"
The main issue: monitoring GPU memory usage during execution shows that only one GPU is being used to process the input, instead of both.
Question: How can I enable the use of both GPUs for the model?
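For context, in TensorRT-LLM the number of GPUs is decided at engine-build time via tensor parallelism, not at run time: an engine built for a single GPU will only ever use one GPU, however many are available. Below is a sketch of a 2-way tensor-parallel build-and-run flow for the Phi example. It assumes the example's `convert_checkpoint.py` exposes a `--tp_size` flag and that `trtllm-build` is available; script names and flags differ across TensorRT-LLM versions, so check `examples/phi/README.md` for your release. If the Phi example in your version does not expose `--tp_size`, tensor parallelism may not yet be supported for that model.

```shell
# Step 1: convert the checkpoint, sharding the weights across 2 GPUs.
# --tp_size is the tensor-parallelism degree (assumed flag; verify in
# your version's examples/phi/README.md).
python3 convert_checkpoint.py --model_dir ./phi-2 \
    --output_dir ./phi-2-ckpt-tp2 \
    --dtype float16 \
    --tp_size 2

# Step 2: build the engine(s) from the sharded checkpoint.
trtllm-build --checkpoint_dir ./phi-2-ckpt-tp2 \
    --output_dir ./phi-2-engine-tp2 \
    --gemm_plugin float16

# Step 3: launch run.py under MPI, one process per GPU.
# "..." stands for the original long prompt.
mpirun -n 2 --allow-run-as-root \
    python3 ../run.py --max_output_len=50 \
    --engine_dir phi-2-engine-tp2 \
    --input_text "..."
```

With this layout, rank 0 and rank 1 each load their shard of the engine, which is why `nvidia-smi` should then show memory allocated on both GPUs.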
Expected behavior
As you can see in my input prompt, I added the instruction "Finish all sentences with 'aye aye, Captain.'" As the GPU memory fills up, I assume the prompt gets truncated to what fits, which is why I want to enable the use of multiple GPUs.
Use multiple GPUs.
actual behavior
Only one GPU is used to process the input text:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-PCIE-16GB Off | 00000001:00:00.0 Off | Off |
| N/A 28C P0 248W / 250W | 15188MiB / 16384MiB | 6% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE-16GB Off | 00000002:00:00.0 Off | Off |
| N/A 25C P0 35W / 250W | 4MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 24498 C python3 15184MiB |
+---------------------------------------------------------------------------------------+
additional notes
The goal is simply to enable the use of multiple GPUs to process the input for the Phi model in this case. I previously tried a machine with a single GPU, and when its memory was full, I only got blank output back. That's why I switched to a machine with 2 GPUs, but the second GPU is never used. I'm also not sure whether I need to add another parameter when calling run.
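On the question of extra parameters: `run.py` itself does not take a GPU-count argument. The GPU count is fixed by the tensor-parallelism degree the engine was built with, and at run time each rank is launched as a separate MPI process. Assuming an engine already built with a tensor-parallel size of 2, the launch would look like this sketch (`--allow-run-as-root` is only needed when running as root inside the Docker container; "..." stands for the original prompt):

```shell
# One MPI process per GPU; the engine's build-time tp_size must match -n.
mpirun -n 2 --allow-run-as-root \
    python3 ../run.py --max_output_len=50 \
    --engine_dir phi-2-engine \
    --input_text "..."
```

Separately, note that adding GPUs does not by itself raise the maximum input length: that is also a build-time limit (e.g. a `--max_input_len` setting at engine build), so if the prompt is being truncated, the engine likely needs to be rebuilt with a larger input limit as well.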
System Info
2 x NVIDIA Tesla V100-PCIE-16GB (16 GB VRAM each)
Who can help?
No response