NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Can I run two contexts at the same time? #4132

Open zhongym-sonoscape opened 3 weeks ago

zhongym-sonoscape commented 3 weeks ago

I created two contexts from the same engine and want to run inference on both simultaneously, but the nsys profile shows that the two contexts did not run at the same time. I would like to know why, and how this situation differs from the example in https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
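One way to put a number on what the nsys timeline shows is to copy the kernel start and end timestamps for each stream out of the profiler and compute how long the two streams were active at the same time. The following is only a minimal sketch under that assumption: `Interval` and `overlap_ns` are hypothetical names invented for illustration, not part of nsys, CUDA, or TensorRT, and the timestamps are assumed to be in nanoseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one kernel execution, filled in by hand from the
// profiler's kernel list. Field names and units (ns) are assumptions.
struct Interval {
    std::int64_t start_ns;
    std::int64_t end_ns;
};

// Returns the total time during which an interval from `a` and an interval
// from `b` are both active. Each list is assumed to be sorted by start time
// and free of self-overlap, which is normally the case for kernels issued to
// a single stream because a stream executes its work in order.
std::int64_t overlap_ns(const std::vector<Interval>& a,
                        const std::vector<Interval>& b) {
    std::int64_t total = 0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        assert(a[i].start_ns <= a[i].end_ns);
        assert(b[j].start_ns <= b[j].end_ns);
        // Intersection of the two current intervals, if any.
        const std::int64_t lo = std::max(a[i].start_ns, b[j].start_ns);
        const std::int64_t hi = std::min(a[i].end_ns, b[j].end_ns);
        if (lo < hi) total += hi - lo;
        // Advance whichever interval finishes first.
        if (a[i].end_ns < b[j].end_ns) ++i; else ++j;
    }
    return total;
}
```

Calling `overlap_ns` with the intervals of stream1 and stream2 and getting 0 back would match the serialized timeline described above, while a value close to the shorter stream's total kernel time would indicate that the two contexts mostly ran concurrently.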

lix19937 commented 3 weeks ago

Does each thread use its own stream? @zhongym-sonoscape

Maybe you can use MPS (CUDA Multi-Process Service).

zhongym-sonoscape commented 3 weeks ago

cudaStream_t stream1;
cudaStream_t stream2;
context->enqueueV2(buffer1, stream1, nullptr);
context2->enqueueV2(buffer2, stream2, nullptr);

Both calls are made from the same main function. Thank you!

sdZ1123 commented 3 weeks ago

@zhongym-sonoscape Hi, I have the same problem. Have you solved it?

zhongym-sonoscape commented 2 weeks ago

> @zhongym-sonoscape Hi, I have the same problem. Have you solved it?

Not yet :)