PaddlePaddle / FastDeploy

⚡️ An easy-to-use and fast deep learning model deployment toolkit for ☁️ Cloud, 📱 Mobile, and 📹 Edge. Covers 20+ mainstream scenarios across image, video, text, and audio, and 150+ SOTA models with end-to-end optimization and multi-platform, multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

Can one GPU only support one TensorRT engine inference? #383

Closed · brilliant-soilder · closed 8 months ago

brilliant-soilder commented 2 years ago

Can one GPU only support one TensorRT engine inference? Does the deployment automatically find the most optimal resources on one GPU?

When I ran one .exe using a TensorRT engine for inference, GPU memory usage was about 40%. Accordingly, the GPU should be able to run two .exe processes in parallel. However, they only run serially, as the total time doubles.

Why can't the two kernels run in parallel? Overlap between HtoD data transfer and kernel computation is already achieved, but overlap between the two kernels' computation is not. Is the reason that the remaining GPU resources are insufficient? Is there a problem with my method? Is there a sample that can be used for reference? Much respect and thanks!
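
As a point of comparison, here is a minimal sketch of running two TensorRT engines on the same GPU from a single process, assuming the FastDeploy Python API (`RuntimeOption`, `use_trt_backend`, and the `PPYOLOE` detector from the official examples); the model and image paths are hypothetical. Whether the two inferences actually overlap depends on free SM resources and on the GIL being released during the native call, not only on free GPU memory.

```python
# Sketch: two TensorRT engines on one GPU, driven from two host threads.
# Assumes the FastDeploy Python API; model/image paths are hypothetical.
import threading
import cv2
import fastdeploy as fd

def build_model():
    option = fd.RuntimeOption()
    option.use_gpu(0)         # both engines share GPU 0
    option.use_trt_backend()  # TensorRT backend
    return fd.vision.detection.PPYOLOE(
        "model.pdmodel", "model.pdiparams", "infer_cfg.yml",
        runtime_option=option)

def run(model, image, n=100):
    for _ in range(n):
        model.predict(image)

image = cv2.imread("test.jpg")
models = [build_model(), build_model()]
threads = [threading.Thread(target=run, args=(m, image.copy())) for m in models]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note also that when the two engines live in two separate processes (the two .exe files), they use two CUDA contexts, and kernels from different contexts are time-sliced rather than run concurrently by default; NVIDIA MPS is the usual way to let separate processes share a GPU's compute resources concurrently.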

jiangjiajun commented 2 years ago

How about the GPU utilization? GPU memory usage at 40% just means there is enough memory to run two engines at the same time, but if the GPU-util is already high with one TRT engine, there is nothing to gain from running two engines.

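A minimal sketch of checking GPU utilization programmatically, assuming the `pynvml` NVML bindings (these are the same counters that `nvidia-smi` reports). If `util.gpu` is already near 100% with one engine running, a second engine will only time-slice.

```python
# Sketch: poll GPU utilization and memory via NVML (pip install pynvml).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu is a percentage
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .used/.total in bytes
    print(f"GPU util: {util.gpu}%  memory: {mem.used / mem.total:.0%}")
    time.sleep(1)

pynvml.nvmlShutdown()
```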

jiangjiajun commented 8 months ago

This issue has not been updated for a year and will be closed. If needed, it can be updated and reopened.