PaddlePaddle / FastDeploy

⚡️ An easy-to-use and fast deep learning model deployment toolkit for ☁️ Cloud, 📱 Mobile, and 📹 Edge. Covers 20+ mainstream scenarios across image, video, text, and audio, and 150+ SOTA models with end-to-end optimization and multi-platform, multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

Can one GPU only support one TensorRT engine inference? #383

Closed · brilliant-soilder · closed 8 months ago

brilliant-soilder commented 2 years ago

Can one GPU only support one TensorRT engine inference? Does the deployment automatically find the most optimal resources on one GPU?

When I ran one .exe using a TensorRT engine for inference, GPU memory usage was about 40%. Accordingly, the GPU should be able to run two .exe processes in parallel. However, they only run serially, as the total time doubles.

Why can't the two kernels run in parallel? Overlap between HtoD data transfer and kernel computation is already achieved, but overlap between the two kernels' computation is not. Is the reason that the remaining GPU resources are insufficient? Is there a problem with my method? Is there a sample that can be used for reference? Much respect and thanks!
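
As a point of comparison, here is a minimal sketch of running two TensorRT engines on the same GPU from a single process, assuming the FastDeploy Python API (`RuntimeOption`, `use_trt_backend`, and the `PPYOLOE` detector from the official examples); the model and image paths are hypothetical. Whether the two inferences actually overlap depends on free SM resources and on the GIL being released during the native call, not only on free GPU memory.

```python
# Sketch: two TensorRT engines on one GPU, driven from two host threads.
# Assumes the FastDeploy Python API; model/image paths are hypothetical.
import threading
import cv2
import fastdeploy as fd

def build_model():
    option = fd.RuntimeOption()
    option.use_gpu(0)         # both engines share GPU 0
    option.use_trt_backend()  # TensorRT backend
    return fd.vision.detection.PPYOLOE(
        "model.pdmodel", "model.pdiparams", "infer_cfg.yml",
        runtime_option=option)

def run(model, image, n=100):
    for _ in range(n):
        model.predict(image)

image = cv2.imread("test.jpg")
models = [build_model(), build_model()]
threads = [threading.Thread(target=run, args=(m, image.copy())) for m in models]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note also that when the two engines live in two separate processes (the two .exe files), they use two CUDA contexts, and kernels from different contexts are time-sliced rather than run concurrently by default; NVIDIA MPS is the usual way to let separate processes share a GPU's compute resources concurrently.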

jiangjiajun commented 2 years ago

How about the GPU utilization? GPU memory usage at 40% just means there is enough memory to run two engines at the same time, but if the GPU-util is already high with one TRT engine, there is nothing to gain from running two engines.

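A minimal sketch of checking GPU utilization programmatically, assuming the `pynvml` NVML bindings (these are the same counters that `nvidia-smi` reports). If `util.gpu` is already near 100% with one engine running, a second engine will only time-slice.

```python
# Sketch: poll GPU utilization and memory via NVML (pip install pynvml).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu is a percentage
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # .used/.total in bytes
    print(f"GPU util: {util.gpu}%  memory: {mem.used / mem.total:.0%}")
    time.sleep(1)

pynvml.nvmlShutdown()
```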

jiangjiajun commented 8 months ago

This issue has not been updated for a year and will be closed. If needed, it can be updated and reopened.