NVIDIA-AI-IOT / cuDLA-samples

YOLOv5 on Orin DLA

Will cuDLA standalone mode occupy CUDA GPU resources? #18

Open ou525 opened 10 months ago

ou525 commented 10 months ago

I want to deploy the YOLOv5 model on both the GPU and the DLA at the same time. Will there be resource contention between the two? What I learned before is that the DLA has unsupported layers; for a model such as YOLOv5, those layers fall back to CUDA resources, resulting in a significant decrease in efficiency.

lynettez commented 10 months ago

Using cuDLA requires that all layers be supported by the DLA; we moved several unsupported layers into post-processing, so no GPU resources are used at runtime. Compared to cuDLA Hybrid mode, cuDLA Standalone mode does not create a CUDA context, so there is no CUDA context-switching overhead in the multi-process case.

ou525 commented 10 months ago

If so, then once issue #15 is solved, I can safely run different models on the DLA and the GPU.

ou525 commented 9 months ago

I conducted tests: when I ran the program built with USE_DLA_STANDALONE_MODE=1 and USE_DETERMINISTIC_SEMAPHORE=1 alongside another deep-learning model program, the elapsed time increased significantly compared with running either one individually. It appears that these two options do have an impact.
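A minimal way to sanity-check this kind of interference (a generic sketch; `workload` is a hypothetical stand-in — in practice you would launch the real DLA and GPU inference binaries) is to time one process alone and then two in parallel, and compare:

```python
import time
from multiprocessing import Process

def workload(n=2_000_000):
    # Hypothetical stand-in for one inference run; replace with the
    # actual DLA / GPU inference command when measuring for real.
    s = 0
    for i in range(n):
        s += i * i
    return s

def timed_run(procs):
    """Start all processes, wait for them, return total wall time."""
    t0 = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - t0

if __name__ == "__main__":
    solo = timed_run([Process(target=workload)])
    pair = timed_run([Process(target=workload) for _ in range(2)])
    print(f"solo: {solo:.3f}s  concurrent pair: {pair:.3f}s")
    # If the concurrent pair takes noticeably longer than the solo run,
    # the two workers are contending for some shared resource.
```

The comparison of solo versus concurrent wall time is what distinguishes genuine contention from ordinary run-to-run variance.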

lynettez commented 2 months ago

Then it is likely bandwidth-bound. The DLA and the GPU share the same resource: system DRAM. The more bandwidth-bound a workload is, the higher the chance that both the DLA and the GPU become bottlenecked on memory access when running in parallel.
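As a rough illustration of why this happens (all numbers below are assumptions for the sketch, not measured figures for any particular Orin SKU), you can sum the DRAM traffic each engine would generate on its own and compare it to peak bandwidth:

```python
# Back-of-envelope DRAM contention estimate (illustrative numbers only).
DRAM_BW_GBPS = 204.8      # assumed peak system DRAM bandwidth
gpu_demand_gbps = 120.0   # assumed GPU workload traffic when run alone
dla_demand_gbps = 110.0   # assumed DLA workload traffic when run alone

total = gpu_demand_gbps + dla_demand_gbps
if total > DRAM_BW_GBPS:
    # Both engines together exceed what DRAM can serve, so each one
    # stalls on memory and runtimes stretch roughly by this factor.
    slowdown = total / DRAM_BW_GBPS
    print(f"oversubscribed: expect up to ~{slowdown:.2f}x longer runtimes")
else:
    print("bandwidth fits; compute should remain the limiter")
```

This is why two workloads that are each fast in isolation can both slow down when run concurrently, even though the DLA and GPU compute units are separate.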