ou525 opened this issue 10 months ago
Using cuDLA requires that all layers be supported by DLA, so we moved the unsupported layers into post-processing so that they don't consume GPU resources at runtime. Compared to cuDLA Hybrid mode, cuDLA Standalone mode does not create a CUDA context, so there is no CUDA-context-switching overhead in the multi-process case.
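For anyone comparing the two modes, here is a minimal, untested sketch of standalone-mode setup using the public cuDLA runtime API. The `yolov5.loadable` filename is a placeholder, and the NvSciBuf/NvSciSync registration that standalone mode requires for I/O and synchronization is only indicated in comments, not implemented:

```cpp
// Hedged sketch: creating a DLA device in cuDLA standalone mode.
#include <cudla.h>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    cudlaDevHandle dev = nullptr;

    // CUDLA_STANDALONE: no CUDA context is created, so there is no
    // context-switch overhead when several processes share the DLA.
    cudlaStatus st = cudlaCreateDevice(/*device=*/0, &dev, CUDLA_STANDALONE);
    if (st != cudlaSuccess) {
        std::fprintf(stderr, "cudlaCreateDevice failed: %d\n", st);
        return 1;
    }

    // Load a DLA loadable previously built offline with TensorRT.
    // "yolov5.loadable" is a placeholder path.
    std::ifstream f("yolov5.loadable", std::ios::binary);
    std::vector<uint8_t> loadable((std::istreambuf_iterator<char>(f)),
                                  std::istreambuf_iterator<char>());
    cudlaModule mod = nullptr;
    st = cudlaModuleLoadFromMemory(dev, loadable.data(), loadable.size(), &mod, 0);
    if (st != cudlaSuccess) {
        std::fprintf(stderr, "cudlaModuleLoadFromMemory failed: %d\n", st);
        return 1;
    }

    // In standalone mode, I/O buffers are NvSciBuf allocations registered via
    // cudlaImportExternalMemory, and synchronization uses NvSciSync fences
    // registered via cudlaImportExternalSemaphore; the CUDA runtime is never
    // touched. Submission then passes a null stream:
    //   cudlaSubmitTask(dev, &task, 1, /*stream=*/nullptr, 0);

    cudlaModuleUnload(mod, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```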
If so, then once issue #15 is solved, I can safely run different models on the DLA and the GPU.
I ran tests: when I executed the command with USE_DLA_STANDALONE_MODE=1 and USE_DETERMINISTIC_SEMAPHORE=1 alongside another deep-learning model, the elapsed time increased significantly compared to running either one alone. It appears these two options do have an impact.
Then it is likely bandwidth-bound. The DLA and the GPU both consume the same resource: system DRAM. The more bandwidth-bound a workload is, the higher the chance that both the DLA and the GPU become bottlenecked on memory access when running in parallel.
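One way to check whether the slowdown is contention rather than the flags themselves is to time each workload alone and then in parallel. A self-contained sketch of such a harness; the two `run_*_infer` lambdas are placeholders for your real GPU/DLA inference calls:

```cpp
// Hedged sketch: quantify contention by timing a workload solo vs. in parallel.
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>

// Average wall-clock milliseconds per call over `iters` iterations.
static double time_ms(const std::function<void()>& fn, int iters) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) fn();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
}

int main() {
    auto run_gpu_infer = [] { /* launch the GPU engine here */ };
    auto run_dla_infer = [] { /* submit the DLA task here */ };
    const int iters = 100;

    double gpu_solo = time_ms(run_gpu_infer, iters);
    double dla_solo = time_ms(run_dla_infer, iters);

    // Run both at once; if either latency grows well beyond its solo number
    // while neither compute unit is shared, the shared resource (typically
    // DRAM bandwidth) is the bottleneck.
    double dla_par = 0;
    std::thread t([&] { dla_par = time_ms(run_dla_infer, iters); });
    double gpu_par = time_ms(run_gpu_infer, iters);
    t.join();

    std::printf("GPU solo %.2f ms -> parallel %.2f ms\n", gpu_solo, gpu_par);
    std::printf("DLA solo %.2f ms -> parallel %.2f ms\n", dla_solo, dla_par);
    return 0;
}
```

On Jetson you can corroborate this by watching DRAM/EMC utilization with `tegrastats` while both workloads run.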
I want to deploy the YOLOv5 model on both the GPU and the DLA at the same time. Will there be resource contention between the two? What I learned previously is that YOLOv5 contains layers the DLA does not support, and those layers fall back to the GPU (CUDA), causing a significant drop in efficiency (see the sketch below).
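To illustrate the fallback behavior in question, here is a hedged sketch of how a hybrid DLA+GPU engine is typically built with the TensorRT C++ API; the ONNX and engine paths are placeholders, and this is not the exact code from this repo:

```cpp
// Hedged sketch: build a TensorRT engine that runs on DLA with GPU fallback
// for unsupported layers ("yolov5.onnx" is a placeholder path).
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <memory>

class Logger : public nvinfer1::ILogger {
    void log(Severity sev, const char* msg) noexcept override {
        if (sev <= Severity::kWARNING) std::cout << msg << "\n";
    }
} gLogger;

int main() {
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(
        nvinfer1::createInferBuilder(gLogger));
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
        builder->createNetworkV2(1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, gLogger));
    parser->parseFromFile("yolov5.onnx", /*verbosity=*/0);

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
        builder->createBuilderConfig());
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // DLA requires FP16/INT8
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(0);
    // Layers DLA cannot run fall back to the GPU at runtime; this is the
    // hybrid setup that causes the GPU-resource usage described above.
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);

    auto engine = std::unique_ptr<nvinfer1::IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    std::ofstream("yolov5_dla.engine", std::ios::binary)
        .write(static_cast<const char*>(engine->data()), engine->size());
    return 0;
}
```

The equivalent `trtexec` invocation, if I remember the flags correctly, is `trtexec --onnx=yolov5.onnx --fp16 --useDLACore=0 --allowGPUFallback`. The point of this repo's approach is to avoid `kGPU_FALLBACK` entirely by moving the unsupported layers into post-processing, so the DLA portion can run standalone.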