Two improvements are made in this contribution:
Removed the repeated copying of the background image to GPU memory, reducing the impact of the memory bandwidth bottleneck.
Increased GPU utilization by offloading CPU video encoding to child threads as soon as each frame is copied to CPU memory, freeing the parent process to begin processing the next frame.
Together these changes roughly doubled throughput on my system (Ryzen 7 5800H, RTX 3060 Mobile). Using the same 4K video with both the resnet50 and resnet101 models, the original version ran at 2.20 it/s, whereas this version averages 4.65 it/s (about 2.1x).
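For reviewers, the overall shape of the two changes can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `process_video` and the `upload`, `infer`, and `encode` callables are hypothetical stand-ins for the host-to-device copy, the model forward pass plus the copy back to CPU memory, and the CPU-side encoder.

```python
from concurrent.futures import ThreadPoolExecutor


def process_video(frames, background, upload, infer, encode, workers=2):
    """Hypothetical pipeline sketch; all callables are placeholders.

    upload(x)       -> device copy of x (stand-in for a host-to-device transfer)
    infer(frame, b) -> result already copied back to CPU memory
    encode(result)  -> encoded frame (CPU-bound work)
    """
    # Change 1: copy the background to the device once, outside the loop,
    # instead of re-uploading it for every frame.
    background_on_device = upload(background)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = []
        for frame in frames:
            result = infer(upload(frame), background_on_device)
            # Change 2: hand the CPU-side encoding to a worker thread so the
            # loop can start on the next frame without waiting for it.
            pending.append(pool.submit(encode, result))

        # Collect in submission order so the output keeps the frame order.
        return [future.result() for future in pending]
```

In this sketch the futures are collected in submission order, so frames stay in sequence even if a worker finishes early. It also queues every frame before collecting any result, so a real implementation would likely bound the number of in-flight frames to limit CPU memory use.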