h9419 opened 2 years ago
Although this is faster, one major bottleneck is still in VideoDataset. When inferring on a 4K HEVC video, around 80% of the execution time is spent on VideoDataset decode. Future work can focus on using NVDEC or other GPU-accelerated video loaders.
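For anyone who wants a quick sense of what a GPU-accelerated loader looks like, here is a minimal sketch using decord, which can hand frames to PyTorch. This is illustrative only, not part of any PR: it assumes decord was built with NVDEC support, and `input.mp4` is a placeholder path.

```python
# Sketch: GPU-side video decode with decord (assumes a decord build
# with NVDEC enabled; otherwise decord.gpu(0) will fail at runtime).
import decord
import torch

decord.bridge.set_bridge("torch")          # have decord return torch tensors
vr = decord.VideoReader("input.mp4",       # hypothetical input path
                        ctx=decord.gpu(0)) # decode on GPU 0 via NVDEC

for i in range(len(vr)):
    frame = vr[i]                          # uint8 tensor, HWC layout
    # Rearrange to (1, 3, H, W) float in [0, 1] for inference.
    src = frame.permute(2, 0, 1).unsqueeze(0).float().div(255)
    # ... run the model on `src` here instead of going through VideoDataset ...
```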
I have made a version of the inference pipeline that works with NVIDIA's VPF (VideoProcessingFramework) library, which takes advantage of the NVENC and NVDEC hardware accelerators for video and creates GPU tensors directly without involving the CPU. It works inside a Docker container under WSL.
However, I don't plan to publish the code, since I don't think I can redistribute the NVENC/NVDEC/x264 binaries, and my glue code only works against the version of VPF I compiled myself at the time I wrote it.
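That said, the overall shape of the pipeline follows NVIDIA's public VPF samples, so a sketch may still be useful. Treat everything below as an assumption to check against the samples shipped with your build: the module names (`PyNvCodec`, `PytorchNvCodec`) and converter signatures have changed between VPF releases, and `input.mp4` is a placeholder.

```python
# Sketch of NVDEC decode -> CUDA torch tensor, adapted from NVIDIA's
# VideoProcessingFramework samples. API details vary across VPF versions.
import PyNvCodec as nvc          # VPF core bindings
import PytorchNvCodec as pnvc    # VPF's PyTorch interop extension
import torch

gpu_id = 0
decoder = nvc.PyNvDecoder("input.mp4", gpu_id)   # hardware decode via NVDEC
w, h = decoder.Width(), decoder.Height()

# NVDEC outputs NV12; convert to planar RGB entirely on the GPU.
to_rgb = nvc.PySurfaceConverter(w, h, nvc.PixelFormat.NV12,
                                nvc.PixelFormat.RGB, gpu_id)
to_pln = nvc.PySurfaceConverter(w, h, nvc.PixelFormat.RGB,
                                nvc.PixelFormat.RGB_PLANAR, gpu_id)
cc_ctx = nvc.ColorspaceConversionContext(nvc.ColorSpace.BT_709,
                                         nvc.ColorRange.MPEG)

while True:
    surface = decoder.DecodeSingleSurface()
    if surface.Empty():                          # end of stream
        break
    surface = to_pln.Execute(to_rgb.Execute(surface, cc_ctx), cc_ctx)

    # Wrap the CUDA surface as a uint8 tensor; no copy through the CPU.
    plane = surface.PlanePtr()
    frame = pnvc.makefromDevicePtrUint8(plane.GpuMem(), plane.Width(),
                                        plane.Height(), plane.Pitch(),
                                        plane.ElemSize())
    frame.resize_(3, h, w)  # assumes pitch == width, as in the VPF samples
    src = frame.unsqueeze(0).float().div(255)
    # ... feed `src` straight into the matting model here ...
```

Because the decoded surface never leaves GPU memory, this removes the CPU-side decode that dominates the VideoDataset profile mentioned above.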
One thing I can verify is that the claimed inference speed is achievable on consumer-grade GPUs, and GeForce RTX series GPUs can be faster than Quadro RTX cards simply because of their NVENC/NVDEC performance.
Two improvements are made in this contribution:
This modification gives roughly a 3.4x speedup on my system (Ryzen 7 5800H, RTX 3060 Mobile). Using the same 4K video on both the resnet50 and resnet101 models, the original version ran at 2.20 it/s, whereas this one averages 7.5 it/s.