Breakthrough / PySceneDetect

:movie_camera: Python and OpenCV-based scene cut/transition detection program & library.
https://www.scenedetect.com/
BSD 3-Clause "New" or "Revised" License

Use PySceneDetect with GPUs #76

Open sam09 opened 6 years ago

sam09 commented 6 years ago

Add instructions to compile pyscenedetect to use GPUs.

Breakthrough commented 6 years ago

Hi @sam09;

I don't believe the current implementation of PySceneDetect can be GPU accelerated just yet. There is a pull request in the works, however, that may allow this by using native OpenCV methods instead of numpy. I haven't looked into using CUDA/OpenCL for the OpenCV Python module, but will definitely look into it (unless there's something for numpy I'm unaware of?).

Ideally, the core of PySceneDetect could be rewritten in C++, which would allow for integration with GPGPU constructs, but that is something that would have to be planned for a future version given how the application currently sits. That being said, I am definitely interested in pursuing this as an option.

Thanks for the submission!

sam09 commented 6 years ago

Hi @Breakthrough. First of all, apologies for the vague issue that I created. It was late at night and I was really sleepy. And thanks for the detailed response.

I tried compiling OpenCV to use CUDA (ran into some errors there). At the moment I think that is the best we can do; I don't think numpy has an option to use GPUs at all.

All things said, really great tool.

Breakthrough commented 6 years ago

Haha no worries @sam09, it happens - thanks for the feedback.

In retrospect I would rather use OpenCL, because I would like the software to run faster on all GPUs, including Intel/AMD, not just Nvidia. That being said, this did remind me of PyOpenCL, which seems a lot more mature than the last time I looked at it.

This is definitely a route worth investigating, I would say, as my current plans for performance improvement hinged on rewriting parts of the core in C++ to achieve eventual multithreaded/GPGPU support. In retrospect, however, if I can go right to GPGPU support, I may be able to keep everything written in Python via PyOpenCL.
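Just to sketch the idea (nothing here is tested, and the frame-difference metric below is only a stand-in for what the detectors actually compute), PyOpenCL would let an array computation like this run on any OpenCL device while the surrounding logic stays in Python:

```python
# Hypothetical sketch: a per-frame difference metric offloaded to an OpenCL
# device via PyOpenCL. Assumes PyOpenCL is installed and a device is available.
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

ctx = cl.create_some_context()   # picks an available OpenCL platform/device
queue = cl.CommandQueue(ctx)

def frame_diff_score(prev_frame, cur_frame):
    """Mean absolute pixel difference between two frames (stand-in metric)."""
    a = cl_array.to_device(queue, prev_frame.astype(np.float32).ravel())
    b = cl_array.to_device(queue, cur_frame.astype(np.float32).ravel())
    diff = (a - b).get()         # subtraction runs on the device
    return float(np.abs(diff).mean())
```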

I'll leave this issue open for discussion, and will definitely be further investigating this option. If you or anyone else has any suggestions on the matter, please feel free to share!

rsomani95 commented 5 years ago

@Breakthrough

unless there's something for numpy I'm unaware of?

PyTorch can work as a GPU replacement for numpy. Their syntax is similar, and the ability to convert PyTorch tensors to numpy arrays works really well too. But this will only work on Nvidia, at least for now.
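For example (just a rough sketch, not tied to PySceneDetect's internals), a numpy-style frame metric can be moved to the GPU and back with very little change:

```python
# Rough sketch of using PyTorch tensors as a GPU stand-in for numpy arrays.
# Requires a CUDA build of PyTorch and an Nvidia GPU to actually run on the GPU.
import numpy as np
import torch

frame_a = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
frame_b = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.from_numpy(frame_a).to(device, dtype=torch.float32)
b = torch.from_numpy(frame_b).to(device, dtype=torch.float32)

score = (a - b).abs().mean()    # computed on the GPU when available
print(score.cpu().numpy())      # tensor -> numpy conversion works both ways
```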

Breakthrough commented 5 years ago

@rsomani95 gotcha, will make a note of that, thanks.

My updated plan is to create a new tool called SceneStats written in C++ that does all the heavy lifting that PySceneDetect does for frame-by-frame analysis.

The idea is that PySceneDetect will invoke SceneStats to create a statsfile, with the final scene detection still being done in Python - however, now it just has to load the statsfile and perform some simple data analysis to look for scene cuts, instead of having to analyze the video in Python as well. I'm hoping this architecture will allow more flexibility and not put too many dependencies on end users who don't require them.
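Roughly, the Python side would then only need to do something like the following (the statsfile format, column names, and threshold here are placeholders for illustration, not a final spec):

```python
# Sketch of the "simple data analysis" step: load a statsfile produced by the
# planned SceneStats tool and report frames whose metric exceeds a threshold.
# The column names and metric are placeholders, not a final format.
import csv

def find_cuts(statsfile_path, threshold=30.0):
    cuts = []
    with open(statsfile_path, newline="") as f:
        for row in csv.DictReader(f):
            if float(row["content_val"]) >= threshold:
                cuts.append(int(row["frame_number"]))
    return cuts
```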

If you have any comments/suggestions in this regard please feel free to leave them here. I'll leave this issue open until integration between PySceneDetect and SceneStats is complete, at which point this issue can be reopened in the SceneStats issue tracker instead. (I'm not sure if it makes more sense to follow a strictly GPGPU implementation, or if a pipelined multicore approach has the best performance gains - will need to write some benchmarks once SceneStats is up and running.)

sam09 commented 5 years ago

@Breakthrough Using a C++ library to do all the heavy lifting is probably a great idea as that allows anyone to write an extension in any frontend language. I had some ad-hoc code that did something similar in CUDA and C++. Only in retrospect did I realise that most of the performance gain I achieved was from using hardware decoders to decode the video.

I think what you are proposing is pretty cool!

rsomani95 commented 5 years ago

@Breakthrough, that sounds like a great way to go forward, both w.r.t. dependencies and efficiency.

Breakthrough commented 5 years ago

@sam09 interesting, thanks for the response - I assume by hardware decoder you mean the GPGPU? Was most of the performance gain due to not having to transfer the decoded frames from the CPU to the GPU?

@rsomani95 thanks as well for the reply!

sam09 commented 5 years ago

@Breakthrough Yes. Decoding the frames on the GPU and then processing them on the GPU itself gets rid of the memory transfer between CPU and GPU. It also frees up the CPU.

https://en.wikipedia.org/wiki/Nvidia_NVDEC

Breakthrough commented 5 years ago

Closing this issue as won't-fix, since the current plan for performance improvements and optimization is the SceneStats project.

If you have any comments/suggestions in this regard please feel free to leave them here, or create a new issue in the SceneStats repository referencing this one.

flavienbwk commented 4 years ago

Maybe CuPy, a NumPy-compatible library that runs on the GPU, might help you @Breakthrough!
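It's nearly a drop-in replacement; just as a sketch (the metric below is only an example, not PySceneDetect's actual algorithm):

```python
# Sketch of moving a numpy-style frame metric onto the GPU with CuPy.
# Requires an Nvidia GPU and a cupy package matching the installed CUDA version.
import numpy as np
import cupy as cp

def frame_diff_score(prev_frame, cur_frame):
    a = cp.asarray(prev_frame, dtype=cp.float32)   # host -> device copy
    b = cp.asarray(cur_frame, dtype=cp.float32)
    score = cp.abs(a - b).mean()                   # computed on the GPU
    return float(score)                            # device -> host scalar
```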

Breakthrough commented 4 years ago

@flavienbwk that most definitely is a game changer, thanks for letting me know! I'll keep this issue re-opened and in the backlog for the time being. I'd still like to pursue the SceneStats project by writing the core in C++, but I don't want to discount using CuPy as an option either (it definitely seems valuable if it can achieve the same goal!).

sam09 commented 4 years ago

https://github.com/sam09/shot-detector Something I wrote for my use case some time back. It works with CUDA <= 7.0.

flavienbwk commented 4 years ago

@sam09 Though it's not usable from Python, it may inspire @Breakthrough

sam09 commented 4 years ago

@flavienbwk Ah yes. It's in C++. In case anybody was looking for a GPU solution. :smiley:

Breakthrough commented 1 year ago

As an update on this, there are no plans to do scene detection on the GPU. There have been plenty of optimizations made since this issue was first opened, so most use cases should be met currently. See the latest comments below for more context.

tormento commented 1 week ago

Perhaps an easier way would be to use OpenCV with Vulkan, given its compatibility with a wide range of GPUs.

Breakthrough commented 1 week ago

The biggest roadblock to doing this efficiently is having the GPU do video decoding. Transferring frames from the CPU to the GPU is very costly. In the experiments I ran, the cost of that outweighed the performance gain of the processing. The algorithms PySceneDetect currently uses are also pretty cheap on the CPU, since they are done by default on a low-resolution subset of the frame in a single pass.

Another project of mine (DVR-Scan) does support CUDA via OpenCV, but that processes frames significantly slower. For that use case it achieves much higher GPU utilization since it's a heavier workload. PySceneDetect's algorithms are much more efficient, and so the relative cost of the GPU transfer becomes higher.

I'll re-open this since it's something I think we should have support for eventually. I want to first understand how to best achieve it within the Python GPU landscape and have it integrate cleanly with OpenCV/numpy (or CuPy). If someone has any proposals for how to best decode video on the GPU with Python, that would be appreciated. One last point to keep in mind is that not all video codecs have GPU implementations, so the application would still have to support both GPU and CPU decoding.

For CUDA, both the pyNvVideoCodec docs and the OpenCV docs (might be C++ only?) look promising, but I'm not sure what the de facto standard should be with respect to Vulkan for this task.
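For reference, my (untested) understanding is that the OpenCV route might look roughly like this from Python, assuming a build with CUDA and the cudacodec module enabled - treat it as a sketch only:

```python
# Untested sketch of GPU video decoding via OpenCV's cudacodec module.
# Requires an OpenCV build with CUDA and NVCUVID; without that, the
# cv2.cudacodec attribute will not exist.
import cv2

reader = cv2.cudacodec.createVideoReader("video.mp4")
prev = None
while True:
    ok, gpu_frame = reader.nextFrame()            # frame stays in GPU memory (GpuMat)
    if not ok:
        break
    if prev is not None:
        diff = cv2.cuda.absdiff(gpu_frame, prev)  # processed on the GPU
        # ...score `diff` here without downloading full frames...
    prev = gpu_frame.clone()                      # copy in case the reader reuses its buffer
```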

In the meantime, examples of how people wish to integrate or use PySceneDetect on a GPU are helpful. For example, does it need to have CLI support? Can we avoid the problem of GPU decoding entirely and just port the detectors?