0x90d / videoduplicatefinder

Video Duplicate Finder - Crossplatform

multi gpu support #302

Open flickleafy opened 2 years ago

flickleafy commented 2 years ago

Environment

Describe the solution you'd like

I have 3 GPUs in my workstation, and I noticed that only a single GPU is used during a scan. Is it possible to assign a GPU to each ffmpeg instance and distribute the instances across the GPUs? For example, if I choose 48 threads, each GPU would get 16 ffmpeg instances.

0x90d commented 2 years ago

It turns out ffmpeg supports this, but the specific GPU has to be selected and passed as an argument. Managing the load across all GPUs in a multi-threaded environment would then be left to VDF. That sounds very difficult, to be honest.
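For illustration, here is a minimal sketch of what "passing the GPU as an argument" could look like. It assumes an ffmpeg build with CUDA support and uses ffmpeg's `-hwaccel` and `-hwaccel_device` input options. The helper name `frame_grab_command` is hypothetical, and this is not the command line VDF actually builds.

```python
def frame_grab_command(video_path, gpu_index, timestamp_s=0.0):
    """Build an ffmpeg command that grabs one frame, decoding on a chosen GPU.

    Assumption: ffmpeg was built with CUDA hardware acceleration.
    The hwaccel options are input options, so they must come before -i.
    """
    return [
        "ffmpeg",
        "-hwaccel", "cuda",                 # use CUDA/NVDEC for decoding
        "-hwaccel_device", str(gpu_index),  # which GPU to decode on
        "-ss", str(timestamp_s),            # seek to the frame position
        "-i", video_path,
        "-frames:v", "1",                   # output a single video frame
        "-f", "image2pipe",
        "-c:v", "png",
        "-",                                # write the image to stdout
    ]
```

The resulting list can be passed to `subprocess.run`. Choosing `gpu_index` per call is the part that VDF would have to manage.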

flickleafy commented 2 years ago

Yes, I know ffmpeg lets you assign a GPU number to offload the processing to; I have done this with video encoding and decoding.

But I am not sure, what would be difficult?

To list the GPUs, you can use lspci | grep VGA or nvidia-smi -L
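As a rough sketch of how a program could count NVIDIA GPUs from the second command: `nvidia-smi -L` prints one line per device, each starting with `GPU <index>:`. The function names below are hypothetical, and the sketch only covers NVIDIA cards with the driver tools installed.

```python
import subprocess


def parse_nvidia_smi_list(output):
    """Return the device lines from `nvidia-smi -L` output (one per GPU)."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip().startswith("GPU ")
    ]


def count_nvidia_gpus():
    """Count NVIDIA GPUs, or return 0 if nvidia-smi is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return 0
    return len(parse_nvidia_smi_list(result.stdout))
```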

Then, I bet you already get the number of threads available in the system.

Now, you can create queues based on the amount of GPUs.

With a queue for each GPU, you can limit the size of each queue based on:

queueSize = (number of threads / number of GPUs).

Each time a queue is empty, you can assign videos from the general list to that queue.
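The sizing and refill steps above could be sketched as follows. This is only an illustration of the idea in this comment, not VDF code. The names `queue_size` and `refill_queues` are hypothetical, and integer division is an assumption for the case where the thread count does not divide evenly.

```python
from collections import deque


def queue_size(num_threads, num_gpus):
    """queueSize = number of threads / number of GPUs (rounded down, min 1)."""
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    return max(1, num_threads // num_gpus)


def refill_queues(gpu_queues, pending, size):
    """Top up every empty GPU queue from the general list of pending videos."""
    for q in gpu_queues:
        if not q:  # only refill queues that have run dry
            while pending and len(q) < size:
                q.append(pending.popleft())
```

With 48 threads and 3 GPUs, `queue_size(48, 3)` gives 16 slots per GPU, which matches the example in the original request.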

Obviously, this is oversimplified, and depending on the current architecture of your software, it might not be that simple. But I think it is worth a try.

It is not unrealistic for someone to have 2 or 3 GPUs around that could be put to work on a task like finding duplicate video files in a reasonable time.

0x90d commented 2 years ago

> Now, you can create queues based on the amount of GPUs.

VDF doesn't know how powerful a GPU is. Just dividing the number of items by the number of GPUs is not the right thing to do; not everyone has two identical graphics cards. Creating and maintaining these queues in code costs performance as well. You shouldn't forget that VDF doesn't encode/decode; it just takes one (or more) frames from a video file. That is actually a very cheap and fast process. It is still not clear whether GPU acceleration is that much faster than the CPU.

flickleafy commented 2 years ago

> VDF doesn't know how powerful a GPU is.

That is why queues are useful in the first place. You put in the number of videos you want to process, but since even identical GPU models in the same system behave differently, you can't be sure both GPUs will clear their queues at the same time.

Then, you process one by one in the queue, and each queue in its own thread.

Each time the GPU finishes processing a video, you "pop" another one from its queue. Since each queue runs in its own thread, the progress of one can't interfere with the other.

When the queue is clear, you add more videos there.
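A minimal sketch of that scheme, assuming one bounded queue and one worker thread per GPU. The per-GPU delay is a hypothetical stand-in for the real ffmpeg call, and the function name `run_per_gpu_queues` is made up for illustration. The dispatcher here tops up any queue that has room rather than waiting for it to empty completely, which is a small simplification.

```python
import queue
import threading
import time


def run_per_gpu_queues(videos, gpu_delays, queue_size):
    """Process videos with one bounded queue and one worker thread per GPU.

    gpu_delays[i] simulates how long GPU i needs per video (seconds).
    Returns a dict mapping GPU index to the list of videos it processed.
    """
    queues = [queue.Queue(maxsize=queue_size) for _ in gpu_delays]
    done = {gpu: [] for gpu in range(len(gpu_delays))}

    def worker(gpu):
        while True:
            video = queues[gpu].get()     # "pop" the next video
            if video is None:             # sentinel: no more work
                break
            time.sleep(gpu_delays[gpu])   # placeholder for frame extraction
            done[gpu].append(video)

    threads = [
        threading.Thread(target=worker, args=(gpu,))
        for gpu in range(len(gpu_delays))
    ]
    for t in threads:
        t.start()

    # Dispatcher: hand each video to the first queue that has room.
    for video in videos:
        placed = False
        while not placed:
            for q in queues:
                try:
                    q.put_nowait(video)
                    placed = True
                    break
                except queue.Full:
                    continue
            if not placed:
                time.sleep(0.001)         # every queue is full; wait a bit

    for q in queues:
        q.put(None)                       # blocks until the queue has room
    for t in threads:
        t.join()
    return done
```

Because a slow GPU keeps its queue full for longer, it receives fewer videos, so the split does not depend on knowing in advance how powerful each card is.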

> It is still not clear if GPU acceleration is that much faster than CPU.

About that, I think it is relative. The GPUs I am currently using are far faster than my CPU, and using them concurrently would be even faster than using the CPU alone. But I know some GPUs are slower and would not be useful, and in that case it would not be worth it.

dsrtusr88 commented 1 year ago

@DirtyRacer1337 this relates to a conversation being held for a similar but different project. Software decoding on the CPU almost always outperforms a GPU in speed for just extracting frames; however, that isn't always the case with H.265 files.

JamesPous commented 1 year ago

@Flickleafy: Sorry, but I do not agree with you. In most cases the CPU is much faster and more efficient than any newer GPU. I am using an Intel 8700K CPU with 16 GB RAM and an Asus RTX 3050, and my GPU usage is at 1-2% with Video Duplicate Finder 3.x. The bottleneck when checking videos is the speed of the SSD or hard disk where your video files are stored.

flickleafy commented 1 year ago

@JamesPous, sorry, but honestly I think you don't know what you are talking about.

I have built from scratch a video duplicate finder in Python that does exactly what I said was possible.

Also, it is not only possible to do what I initially proposed; it is also possible to do multi-threading and multi-device processing at the same time.

The only real bottleneck here is the algorithm implemented in Video Duplicate Finder 3.x, which is very inefficient at multi-processing.

Just to give an idea of what I am using to process videos in parallel today: 3 NVIDIA 4000 series GPUs, one Ryzen 9 7950X, and a 2 TB PCIe 5 NVMe SSD.

To fully use GPUs, you need to do batch processing (group together all the data you want and load it onto the GPU at once). In other cases you can use a different strategy and start around 8 to 16 threads at once to load videos for each GPU.

At the same time, you can also start another 32 threads to process videos on the CPU. Done cleverly, it simply does not hurt the GPUs, even if you start that many threads.
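One way to picture the mixed setup is a pool in which every worker thread is tagged with a device and all of them pull from one shared work queue. This is a sketch under assumptions, not the code of either project. `worker_plan`, `run_mixed`, and the `process(video, device)` callback are hypothetical names, and a real `process` would launch ffmpeg with or without hardware acceleration depending on the device label.

```python
import queue
import threading


def worker_plan(num_gpus, threads_per_gpu, cpu_threads):
    """Return one device label per worker thread, e.g. 'gpu:0' or 'cpu'."""
    plan = [
        f"gpu:{gpu}"
        for gpu in range(num_gpus)
        for _ in range(threads_per_gpu)
    ]
    plan += ["cpu"] * cpu_threads
    return plan


def run_mixed(videos, plan, process):
    """Run process(video, device) for every video, one thread per plan entry."""
    work = queue.Queue()
    for video in videos:
        work.put(video)
    results = []
    lock = threading.Lock()

    def worker(device):
        while True:
            try:
                video = work.get_nowait()  # pull the next unprocessed video
            except queue.Empty:
                return                     # nothing left to do
            out = process(video, device)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker, args=(d,)) for d in plan]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For the numbers mentioned above, `worker_plan(3, 8, 32)` would describe 8 loader threads for each of the 3 GPUs plus 32 CPU threads.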