Farmadupe / vid_dup_finder

Apache License 2.0

--include-exts #5

Open ShareBugreports opened 10 months ago

ShareBugreports commented 10 months ago

Exploring this tool further to identify video duplicates. It appears to perform significantly faster in comparison to videohash. However, I've noticed that the tool has a few issues:

A workaround is to use the "--exclude-exts" option to blacklist specific file types. But it would be more convenient to provide users with more flexibility in deciding the strategy. Here are some suggestions:

Of course, these last two options should be mutually exclusive.
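As a rough illustration of the mutual-exclusion idea, here is a minimal Python sketch. It is not how vid_dup_finder is implemented; `build_parser`, `parse_exts` and `should_visit` are hypothetical names, and `--include-exts` is only the option proposed in this issue. It assumes extensions are passed as a comma-separated list.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_exts(value: str) -> frozenset[str]:
    """Turn 'mp4,.MKV' into frozenset({'mp4', 'mkv'})."""
    return frozenset(
        ext.strip().lower().lstrip(".") for ext in value.split(",") if ext.strip()
    )


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: at most one of the two filters may be given."""
    parser = argparse.ArgumentParser(prog="dup-finder-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--include-exts", type=parse_exts, default=None)
    group.add_argument("--exclude-exts", type=parse_exts, default=None)
    return parser


def should_visit(
    path: str | Path,
    include: frozenset[str] | None,
    exclude: frozenset[str] | None,
) -> bool:
    """Decide from the extension alone whether a file is worth probing."""
    ext = Path(path).suffix.lower().lstrip(".")
    if include is not None:  # whitelist: only listed extensions are visited
        return ext in include
    if exclude is not None:  # blacklist: listed extensions are skipped
        return ext not in exclude
    return True  # no filter given: visit everything
```

With this shape, passing both options makes `argparse` exit with a usage error, so the two strategies cannot be combined by accident.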

Farmadupe commented 10 months ago

I think I understand your two bullet points, but they do not describe a specific problem. It would be useful to know if such behaviour prevents you from using vid_dup_finder to detect duplicate videos.

For reference, FFMPEG is very good at detecting videos with an incorrect extension, so the default behaviour is to visit every single file. The cache is then updated so that non-video files will not be visited in the future. I believe this is the same behaviour as the first suggestion.
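The "visit everything once, then remember the non-videos" behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code: `find_videos` is a hypothetical name, `is_video` stands in for whatever FFMPEG-based probe is really used, and the cache is assumed to be a plain in-memory set of path strings.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable, Iterator


def find_videos(
    paths: Iterable[Path],
    is_video: Callable[[Path], bool],
    non_video_cache: set[str],
) -> Iterator[Path]:
    """Yield files that probe as videos and remember the ones that do not."""
    for path in paths:
        key = str(path)
        if key in non_video_cache:
            continue  # known non-video from an earlier run: skip the probe
        if is_video(path):
            yield path
        else:
            non_video_cache.add(key)  # do not probe this file again
```

On a second run with the same cache, the probe is no longer called for files that already failed it, so the extra cost is paid only on the first pass.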

ShareBugreports commented 10 months ago

It is indeed usable. But in my case it generates a lot of log lines (i.e. noise). Of course I can filter them out with "grep", but starting ffmpeg for each file seems pointless.

I tried to measure it (one run with all files and one run with non-video files excluded), but my sample set is too small. I think the difference was a couple of seconds. With a large collection this can add up, and in most cases those are just wasted CPU cycles (though it can be useful for some people).

But, to be honest, I am still comparing with czkawka. Both have advantages and disadvantages. I still need to verify whether both handle an old cache (moved files) correctly.