HaveAGitGat / Tdarr

Tdarr - Distributed transcode automation using FFmpeg/HandBrake + Audio/Video library analytics + video health checking (Windows, macOS, Linux & Docker)

Ability to detect duplicate files using frame analysis #528

Closed: dsrtusr88 closed this issue 2 years ago

dsrtusr88 commented 2 years ago

Is your feature request related to a problem? Please describe.
I often have a lot of duplicate files, either at the same resolution with different names or, worse, at different resolutions. Some programs like Plex can catch this, but they rely on successfully matching the file name to an entry in IMDb or TMDb, which isn't a perfect solution.

Describe the solution you'd like
If Tdarr is already doing frame-by-frame analysis for health checking, I would like it to also flag files that are almost identical based on certain thresholds, or that contain identical frames at different resolutions. If possible, when certain thresholds are met (an exact match), it could offer auto-deletion. For example: Movie A is transcoded at time X; at time X+10, Movie B is loaded into the transcode queue; Tdarr detects that it is the same job as Movie A and of the same or worse quality, and marks it for deletion rather than wasting transcode compute.
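The queue-time check described above could look roughly like the following minimal TypeScript sketch. It is not a Tdarr feature or API: the `MediaInfo` shape, the `findSupersedingFile` helper and the 2-second duration tolerance are hypothetical, and matching on duration alone is only a stand-in for whatever "same job" test a real implementation would use.

```typescript
// Hypothetical metadata for a media file; not a Tdarr type.
interface MediaInfo {
  path: string;
  durationSec: number;
  width: number;
  height: number;
  bitrateKbps: number;
}

// True when `a` has no more pixels and no higher bitrate than `b`.
function isSameOrWorseQuality(a: MediaInfo, b: MediaInfo): boolean {
  return a.width * a.height <= b.width * b.height && a.bitrateKbps <= b.bitrateKbps;
}

// Returns an already-processed file that `candidate` appears to duplicate
// (durations within the assumed tolerance) and that is at least as good,
// or undefined if the candidate should be transcoded normally.
function findSupersedingFile(
  candidate: MediaInfo,
  processed: MediaInfo[],
  durationToleranceSec = 2,
): MediaInfo | undefined {
  return processed.find(
    (p) =>
      p.path !== candidate.path &&
      Math.abs(p.durationSec - candidate.durationSec) <= durationToleranceSec &&
      isSameOrWorseQuality(candidate, p),
  );
}

// Movie A was transcoded earlier; Movie B arrives later at a lower quality.
const movieA: MediaInfo = {
  path: "/media/Movie A (2010).mkv",
  durationSec: 7205, width: 1920, height: 1080, bitrateKbps: 8000,
};
const movieB: MediaInfo = {
  path: "/media/movie_a.720p.mp4",
  durationSec: 7204, width: 1280, height: 720, bitrateKbps: 3000,
};

const superseding = findSupersedingFile(movieB, [movieA]);
console.log(superseding ? `Flag ${movieB.path} for deletion` : "Transcode as usual");
// prints "Flag /media/movie_a.720p.mp4 for deletion"
```

Requiring both resolution and bitrate to be no higher keeps the sketch conservative: a file that is better on either axis is transcoded rather than flagged.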

Describe alternatives you've considered
Video Comparer is a pretty good Windows app that does exactly this. However, it is Windows-only, does CPU processing only, and doesn't handle massive scale well. It also doesn't integrate into the Tdarr workflow queue, so you can't do in-the-loop duplicate checks.

Additional context
As you are working on deploying Tdarr for business use, this seems like it would be a great selling point. If people are hosting and storing duplicate files they haven't detected, this would save space and storage costs.

tordenflesk commented 2 years ago

https://github.com/0x90d/videoduplicatefinder/

dsrtusr88 commented 2 years ago

Yeah, I like this. I'd still like it to be in-line with Tdarr, especially for sharing GPU resources.

HaveAGitGat commented 2 years ago

Tdarr itself doesn't do 'frame' analysis; that's something FFmpeg does internally, at the pixel/codec level, when decoding.
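Any frame-level comparison would therefore have to consume frames decoded by FFmpeg. A minimal sketch, assuming FFmpeg is on the PATH: the hypothetical helper `buildFrameSampleArgs` builds an argument list that keeps one frame every `intervalSec` seconds, scales it to a small grayscale thumbnail and streams the raw bytes to stdout. The filters and options used (`fps`, `scale`, `-an`, `-pix_fmt gray`, `-f rawvideo`) are standard FFmpeg ones; the 60-second interval and 32x32 size are arbitrary choices.

```typescript
// Builds FFmpeg arguments that decode `input`, keep one frame every
// `intervalSec` seconds, scale each kept frame to a `size` x `size`
// grayscale thumbnail, and write the raw pixels to stdout.
function buildFrameSampleArgs(input: string, intervalSec = 60, size = 32): string[] {
  return [
    "-hide_banner",
    "-loglevel", "error",
    "-i", input,
    "-an", // drop audio; only decoded video frames are needed
    "-vf", `fps=1/${intervalSec},scale=${size}:${size}`,
    "-pix_fmt", "gray", // 1 byte per pixel
    "-f", "rawvideo",
    "pipe:1", // stdout
  ];
}

const args = buildFrameSampleArgs("/media/Movie A (2010).mkv");
console.log(args);
```

In Node.js the list could be passed to `child_process.spawn("ffmpeg", args)`, with each sampled frame arriving on stdout as `size * size` bytes. Passing an argument array rather than a shell string avoids quoting problems with paths that contain spaces.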

The best Tdarr could do at the moment would be to check things like filename, length, resolution, codec, etc. for duplicates, which is what that project already does.
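A minimal sketch of that kind of metadata check for exact duplicates, assuming a scan has already collected each file's path, duration, resolution and video codec. The `ScannedFile` type, `normalizeName` and `findExactDuplicates` are hypothetical, not Tdarr code. Files are grouped only when all four values agree, so a re-encode at a different resolution is not caught.

```typescript
// Hypothetical per-file metadata gathered during a library scan.
interface ScannedFile {
  path: string;
  durationSec: number;
  width: number;
  height: number;
  videoCodec: string;
}

// Reduces a path to a comparable title: file name without extension,
// lowercased, with runs of punctuation/whitespace collapsed to one space.
function normalizeName(path: string): string {
  const base = path.split(/[\\/]/).pop() ?? path;
  return base
    .replace(/\.[^.]+$/, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Groups files whose name, whole-second duration, resolution and codec
// all match exactly; only groups with more than one file are returned.
function findExactDuplicates(files: ScannedFile[]): ScannedFile[][] {
  const groups = new Map<string, ScannedFile[]>();
  for (const f of files) {
    const key = [
      normalizeName(f.path),
      Math.round(f.durationSec),
      `${f.width}x${f.height}`,
      f.videoCodec.toLowerCase(),
    ].join("|");
    const group = groups.get(key);
    if (group) {
      group.push(f);
    } else {
      groups.set(key, [f]);
    }
  }
  return Array.from(groups.values()).filter((g) => g.length > 1);
}

const scanned: ScannedFile[] = [
  { path: "/movies/Movie.Name.2010.mkv", durationSec: 7200.2, width: 1920, height: 1080, videoCodec: "HEVC" },
  { path: "/downloads/movie name 2010.mp4", durationSec: 7199.9, width: 1920, height: 1080, videoCodec: "hevc" },
  { path: "/downloads/Movie Name 2010 720p.mkv", durationSec: 7200.0, width: 1280, height: 720, videoCodec: "h264" },
];
console.log(findExactDuplicates(scanned).map((g) => g.map((f) => f.path)));
```

Here the first two files form one group, while the 720p copy stays separate because both its name and its resolution differ.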

HaveAGitGat commented 2 years ago

Added to .14:

[Screenshot]

It's part of the Pro version, though. It has helped me find a lot of exact duplicates, and I'll be improving non-exact duplicate detection over time, perhaps with some extra filters.

dsrtusr88 commented 2 years ago

@HaveAGitGat slick add, thank you! In a future build, it would be handy to be able to delete straight from this view. I'd also like to see tags such as h264 or HEVC, so I can save time by not worrying about a transcode.

HaveAGitGat commented 2 years ago

Thanks, can add those details in for the next version 👍

HaveAGitGat commented 2 years ago

@dsrtusr88 how's this?

[Screenshot]

dsrtusr88 commented 2 years ago

@HaveAGitGat that's perfect!

jeffward01 commented 2 years ago

@HaveAGitGat

How does this determine similarity? pHash? Is there a link to the algorithm you use to compute the pHash and similarity?

Very cool app!

Off Topic: Question:

I'm a C# dev; admittedly, I have not reviewed this code. Is Tdarr built in a modular way, so that I could contribute some C# modules, for example a frame-by-frame evaluation using an .exe written in C#, that could then be built into Tdarr?

Thanks!
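For context on the pHash question: perceptual hashes are usually compared by counting differing bits (the Hamming distance) and treating a small distance as "similar". The sketch below assumes 64-bit hashes already computed elsewhere and stored as 16-character hex strings; `hammingDistanceHex`, `isSimilarHash` and the 10-bit threshold are hypothetical and not something Tdarr does.

```typescript
// Counts differing bits between two equal-length hex-encoded hashes
// (e.g. 16 hex chars = 64 bits), i.e. their Hamming distance.
function hammingDistanceHex(a: string, b: string): number {
  if (a.length !== b.length || !/^[0-9a-f]*$/i.test(a + b)) {
    throw new Error("hashes must be hex strings of equal length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = parseInt(a[i], 16) ^ parseInt(b[i], 16);
    while (diff !== 0) {
      diff &= diff - 1; // clear the lowest set bit
      distance++;
    }
  }
  return distance;
}

// Treats two hashes as "similar" when at most `maxBits` bits differ.
// The default of 10 bits is an illustrative threshold, not a standard.
function isSimilarHash(a: string, b: string, maxBits = 10): boolean {
  return hammingDistanceHex(a, b) <= maxBits;
}

console.log(hammingDistanceHex("0000000000000000", "00000000000003ff")); // 10
```

Comparing hex nibbles avoids needing `BigInt` support while still giving the exact bit count.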

HaveAGitGat commented 2 years ago

Hey @jeffward01 thanks.

No, it just compares the collected file metadata and checks how many similarities there are (with some weighting).
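A weighted metadata comparison of that general kind can be sketched as follows. Tdarr is closed source, so this is not its implementation: the `FileMetadata` fields, the `WEIGHTS` values, the 2-second duration tolerance and `similarityScore` are all hypothetical, chosen only to show how weighted matches combine into a single score.

```typescript
// Hypothetical metadata fields; Tdarr's actual fields are not public.
interface FileMetadata {
  fileName: string;
  durationSec: number;
  width: number;
  height: number;
  videoCodec: string;
  sizeBytes: number;
}

// Illustrative weights: a matching duration counts for more than a
// matching codec. These values are assumptions.
const WEIGHTS = {
  fileName: 3,
  duration: 4,
  resolution: 2,
  videoCodec: 1,
  size: 1,
};

type Field = keyof typeof WEIGHTS;

// Returns a score in [0, 1]: the weighted share of fields that match.
function similarityScore(a: FileMetadata, b: FileMetadata): number {
  const matches: Record<Field, boolean> = {
    fileName: a.fileName.toLowerCase() === b.fileName.toLowerCase(),
    duration: Math.abs(a.durationSec - b.durationSec) <= 2, // assumed 2 s tolerance
    resolution: a.width === b.width && a.height === b.height,
    videoCodec: a.videoCodec.toLowerCase() === b.videoCodec.toLowerCase(),
    size: a.sizeBytes === b.sizeBytes,
  };
  let matched = 0;
  let total = 0;
  for (const field of Object.keys(WEIGHTS) as Field[]) {
    total += WEIGHTS[field];
    if (matches[field]) matched += WEIGHTS[field];
  }
  return matched / total;
}

const original: FileMetadata = {
  fileName: "Movie Name (2010).mkv", durationSec: 7200,
  width: 1920, height: 1080, videoCodec: "hevc", sizeBytes: 4000000000,
};
const copy: FileMetadata = { ...original };
const downscaled: FileMetadata = { ...original, width: 1280, height: 720, sizeBytes: 2000000000 };

console.log(similarityScore(original, copy)); // 1
console.log(similarityScore(original, downscaled).toFixed(2)); // 0.73
```

A score of 1 corresponds to an exact duplicate; any cut-off for flagging non-exact duplicates would be a hypothetical tuning choice made against a real library.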

Tdarr is written in TypeScript/Node.js and is closed source (so probably not what you're after in that regard), but yes, it could do something like that. The main issue is that it would need to be cross-platform across all the platforms Tdarr supports.