Kota-Karthik / twinTrim

TwinTrim is a powerful and efficient tool designed to find and manage duplicate files across directories.
MIT License
17 stars 63 forks

Add Fuzzy Matching Feature to Enhance Duplicate Detection #84

Closed anushka4124 closed 2 weeks ago

anushka4124 commented 3 weeks ago

I would like to propose a new feature that enhances TwinTrim's duplicate detection by introducing a fuzzy matching technique. Currently, the tool relies on strict hashing to identify duplicate files, which works well for exact duplicates. However, many users encounter near-duplicate files: files that are not identical but share high content similarity (e.g., different versions of a document, or edited images).

The fuzzy matching feature would:

I have already implemented an initial version of this feature in a new fuzzy.py file. It includes a method, _find_fuzzyduplicates, that uses fuzzy matching to detect near-duplicates.
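For illustration only, here is a minimal sketch of what near-duplicate detection for text files could look like. It is not the contents of fuzzy.py, which this thread does not show. The function names `similarity` and `find_fuzzy_duplicates` and the 0.9 threshold are hypothetical, and the standard library's `difflib.SequenceMatcher` is an assumed stand-in for whatever matching technique the PR actually uses.

```python
import difflib
from itertools import combinations


def similarity(path_a, path_b):
    """Return a ratio in [0, 1] describing how similar two text files are."""
    with open(path_a, encoding="utf-8", errors="replace") as fa:
        text_a = fa.read()
    with open(path_b, encoding="utf-8", errors="replace") as fb:
        text_b = fb.read()
    # autojunk=False disables the "popular element" heuristic, which can
    # badly underestimate similarity for inputs of 200+ characters.
    matcher = difflib.SequenceMatcher(None, text_a, text_b, autojunk=False)
    return matcher.ratio()


def find_fuzzy_duplicates(paths, threshold=0.9):
    """Return (path_a, path_b, score) for every pair at or above threshold.

    The threshold is an assumed default, not a value from the project.
    """
    matches = []
    for path_a, path_b in combinations(sorted(paths), 2):
        score = similarity(path_a, path_b)
        if score >= threshold:
            matches.append((path_a, path_b, score))
    return matches
```

This compares every pair of files, so the cost grows quadratically with the number of files. A real implementation would likely need a cheap prefilter (for example, grouping by file size or extension) before running the expensive comparison.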

If you like the idea, please assign me the task of developing this feature further. @Kota-Karthik

Kota-Karthik commented 3 weeks ago

@anushka4124 It sounds good. You can make the changes, and try to retain the folder structure. Also add a separate flag, --fuzzy-duplicates. But remember that this flag shouldn't take much time to scan for and find fuzzy duplicates. @techy4shri Please add the necessary labels.
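As a rough sketch of the opt-in flag requested above: only the flag name --fuzzy-duplicates comes from this thread. The thread does not show which CLI framework TwinTrim uses, so the standard library's `argparse` is an assumed stand-in, and `build_parser` and the `directory` argument are hypothetical.

```python
import argparse


def build_parser():
    """Build a hypothetical parser with an opt-in fuzzy matching flag."""
    parser = argparse.ArgumentParser(description="Find duplicate files.")
    # Hypothetical positional argument: the directory to scan.
    parser.add_argument("directory", help="directory to scan")
    # Off by default, so the slower fuzzy scan runs only when requested.
    parser.add_argument(
        "--fuzzy-duplicates",
        action="store_true",
        help="also report near-duplicate files (slower than exact hashing)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --fuzzy-duplicates as args.fuzzy_duplicates.
    mode = "fuzzy" if args.fuzzy_duplicates else "exact"
    print(f"Scanning {args.directory} in {mode} mode")
```

Keeping the flag separate and off by default means the existing exact-hash scan keeps its current speed, and users pay for the fuzzy comparison only when they ask for it.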

anushka4124 commented 3 weeks ago

I have already opened a PR with all the changes required for fuzzy duplicate file detection. Please have a look at it! @Kota-Karthik

techy4shri commented 3 weeks ago

@anushka4124 You cannot make a PR without being assigned to an issue first, so I will be rejecting that PR. Make a new PR and follow the template to ensure it gets reviewed and merged successfully.

anushka4124 commented 3 weeks ago

Thank you so much for assigning this task to me!

I have opened another pull request. Kindly take a look at it, @Kota-Karthik @techy4shri