Closed by anushka4124 2 weeks ago
@anushka4124
It sounds good.
You can make the changes, but try to retain the folder structure.
Also add a separate flag, --fuzzy-duplicates.
But remember that scanning for fuzzy duplicates with this flag shouldn't take much time.
@techy4shri
Please add the necessary labels.
I have already opened a PR with all the changes required for Fuzzy Duplicate File Detection. Please have a look at it! @Kota-Karthik
@anushka4124 you cannot make a PR without being assigned to an issue first, so I will be rejecting that PR. Make a new PR and follow the template to ensure it gets reviewed and merged successfully.
Thank you so much for assigning this task to me!
I have opened another pull request. Kindly check it once @Kota-Karthik @techy4shri
I would like to propose a new feature to enhance the duplicate detection capabilities of TwinTrim by introducing a fuzzy matching technique. Currently, the tool relies on strict hashing to identify duplicate files, which works well for exact duplicates. However, many users encounter near-duplicate files: files that are not identical but share high content similarity (e.g., different versions of documents, edited images).
The fuzzy matching feature would:
I have already implemented an initial version of this feature by creating a new fuzzy.py file. This includes a method _find_fuzzyduplicates that uses fuzzy matching to detect near-duplicates.
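To illustrate the idea, here is a minimal sketch of near-duplicate detection. It is not the code in fuzzy.py: the function names `similarity` and `find_near_duplicates`, the 0.9 threshold, the 20% size tolerance, and the 4096-byte read limit are all assumptions chosen for illustration. It compares file contents with Python's standard `difflib.SequenceMatcher` and skips pairs whose sizes differ too much, which is one cheap way to keep the scan from taking long.

```python
import os
from difflib import SequenceMatcher
from itertools import combinations


def similarity(path_a, path_b, max_bytes=4096):
    """Return a 0..1 similarity ratio for the first max_bytes of two files.

    Reading only a prefix is an assumption made here to bound the cost.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(max_bytes), fb.read(max_bytes)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_near_duplicates(directory, threshold=0.9, size_tolerance=0.2):
    """Return (path_a, path_b, score) for file pairs scoring >= threshold."""
    files = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            files.append((os.path.getsize(path), path))
    files.sort()  # smaller file first in every pair below

    matches = []
    for (size_a, path_a), (size_b, path_b) in combinations(files, 2):
        # Cheap prefilter: skip empty files and pairs with very different sizes.
        if size_b == 0 or (size_b - size_a) / size_b > size_tolerance:
            continue
        score = similarity(path_a, path_b)
        if score >= threshold:
            matches.append((path_a, path_b, score))
    return matches
```

Calling `find_near_duplicates("some_folder")` returns a list of candidate pairs with their scores. The pairwise loop is still quadratic in the number of files, so a real implementation would likely need a stronger prefilter for large folders.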
If you like the idea, please assign me the task of building on this feature @Kota-Karthik