ahmadqmalzoubi / file-duplicate-finder

Python code to find duplicate files in a base directory. The blake2b hash is used to identify the duplicates. The code reports the number of duplicate files along with their full path names.
GNU General Public License v3.0

Simplify duplicate_files_dict creation code #6

Open ahmadqmalzoubi opened 1 month ago

ahmadqmalzoubi commented 1 month ago

The code that creates the duplicate_files_dict dictionary is more complicated than it needs to be. It can be simplified to be more concise and more readable.

Current code:

duplicate_files_dict = {}

for size, hashes_dict in files_dict.items():
    for hash in hashes_dict:
        if len(files_dict[size][hash]) > 1:
            if size in duplicate_files_dict:
                duplicate_files_dict[size][hash] = files_dict[size][hash]
            else:
                duplicate_files_dict[size] = {}
                duplicate_files_dict[size][hash] = files_dict[size][hash]

Suggested code change:

duplicate_files_dict = {
    size: hashes
    for size, hashes in files_dict.items()
    if any(len(paths) > 1 for paths in hashes.values())
}

ahmadqmalzoubi commented 4 weeks ago

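This issue will be put on hold. The suggested code does not work properly: it keeps every hash recorded for a size as soon as any one hash of that size has more than one file, so unique files that share a size with a duplicate group are printed as duplicates. In the output below, diffman.man is listed under 32.5 KiB even though it is the only file with its hash.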

ahmad@server1:~/workspace/file-duplicate-finder$ ./file-duplicate-finder.py --minsize 1 ../duplicates/

# Duplicates:

File Size       Files Hash with the list of duplicate files

2.8 KiB
         953049b2fd02cd35b266161babe44cb3cdb94c21013ff591deaf6276016e1b0c2424ff0aa092ae4f49f35a57bdd05078b069708666a9f7ad4f7fe9d5ff714a2b
         ['/home/ahmad/workspace/duplicates/duplicates.py', '/home/ahmad/workspace/duplicates/duplicates2.py', '/home/ahmad/workspace/duplicates/aaa']

         72c251ecb504cb79d43b8c98bd2ce87648d699fe510fb85dc8bdc90b517c292e74f1dd001fa472fddec66a0085e76f0d28fcbec88989b92dd47c0c50735fefdf
         ['/home/ahmad/workspace/duplicates/diffduplicates2.py', '/home/ahmad/workspace/duplicates/diffduplicates.py', '/home/ahmad/workspace/duplicates/bbb']

32.5 KiB
         81d7cd027da1b6ab631e43f377838a680c67e68ed221f8b2f97c8a3066f8575cbc96c80dfa99fce5cd670589c6814d4cdc1b5b1cf8f41a59ddbb4c4361975b5c
         ['/home/ahmad/workspace/duplicates/man.man', '/home/ahmad/workspace/duplicates/dup.man']

         771207e73cb9d461ccf1113ea3c933f18024261fcfadb2cc4c9f4293e86839b087b07ab80eeb8ffa56b8a8b9c958467867af662a2dc2673bb0b9760961e6d7ec
         ['/home/ahmad/workspace/duplicates/diffman.man']

## Searching for duplicate files in the Base Directory: /home/ahmad/workspace/duplicates ->

There are 4 groups of duplicate files with 9 total number of files in these groups, whereof 5 are duplicates.

ahmad@server1:~/workspace/file-duplicate-finder$
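
One possible way to keep the code concise without the false positives is to filter at both levels: drop hashes with a single path first, then drop sizes that have no hashes left. The following is only a sketch, not the change adopted in the repository. The sample `files_dict`, its sizes and hash labels, and the helper name `find_duplicates` are hypothetical; the `{size: {hash: [paths]}}` shape is taken from the code quoted in this issue.

```python
# Hypothetical sample data in the {size: {hash: [paths]}} shape used in this issue.
files_dict = {
    2867: {"hash_a": ["duplicates.py", "duplicates2.py", "aaa"]},
    33280: {
        "hash_b": ["man.man", "dup.man"],
        "hash_c": ["diffman.man"],  # unique file that shares a size with duplicates
    },
    10: {"hash_d": ["only.txt"]},  # size with no duplicates at all
}

# The one-liner suggested in the issue: a size is kept with ALL of its hashes
# as soon as any one of them has more than one path.
too_broad = {
    size: hashes
    for size, hashes in files_dict.items()
    if any(len(paths) > 1 for paths in hashes.values())
}


def find_duplicates(files_dict):
    """Keep only hashes with more than one path, and only sizes that still have hashes."""
    filtered = (
        (size, {h: paths for h, paths in hashes.items() if len(paths) > 1})
        for size, hashes in files_dict.items()
    )
    return {size: dupes for size, dupes in filtered if dupes}


duplicate_files_dict = find_duplicates(files_dict)

print(sorted(too_broad[33280]))             # includes the unique hash_c
print(sorted(duplicate_files_dict[33280]))  # only hash_b remains
```

On this sample data the result matches what the original nested loop produces, which suggests the two-level filter preserves the original behavior; it would still need to be checked against the real `files_dict` built by the script.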