
adrianlopezroche / fdupes

FDUPES is a program for identifying or deleting duplicate files residing within specified directories.


Different file size is ignored if, for some reason, the MD5 hashes of two files are identical #182

Status: Open. kabu0001 opened this issue 4 months ago.

kabu0001 commented 4 months ago

As the title says, I get erratic duplicate results if, for some reason (don't ask me how that could happen), two files have the same name and the same MD5 hash but completely different sizes.

stvoidit commented 2 months ago

Today I found this program and compiled it from source. I was very surprised to see MD5 being used, although an actual collision between real files seems quite hard to hit. Still, it is stranger that the file size is not checked; that seems more critical than the choice of hashing algorithm.
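The point about checking file size can be illustrated with a minimal Python sketch. It assumes a duplicate finder that buckets files by size first and hashes only files that already share a size, so two files of different sizes are never compared by hash at all. The helpers `group_by_size`, `md5_of`, and `candidate_groups` are hypothetical names chosen for illustration; this is not fdupes's actual code.

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(paths):
    """Hypothetical helper: bucket paths by file size.

    Files of different sizes can never be duplicates, so only buckets
    holding two or more files are kept.
    """
    buckets = defaultdict(list)
    for path in paths:
        buckets[os.path.getsize(path)].append(path)
    return {size: group for size, group in buckets.items() if len(group) > 1}


def md5_of(path, chunk_size=65536):
    """Hypothetical helper: hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def candidate_groups(paths):
    """Hypothetical helper: groups of files with equal size AND equal MD5.

    Hashing happens only inside a same-size bucket, so a hash match can
    never pair up files of different sizes.
    """
    groups = []
    for same_size in group_by_size(paths).values():
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of(path)].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

With this ordering, the cheap size check removes most non-duplicates before any file is read, and the hash only has to separate files that are already the same length.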

jbruchon commented 2 months ago

The hash is a fast exclusion feature. It is not used to determine if files are the same; it is used to determine if files are different. A hash collision does not cause a false positive duplicate result.
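The "fast exclusion" idea can be sketched in Python as follows. This is a minimal illustration under the assumption that a hash match is always followed by a byte-for-byte comparison; it is not fdupes's implementation. `are_duplicates`, `same_bytes`, `md5_of`, and the `hash_fn` parameter are hypothetical names. `hash_fn` exists only so that a deliberately colliding hash can be plugged in to show that a collision still cannot produce a false positive.

```python
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Hypothetical helper: raw MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def same_bytes(path_a, path_b, chunk_size=65536):
    """Hypothetical helper: byte-for-byte comparison; this decides equality."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:  # both files ended at the same point with equal content
                return True


def are_duplicates(path_a, path_b, hash_fn=md5_of):
    """Hypothetical check: size and hash can only say "different"."""
    # Different sizes: definitely not duplicates.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Different hashes: definitely not duplicates (the fast exclusion).
    if hash_fn(path_a) != hash_fn(path_b):
        return False
    # Equal hashes only mean "possibly equal"; confirm against the bytes.
    return same_bytes(path_a, path_b)
```

For example, passing `hash_fn=lambda p: b"same"` makes every pair of files "collide", yet `are_duplicates` still reports two same-size files with different contents as not duplicates, because the final decision comes from `same_bytes`.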