KLDavies / ssdeep

GNU General Public License v2.0

Receiving 100 match for two different fuzzy hashes #1

Open ross-spencer opened 7 years ago

ross-spencer commented 7 years ago

Attached are two files (twofiles.zip) that produce different hash strings but a match score of 100.

Hashing file
1536:6ZmdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZagPuNKpwjj:PFLfL/xi0ShXXqPiX3KMlvbPt
Hashing file
1536:S/pXbPRCzY5dSdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZQ:PFLfL/xi0ShXXqPiX3KMlvbPt
MATCH: score = 100

I've been using a Golang wrapper, so I checked with ssdeep 2.13. To double-check the native result, I hacked the sample.c program to hash and compare these two files; gist here as a demo:

https://gist.github.com/ross-spencer/ac0d5546a2511ad692aa4ff27abd9ba0

The files are publicly available via archway.govt.nz as part of the collections held by Archives NZ. I've a handful of other culprits if you are keen on additional files for testing purposes too. Let me know if it's best to share the files, or if the hash strings are enough.

twofiles.zip

NB. Looks to be the same on the other branches as well.

a4lg commented 7 years ago

Hi, I'm Tsukasa OI, a maintainer of ssdeep. This does not appear to be a bug.

If the block sizes are equal (1536 in this case), both the first and second blocks are compared and the higher score is taken. Here, the second block (a rougher representation of the file than the first one) is identical in both hashes ("PFLfL/xi0ShXXqPiX3KMlvbPt"), so the comparison returns 100.

This is expected behavior (to prevent false negatives, I guess). Two "different" hashes can match perfectly: the two files might be different, but they can still be considered very similar.
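
To make the point concrete, here is a minimal Python sketch that splits the two hash strings from this issue into block size, first block and second block, then checks which blocks are identical. It is not the ssdeep scoring code: the helper names `parse_hash` and `identical_blocks` are made up for illustration, and the sketch only tests for equality rather than computing a similarity score. It assumes the plain `blocksize:first:second` layout shown in the output above.

```python
def parse_hash(h):
    """Split 'blocksize:first:second' into (int, str, str)."""
    block_size, first, second = h.split(":", 2)
    return int(block_size), first, second


def identical_blocks(h1, h2):
    """Report which blocks are byte-for-byte identical.

    Returns None when the block sizes differ, because this sketch
    only covers the equal-block-size case discussed in the thread.
    """
    size1, first1, second1 = parse_hash(h1)
    size2, first2, second2 = parse_hash(h2)
    if size1 != size2:
        return None
    return {"first": first1 == first2, "second": second1 == second2}


# The two hash strings reported in this issue.
HASH_A = ("1536:6ZmdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZagPuNKpwjj"
          ":PFLfL/xi0ShXXqPiX3KMlvbPt")
HASH_B = ("1536:S/pXbPRCzY5dSdmkLfq8/HRhOzv4lvxGyo2oDhUjYfJxIuPM9PvbmXS1aKMlv5ZQ"
          ":PFLfL/xi0ShXXqPiX3KMlvbPt")

result = identical_blocks(HASH_A, HASH_B)
print(result)  # {'first': False, 'second': True}
```

The first blocks differ, but the second blocks are the same string, and since the higher of the two scores is taken, an identical second block is enough for a score of 100.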