Open ross-spencer opened 8 years ago
Hi, I'm Tsukasa OI, a maintainer of ssdeep. It appears this is not a bug.
If the block sizes are equal (1536 in this case), both the first and second blocks are compared and the higher score is taken. In this case, the second block (a rougher representation of the file than the first one) is the same in both hashes ("PFLfL/xi0ShXXqPiX3KMlvbPt"), so the comparison returns 100.
This is expected behavior (to prevent false negatives, I guess). Two "different" hashes can match perfectly: the two files might be different, but they can still be considered very similar.
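To illustrate the rule described above, here is a minimal Python sketch, not the actual ssdeep implementation. It assumes the usual `blocksize:first:second` hash layout and handles only the equal-block-size case. The `score_strings` function is a crude stand-in (100 for identical strings, 0 otherwise) for the library's real scoring, and the first-block values in the example are hypothetical placeholders, since only the shared second block is quoted in this thread.

```python
def parse_hash(sig):
    """Split a 'blocksize:first:second' hash string into its three parts."""
    block_size, first, second = sig.split(":", 2)
    return int(block_size), first, second


def score_strings(a, b):
    """Stand-in scorer (assumption): 100 if identical, otherwise 0.

    The real library computes a graded similarity score here.
    """
    return 100 if a == b else 0


def compare_equal_block_size(sig_a, sig_b):
    """Compare two hashes that share a block size.

    Both the first and the second blocks are scored, and the higher
    of the two scores is returned.
    """
    bs_a, first_a, second_a = parse_hash(sig_a)
    bs_b, first_b, second_b = parse_hash(sig_b)
    if bs_a != bs_b:
        raise ValueError("this sketch only covers equal block sizes")
    return max(score_strings(first_a, first_b),
               score_strings(second_a, second_b))


# Hypothetical first blocks; the second block is the one quoted above.
sig_a = "1536:FIRSTBLOCKOFFILEA:PFLfL/xi0ShXXqPiX3KMlvbPt"
sig_b = "1536:FIRSTBLOCKOFFILEB:PFLfL/xi0ShXXqPiX3KMlvbPt"

print(sig_a == sig_b)                          # False
print(compare_equal_block_size(sig_a, sig_b))  # 100
```

The hash strings differ as a whole, yet the result is 100 because the identical second blocks win the `max`.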
Attached are two files (twofiles.zip) for which I'm getting different hash strings but a match score of 100.
I've been using a Golang wrapper, so I checked with ssdeep 2.13. To double-check the native result, I hacked the sample.c program to hash and compare these two files; gist here as a demo:
https://gist.github.com/ross-spencer/ac0d5546a2511ad692aa4ff27abd9ba0
The files are publicly available via archway.govt.nz as part of the collections held by Archives NZ. I also have a handful of other culprits if you are keen on additional files for testing purposes. Let me know if it's best to share the files, or if the hash strings are enough.
twofiles.zip
NB. Looks to be the same on the other branches as well.