cloudtracer / ssdeep.js

Pure JS module for SSDEEP hashing
MIT License

Different similarity outputs between libraries #1

Open WJDigby opened 5 years ago

WJDigby commented 5 years ago

Hello,

Thank you for providing this code.

When comparing the same two hashes, this library returns a different "similarity" score than other ssdeep libraries and examples do:

Python 3 ssdeep library, using the same EICAR strings as the readme:

>>> e1 = ssdeep.hash("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")
>>> e2 = ssdeep.hash("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-THREATPINCH-ANTIVIRUS-TEST-FILE!$H+H*")
>>> e1
'3:a+JraNvsgzsVqSwHq9:tJuOgzsko'
>>> e2
'3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo'
>>> ssdeep.compare(e1, e2)
18

JavaScript ssdeep.js library:

>> e1 = ssdeep.digest("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")
"3:a+JraNvsgzsVqSwHq9:tJuOgzsko"
>> e2 = ssdeep.digest("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-THREATPINCH-ANTIVIRUS-TEST-FILE!$H+H*")
"3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo"
>> ssdeep.similarity(e1, e2)
70

Both libraries produce identical hashes.

The ssdeep online demo also produces a value of 18 when comparing the two Eicar strings:

[image: screenshot of the ssdeep online demo result]

Is this intended behavior? Is there a "weight" or some metric that can adjust the grading scale of the comparison?

gehaxelt commented 4 years ago

I noticed the same. Any idea why this happens @cloudtracer ?

memcorrupt commented 1 month ago

I noticed that this library has a few bugs in its comparison algorithm and is also inefficient, since it runs synchronously. I created a project, fast-ssdeep, that binds to the ssdeep C API to provide a performant and compliant implementation.
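One candidate for the 18 vs 70 gap, offered as an assumption rather than a confirmed diagnosis of ssdeep.js: the reference C implementation limits the score when the block size is small, so that short signatures cannot produce an exaggerated match. The sketch below reproduces only that limit, not the edit-distance scoring. The constants (3 and 7) and the doubled block size for the second chunk follow my reading of the reference scoring routine, and the function names `score_cap` and `caps_for` are made up for illustration.

```python
# Sketch of the small-block-size cap applied after edit-distance scoring.
# Assumption: constants and formula mirror the reference C implementation.
MIN_BLOCKSIZE = 3    # smallest block size ssdeep uses
ROLLING_WINDOW = 7   # size of the rolling-hash window


def score_cap(block_size, len1, len2):
    """Return the upper limit on the score for one pair of chunk strings,
    or None when the block size is large enough that no limit applies."""
    if block_size >= (99 + ROLLING_WINDOW) // ROLLING_WINDOW * MIN_BLOCKSIZE:
        return None
    return block_size // MIN_BLOCKSIZE * min(len1, len2)


def caps_for(hash1, hash2):
    """Return the caps for the first and second chunk strings of two hashes
    that share a block size (hypothetical helper, not a library API)."""
    b1, a1, a2 = hash1.split(":", 2)
    b2, c1, c2 = hash2.split(":", 2)
    if b1 != b2:
        raise ValueError("this sketch only handles equal block sizes")
    block_size = int(b1)
    # The second chunk string is compared at twice the block size.
    return (score_cap(block_size, len(a1), len(c1)),
            score_cap(block_size * 2, len(a2), len(c2)))


e1 = "3:a+JraNvsgzsVqSwHq9:tJuOgzsko"
e2 = "3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo"
print(caps_for(e1, e2))
```

For the two EICAR hashes in this issue both limits come out to 18, which is the value the Python library and the online demo report. A score of 70 would be consistent with an implementation that skips this step, but I have not verified that against the ssdeep.js source.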

Not sure if this repository is maintained at all. If it isn't, it would be nice if the maintainer could mention my project.