DinoTools / python-ssdeep

Python wrapper for ssdeep fuzzy hashing library
GNU Lesser General Public License v3.0

Hash class vs hash function #14

Closed davidt99 closed 8 years ago

davidt99 commented 8 years ago

This is more a question than an issue. After looking at the documentation and the code of Hash, PseudoHash and the hash function, I would like to know whether there is any difference between using the Hash class and the hash function, given that I have ssdeep version 2.10 or above. PseudoHash carries a warning about big files and uses the hash function as its implementation, so I wonder whether this warning applies to the hash function as well.

phibos commented 8 years ago

Yes, the warning applies to the hash() function as well.

TL;DR

The Hash() class uses an internal function of ssdeep/libfuzzy to calculate the hash. The main advantage is that the data can be read and passed into the hashing object in small chunks. This means only one chunk and some metadata have to be kept in memory at a time.
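A minimal sketch of the chunked approach described above. The helper name `digest_stream` and the 64 KiB chunk size are illustrative assumptions, not part of python-ssdeep; the helper only relies on the hasher having `update()` and `digest()` methods, which `ssdeep.Hash()` provides.

```python
import io


def digest_stream(stream, hasher, chunk_size=64 * 1024):
    """Feed a binary stream to `hasher` chunk by chunk and return its digest.

    `hasher` is any object with update(bytes) and digest() methods,
    for example an ssdeep.Hash() instance. Only one chunk is held in
    memory at a time (hypothetical helper, not part of python-ssdeep).
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        hasher.update(chunk)
    return hasher.digest()


if __name__ == "__main__":
    try:
        import ssdeep  # optional: the demo only runs if python-ssdeep is installed
    except ImportError:
        ssdeep = None

    if ssdeep is not None:
        sample = io.BytesIO(b"Some reasonably long sample input. " * 500)
        print(digest_stream(sample, ssdeep.Hash(), chunk_size=1024))
```

For a real file, the same helper could be called with a file object opened in binary mode, e.g. `open(path, "rb")`.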

The PseudoHash() class uses an internal variable called _data to store the data before the hash can be calculated. This means that even if you pass only chunks into the hashing object, the data is concatenated and the whole input has to be kept in memory.
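To illustrate why buffering defeats the purpose of chunked updates, here is a simplified sketch of the pattern described above. It is not the library's actual code: `BufferingHash` and its `one_shot_hash` parameter are made up for this example, and only the `_data` attribute name is taken from the description.

```python
class BufferingHash:
    """Illustrative stand-in for a hasher that buffers all input.

    `one_shot_hash` is any callable that takes the complete data as
    bytes, such as ssdeep.hash (assumption: passed in by the caller).
    """

    def __init__(self, one_shot_hash):
        self._data = b""
        self._one_shot_hash = one_shot_hash

    def update(self, chunk):
        # Every chunk is appended, so memory use grows with the input size.
        self._data += chunk

    def digest(self):
        # The hash is computed only now, over the full concatenated buffer.
        return self._one_shot_hash(self._data)

    def buffered_bytes(self):
        return len(self._data)
```

With python-ssdeep installed, `BufferingHash(ssdeep.hash)` would behave like calling `ssdeep.hash()` on the complete input, which is why the big-file warning applies to both.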

To use the hash() function, the complete data has to be read into memory and passed to the function in a single call.

I hope this helps.

davidt99 commented 8 years ago

Yes, that helps :)