Closed wiene closed 1 year ago
This is work in progress now. As you suggested, I'm going to introduce a new command line argument --lsh [LSH] and deprecate --ssdeep, but keep it hidden for backward compatibility.
It's done. Please pull the latest code.
The new --lsh [LSH] switch lets you choose between ssdeep and tlsh, and selects ssdeep by default if none is provided. The --ssdeep argument is deprecated and hidden from the help screen, but still available.
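For illustration, here is a minimal argparse sketch of how such a pair of switches could be declared. It is not the code from the commit: the names build_parser and resolve_lsh are hypothetical, and it assumes that "by default" means a bare --lsh without a value.

```python
import argparse

def build_parser():
    # Hypothetical parser; only the two switches discussed here are modelled.
    parser = argparse.ArgumentParser(prog='lsh-sketch')
    # nargs='?' makes the value optional; const is used when --lsh is given bare.
    parser.add_argument('--lsh', nargs='?', const='ssdeep',
                        choices=['ssdeep', 'tlsh'], metavar='LSH',
                        help='evaluate web page similarity with a fuzzy hash')
    # Deprecated alias: still parsed, but suppressed from usage and help output.
    parser.add_argument('--ssdeep', action='store_true', help=argparse.SUPPRESS)
    return parser

def resolve_lsh(args):
    # An explicit --lsh wins; the deprecated flag maps to ssdeep.
    if args.lsh is not None:
        return args.lsh
    if args.ssdeep:
        return 'ssdeep'
    return None

if __name__ == '__main__':
    parser = build_parser()
    print(resolve_lsh(parser.parse_args(['--lsh', 'tlsh'])))
```

Keeping the old flag as a suppressed store_true option means existing scripts that pass --ssdeep keep working, while new users only see --lsh in the help screen.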
Thanks a lot for your work. To perform a quick test, I cherry-picked commit bf192bd and applied it on top of release 20221213. Unfortunately, while testing TLSH with this setup I occasionally got the following error message:
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3/dist-packages/dnstwist.py", line 855, in run
    task['tlsh'] = int(100 - (min(tlsh.diff(self.lsh_init, lsh_curr), 300)/3))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: argument is not a TLSH hex string
Sadly this issue does not seem to be 100 % reproducible.
I reproduced this issue using:
>>> tlsh.__version__
0.2.0
It's been fixed in commit 81896c3063f7d007467ccdacb21f5aec8fa8051c.
Thanks a lot for your fix. I tried again including the changes from commit 81896c3. I can confirm that the issue reported yesterday has disappeared.
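For readers following along: the failing line maps a TLSH distance to a 0 to 100 similarity score and crashes when tlsh.diff rejects one of the hashes. Below is a minimal sketch of one way to guard that conversion. It is not the content of commit 81896c3; the helper names distance_to_score and safe_tlsh_score are hypothetical, and the diff function is passed in as a parameter so that the sketch runs without the tlsh package installed.

```python
def distance_to_score(distance):
    # Same arithmetic as the line in the traceback: distances are capped at
    # 300, then scaled so that 0 maps to 100 and 300 or more maps to 0.
    return int(100 - (min(distance, 300) / 3))

def safe_tlsh_score(diff, hash_a, hash_b):
    # diff is expected to behave like tlsh.diff: return an integer distance,
    # or raise ValueError when an argument is not a valid TLSH hex string.
    try:
        return distance_to_score(diff(hash_a, hash_b))
    except ValueError:
        # No usable hash for one side (assumption: skip scoring in that case).
        return None

if __name__ == '__main__':
    def fake_diff(a, b):
        # Stand-in for tlsh.diff, for demonstration only.
        if not a or not b:
            raise ValueError('argument is not a TLSH hex string')
        return 30

    print(safe_tlsh_score(fake_diff, 'T1AA', 'T1BB'))
    print(safe_tlsh_score(fake_diff, '', 'T1BB'))
```

Returning None rather than 0 keeps "no hash could be computed" distinguishable from "pages are completely different".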
To address #170 you kindly added support for TLSH. The present implementation chooses the fuzzy hash function based on the available packages:

- if ssdeep is available, it is chosen, else
- if ppdeep is available, it is chosen, else
- if tlsh is available, it is chosen.

While I have no particular knowledge about fuzzy hashing, a quick internet search seems to suggest that ssdeep typically performs worse than other functions (see e.g. this paper). Therefore I was considering switching to TLSH for the Debian dnstwist package. Do you think this is a good idea?

The reason for opening #170 was the imminent removal of ssdeep from Debian. In the meantime a new Debian maintainer for ssdeep took over and fixed a build failure, so the package is kept in Debian for the time being. This leaves me with the following situation:

If I switch the recommended dependency of the Debian dnstwist package from python3-ssdeep to python3-tlsh, the fuzzy hash function used depends on whether the user happens to have python3-ssdeep installed. I think this is an undesirable situation, since it might lead to confusion if people compare results obtained on different computers. Therefore I wonder whether a switch that allows explicitly requesting a particular fuzzy hash function would be a helpful feature.

If you have other ideas how to address this issue, suggestions are welcome. :-)
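To make the concern concrete, here is a minimal sketch of the package-based selection order described above, extended with an explicit override. This is not dnstwist's actual code: pick_lsh, PREFERENCE and the is_available parameter are hypothetical names, and the availability check via importlib.util.find_spec is an assumption.

```python
import importlib.util

# Preference order as described in this issue.
PREFERENCE = ('ssdeep', 'ppdeep', 'tlsh')

def _installed(name):
    # True if a top-level module with this name can be imported.
    return importlib.util.find_spec(name) is not None

def pick_lsh(requested=None, is_available=_installed):
    # An explicit request must be satisfied exactly; it never falls back.
    if requested is not None:
        if not is_available(requested):
            raise RuntimeError('requested fuzzy hash module not installed: ' + requested)
        return requested
    # Otherwise the first installed module in preference order wins.
    for name in PREFERENCE:
        if is_available(name):
            return name
    return None

if __name__ == '__main__':
    # The result depends on what is installed on this machine.
    print(pick_lsh())
```

Without the requested argument, two machines with different sets of installed packages can silently end up using different hash functions, which is exactly the situation an explicit switch would avoid.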