Closed kevinpeters811 closed 2 years ago
In that case you should have an ssdeep score close to 100%. What version of dnstwist are you using? By default, if only the original domain name is provided as input, dnstwist connects using the http:// protocol. I guess the original website redirects such queries to https://. Are you serving the mirrored web page on both HTTP and HTTPS? You can also run the tool with the --debug argument. It's a bit noisy, but you should be able to filter out HTTP connection-related issues.
Thanks. I took a closer look and ssdeep does indeed return zero in your program. As a test, I grabbed your r.normalized_content for both the original and my clone, then manually called ssdeep to hash and compare them, and also got zero, even though the content is clearly virtually identical (see test.txt).
Both r.normalized_content values are very similar, but not identical. That explains the different ssdeep hashes, although a zero score is a bit surprising. Could you share the raw r.content too?
There is a zero score for the raw inputs too. The main reason is that the inputs use different line-ending conventions (CRLF vs LF). Nonetheless, I think I can tune the content normalizer a bit and get a positive score. Stay tuned.
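To illustrate the line-ending point, here is a minimal sketch of normalizing CRLF and LF before comparison. It is not the dnstwist normalizer; the function name normalize_line_endings and the sample pages are made up for illustration. The normalized bytes are what you would then hand to a fuzzy hasher.

```python
def normalize_line_endings(content: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF (hypothetical helper)."""
    return content.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Made-up sample pages: same markup, different line-ending conventions.
original = b"<html>\r\n<body>\r\n<p>Hello</p>\r\n</body>\r\n</html>\r\n"
mirror = b"<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n"

# The raw bytes differ on every line, which is hard on a fuzzy hash.
print(original == mirror)  # False

# After normalization the two inputs are byte-for-byte identical.
print(normalize_line_endings(original) == normalize_line_endings(mirror))  # True
```

Real pages will still differ in other ways after this step, so this only removes one source of mismatch.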
Great. Thanks.
Pull the most recent version and try it. I'm getting an ssdeep score of 60%.
I get 60% as well. Thank you very much for looking into this.
Thanks for bringing this up. Initially I got it to 43%, but then I added code that clears attribute values for certain HTML tags, which are usually modified when an offline snapshot (mirror) is made. I think the ssdeep feature should be more accurate now.
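For readers wondering what clearing attribute values might look like, here is a rough sketch. It is an assumption, not the code that went into dnstwist: the attribute list VOLATILE_ATTRS and the function clear_attribute_values are hypothetical, and this version keys on attribute names across all tags rather than on specific tags.

```python
import re

# Hypothetical list of attributes that mirroring tools tend to rewrite.
VOLATILE_ATTRS = ("href", "src", "action")

# Match name="value" or name='value' for the attributes listed above.
_ATTR_RE = re.compile(
    r"\b(" + "|".join(VOLATILE_ATTRS) + r")\s*=\s*(\"[^\"]*\"|'[^']*')",
    re.IGNORECASE,
)


def clear_attribute_values(html: str) -> str:
    """Blank the values of selected attributes, keeping the attribute names."""
    return _ATTR_RE.sub(lambda m: m.group(1) + '=""', html)


# Made-up example: a mirror typically rewrites absolute links to local paths.
original = '<a href="https://example.com/about">About</a><img src="/img/logo.png">'
mirror = '<a href="about.html">About</a><img src="img/logo.png">'

print(clear_attribute_values(original))
print(clear_attribute_values(original) == clear_attribute_values(mirror))  # True
```

A regex is enough for a sketch; a proper HTML parser would be more robust against unquoted values and unusual markup.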
You might want to look at the phash feature as well which has been introduced recently. In short, it renders web pages, takes screenshots and compares them visually.
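To give a feel for comparing screenshots visually, here is a toy sketch of a simple perceptual hash (an "average hash"). It is not necessarily the algorithm dnstwist uses, and it skips rendering entirely: the two 4x4 grayscale grids stand in for heavily downscaled screenshots, and the names average_hash and hamming_similarity are made up for this example.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_similarity(h1, h2):
    """Percentage of matching bits between two equal-length hashes."""
    matches = sum(a == b for a, b in zip(h1, h2))
    return 100.0 * matches / len(h1)


# Made-up "screenshots": dark left half, bright right half.
shot_a = [[10, 10, 200, 200] for _ in range(4)]

# Same layout, slightly brighter overall, with one pixel changed.
shot_b = [[15, 15, 205, 205] for _ in range(3)] + [[15, 200, 205, 205]]

print(hamming_similarity(average_hash(shot_a), average_hash(shot_b)))  # 93.75
```

The overall brightness shift does not change the hash at all; only the one altered pixel flips a bit, which is the property that makes this kind of hash tolerant of small rendering differences.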
This is indeed a rather cool feature, looking forward to using it in the next release :+1:
I believe this also adds new recommended dependencies. Will Chromedriver also be part of the dnstwist Docker container?
I haven't decided yet, but most likely it won't. I'd like to keep the Docker container as small as possible. Introducing chromedriver and the web browser it depends on would make it several times heavier. For now I consider the pHash feature extra/experimental.
Good point, keeping the container as small as possible is a good goal for most use cases, and chromedriver is indeed very bulky.
Another option would be to add a second, "fatter" container with all the extra features, using the first container as its base and adding the additional features as another layer. Of course, that increases the maintenance effort (and/or the automation effort). An advantage would be that it increases the number of potential testers.
I cloned the main page of one of our web sites and put it on a server with a known permutation that I registered.
Since the main pages are the same, I expected ssdeep would identify them as very similar. But I am getting a zero ssdeep value.
Any ideas?