Closed ewilded closed 5 years ago
Hi Julian,
Have you tried using --fuzzy-logic on this dataset? This uses Levenshtein distance to measure page differences (it's useful for bypassing cases where items such as the time appear on the page) and helps in scenarios like this.
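As a rough illustration of the similarity-based comparison described above (a sketch using the stdlib difflib ratio, not VHostScan's actual implementation; the 0.9 threshold is an arbitrary choice):

```python
# Sketch: compare two response bodies by similarity ratio instead of an
# exact hash, so small dynamic fragments (timestamps, nonces) don't make
# otherwise-identical pages look unique. Stdlib only; threshold is arbitrary.
from difflib import SequenceMatcher

def pages_match(body_a: str, body_b: str, threshold: float = 0.9) -> bool:
    """Treat two response bodies as 'the same page' if they are at
    least `threshold` similar."""
    return SequenceMatcher(None, body_a, body_b).ratio() >= threshold

a = "<html>It is 10:41:02</html>"
b = "<html>It is 10:41:07</html>"
print(pages_match(a, b))  # the single-character time difference is ignored
```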
If that doesn't work but you can see another solution, we'd certainly appreciate a pull request!
Regards,
Michael
@ewilded touching base to see where we're headed with this?
Hi, my apologies, I have been quite busy lately. I tried the --fuzzy-logic method, but it did not produce clear results. I will poke around the code and try to get the desired results without interfering with anything that already works well. If I am successful, I'll open a pull request. Thanks.
That would be greatly appreciated! Reach out if you have any questions (you can also DM me on Twitter at @codingo_ if that's easier).
OK, sorry it took so long. I sat down with this today and started playing with the code. At first I thought the hash covered the entire response (including headers) and therefore ended up different for each response, so I introduced an alternative comparison method (a hash of response.content plus response.status_code), only to realize that was not the case (response.text == response.content, i.e. the actual content without headers). The real reason I was getting results like the above was my unfortunate default virtual host with directory listing enabled, which reflected the provided Host header in the response (
Apache/2.4.34 (Debian) Server at something.something Port 80
). Thus, every probe of that same virtual host generated unique content, leading to unique hashes and results like those shown above. Once I added a static index.php to the default webroot and ran the tool again, I got the results I was expecting. Sorry for the commotion; I did not notice I had created quite a troublesome test case. I know such cases could be handled by introducing an alternative comparison method such as word count, or an even more sophisticated mechanism like the one James Kettle implemented for his Backslash Powered Scanner, now used in many other plugins as part of the Burp API. Anyway, I don't think this scenario occurs often enough to really bother.
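The word-count idea mentioned above could be sketched as follows (hypothetical helper names, not part of VHostScan): fingerprint a response by its status code and word count rather than a full-content hash, so a reflected Host header doesn't make every response unique.

```python
# Sketch: a loose fingerprint that ignores reflected values such as the
# Host header, as long as they don't change the number of words.
def fingerprint(status_code: int, body: str) -> tuple:
    """Fingerprint a response by status code and word count."""
    return (status_code, len(body.split()))

def same_vhost(resp_a, resp_b) -> bool:
    """resp_a / resp_b are (status_code, body) pairs."""
    return fingerprint(*resp_a) == fingerprint(*resp_b)

default = (200, "Apache/2.4.34 (Debian) Server at foo.example Port 80")
probe   = (200, "Apache/2.4.34 (Debian) Server at bar.example Port 80")
print(same_vhost(default, probe))  # True: same shape, different reflected host
```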
@codingo, should this be marked as closed?
@linted yes, I believe so. Thank-you @linted.
Discovering vhosts
My apologies if I am not using or understanding this tool properly; feel free to correct me and close this issue if it's irrelevant. This tool caught my interest because I simply wanted to automate dictionary-based detection of the virtual hosts existing on a given web server (IP:PORT). My guess is that the principle this tool works on is to keep connecting to the same web server using different
Host:
headers and comparing the responses to each other, so that anything which stands out indicates a successful detection of a new virtual host (which probably means a separate webroot and separate configuration). So I created a simple test case with a local Apache 2 installation. Below are the contents of the /etc/apache2/sites-enabled/000-default.conf file:
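The config block itself did not survive into this copy of the issue; a representative 000-default.conf matching the description (vhost names and webroots are taken from the surrounding text, everything else is a plausible guess, not the original file) might look like:

```apache
# Illustrative reconstruction, not the original file.
# Apache uses the first name-based vhost as the default for unmatched
# Host headers, so the catch-all serving /var/www/html4 is listed first.
<VirtualHost *:80>
    DocumentRoot /var/www/html4
</VirtualHost>

<VirtualHost *:80>
    ServerName localhost
    DocumentRoot /var/www/html1
</VirtualHost>

<VirtualHost *:80>
    ServerName dev.example.org
    DocumentRoot /var/www/html2
</VirtualHost>

<VirtualHost *:80>
    ServerName dev
    DocumentRoot /var/www/html3
</VirtualHost>
```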
The file defines four separate virtual hosts, each with a different webroot:
Each webroot contains a different default page, which gets served according to how I manipulate the Host header in Burp's Repeater, with /var/www/html4 being served for any value other than 'localhost', 'dev.example.org' and 'dev' (making it the default vhost).
Then, I scanned localhost with a small wordlist, making sure that it contained all the vhosts I defined:
The result
So, it simply returned the entire wordlist as valid vhosts. The result I would like to see instead would be:
Unique vhosts:
[>] dev
[>] localhost
[>] dev.example.org
Everything else:
[>] somethingelse
[>] bar
[>] andsoon
[>] invalid
[>] example.org
[>] nosuchhost
[>] blablabla
[>] nothing
[>] something.something
[>] foo
[>] 127.0.0.2
The result I am getting in its current form is not helping me at all :)
Do you have an idea for a solution?
I know I could grep the output and group it by length, but that sounds like overkill and somewhat defeats the point of using a tool dedicated to this purpose in the first place.
Before I try to modify the source code, I thought I would ask first - which is what I am doing :) My guess is that successful identification of unique vhosts should boil down to comparing responses using a combination of basic properties like HTTP status code, length, and number of words/letters (a hash will in most cases always differ, e.g. because of the common
Date:
header). Another method is to pick a file that exists in the default vhost document root (e.g. /js/jquery.js) and then keep requesting it with different Host headers. Once we hit a 404, or anything other than 200 or 304 Not Modified, we know we have reached a different webroot (this will still leave other vhosts sharing the same webroot undetectable, but that's another story). Please let me know what you think. Thanks, Julian
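The known-file probe described above could be sketched roughly like this (hypothetical helper names, not part of the tool; stdlib http.client is used for the network call):

```python
# Sketch: request a file known to exist in the default webroot while
# varying the Host header; a status other than 200/304 suggests the
# Host header switched us to a vhost with a different webroot.
import http.client

def probes_different_webroot(status_code: int) -> bool:
    """200 (OK) or 304 (Not Modified) means the known file is still
    served, i.e. same webroot; anything else suggests a new vhost."""
    return status_code not in (200, 304)

def check_host(ip: str, port: int, path: str, host: str) -> bool:
    """Return True if `host` appears to map to a different webroot
    than the default. Network call; exceptions left to the caller."""
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    conn.request("GET", path, headers={"Host": host})
    status = conn.getresponse().status
    conn.close()
    return probes_different_webroot(status)

# e.g. check_host("127.0.0.1", 80, "/js/jquery.js", "dev.example.org")
```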