codingo / VHostScan

A virtual host scanner that performs reverse lookups, can be used with pivot tools, detects catch-all scenarios, and works around wildcards, aliases, and dynamic default pages.
GNU General Public License v3.0
1.19k stars · 231 forks

Unclear results (false positives?) #109

Closed ewilded closed 5 years ago

ewilded commented 5 years ago

Discovering vhosts

My apologies if I am not using/understanding this tool properly; feel free to correct me and close this issue if it's irrelevant. This tool caught my interest because I simply wanted to automate dictionary-based detection of the virtual hosts existing on a given web server (IP:PORT). My guess is that the principle this tool works on is to keep connecting to the same web server using different Host: headers and comparing the responses to each other, so that if anything stands out, it indicates a successful detection of a new virtual host (which probably means a separate web root and separate configuration).
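The principle described above can be sketched as follows. This is my own illustrative code, not VHostScan's internals; `probe_vhosts` and the injectable `fetch` callable are hypothetical names, with `fetch` standing in for an HTTP request that sets the Host header (e.g. a wrapper around `requests.get`).

```python
import hashlib

def probe_vhosts(candidates, fetch):
    """For each candidate Host header, record (status_code, body_hash).

    `fetch` is any callable taking a hostname and returning
    (status_code, body_bytes) -- for example, a wrapper around an HTTP
    client that sends the request with headers={"Host": hostname}.
    Hosts whose fingerprint differs from the crowd stand out as
    likely real virtual hosts.
    """
    fingerprints = {}
    for name in candidates:
        status, body = fetch(name)
        fingerprints[name] = (status, hashlib.sha256(body).hexdigest())
    return fingerprints
```

With this shape, two candidate names that land on the same catch-all vhost produce identical fingerprints, while a name that hits a dedicated webroot produces a unique one.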

So I created a simple test case with a local Apache 2 installation. Below are the contents of the /etc/apache2/sites-enabled/000-default.conf file:

<VirtualHost *:80>
        ServerName 127.0.0.1
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html4
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
<VirtualHost *:80>
        ServerName localhost
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html2
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
<VirtualHost *:80>
        ServerName dev.example.org
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html3
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
<VirtualHost *:80>
        ServerName dev
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/html
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

The file defines four separate virtual hosts, each with a different webroot.

Each webroot contains a different default page, which gets served according to how I manipulate the Host header in Burp's Repeater, with /var/www/html4 being served for any value other than 'localhost', 'dev.example.org' and 'dev' (making it the default vhost).

Then, I scanned localhost with a small wordlist, making sure that it contained all the vhosts I defined:

foo
bar
nothing
invalid
localhost
127.0.0.2
example.org
something.something
dev
somethingelse
dev.example.org
andsoon
nosuchhost
blablabla

The result

VHostScan -w wordlist.txt -t 127.0.0.1 -p 80
/usr/local/lib/python2.7/dist-packages/fuzzywuzzy-0.15.1-py2.7.egg/fuzzywuzzy/fuzz.py:35: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
+-+-+-+-+-+-+-+-+-+  v. 1.21
|V|H|o|s|t|S|c|a|n|  Developed by @codingo_ & @__timk
+-+-+-+-+-+-+-+-+-+  https://github.com/codingo/VHostScan

[+] Starting virtual host scan for 127.0.0.1 using port 80 and wordlists: wordlist.txt
[>] Ignoring HTTP codes: 404
[+] Resolving DNS for additional wordlist entries
[!] Couldn't find any records (NXDOMAIN)
[#] Found: foo (code: 200, length: 399, hash: afd749d3aaab964b10b9bd02aa208a004962a995cdb0bf4b379002e2cbceabf7)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 399
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: bar (code: 200, length: 398, hash: bef01edffcad980916a1e6066da7bc044aaefd974c66005cf032e25d84f84cf6)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 398
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: nothing (code: 200, length: 405, hash: 58ba922d7580747b9983d1f86a86f6879893d257f056bd4f76f7efccec12afe5)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 405
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: invalid (code: 200, length: 400, hash: 1bd6e4a459ec75e597de0ca49f4692dd6bf34bd503d3dcb0055f238e109de6b8)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 400
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: localhost (code: 200, length: 5, hash: c9d04c9565fc665c80681fb1d829938026871f66e14f501e08531df66938a789)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Last-Modified: Wed, 10 Oct 2018 17:20:10 GMT
  ETag: "5-577e311029df9"
  Accept-Ranges: bytes
  Content-Length: 5
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html

[#] Found: 127.0.0.2 (code: 200, length: 403, hash: ef68f0ec1c6c1036c595ec4a9fcd9f93ff268829bfe201beee6a33e4c602f1e4)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 403
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: example.org (code: 200, length: 408, hash: bc6eefe1bb3d40fd6eecbdba5231fe62f877e2f8916a007f88dec3db3babc20c)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 408
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: something.something (code: 200, length: 409, hash: 5b84da62c2a41e6f44c48f055249a3f235856c4100773f23ff10c9500ff39d2f)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 409
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: dev (code: 200, length: 563, hash: ae81f9837428f1aa79b94a3b7f809198074334ba498a87821354806980f7fec4)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Last-Modified: Tue, 07 Aug 2018 13:40:05 GMT
  ETag: "3a6-572d888143908-gzip"
  Accept-Ranges: bytes
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 563
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html

[#] Found: somethingelse (code: 200, length: 409, hash: 5e6a43b80f4ce19d42df792d4c9a8b954a7d6831bcdcaa273be4d3893f41a80e)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 409
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: dev.example.org (code: 200, length: 334, hash: 896877f13d3f78b00dbd7e3c529b8e2f4e3c583e52fefe031df06baee7653f99)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 334
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: andsoon (code: 200, length: 405, hash: fb9b03a379483f20f76d8676de17060cecbfc70674243aee81d25d21ac8b6539)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 405
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: nosuchhost (code: 200, length: 408, hash: 4e3b1afa8df7ffbf0d65de1bf83cb616d4f38f11df38a7b80660690b769d9ac5)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 408
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[#] Found: blablabla (code: 200, length: 400, hash: 888c4287730e65eded0c3d540a3f9ad3204fc8b8d5edc0b0aca78262c8a014f9)
  Date: Thu, 11 Oct 2018 12:17:20 GMT
  Server: Apache/2.4.34 (Debian)
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Content-Length: 400
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

[+] Most likely matches with a unique count of 1 or less:
        [>] somethingelse
        [>] bar
        [>] dev.example.org
        [>] andsoon
        [>] invalid
        [>] example.org
        [>] nosuchhost
        [>] dev
        [>] blablabla
        [>] nothing
        [>] something.something
        [>] foo
        [>] 127.0.0.2
        [>] localhost

So, it simply returned the entire wordlist as valid vhosts. The result I would like to see would instead be something like:

  1. Unique vhosts:
        [>] dev
        [>] localhost
        [>] dev.example.org

  2. Everything else:
        [>] somethingelse
        [>] bar
        [>] andsoon
        [>] invalid
        [>] example.org
        [>] nosuchhost
        [>] blablabla
        [>] nothing
        [>] something.something
        [>] foo
        [>] 127.0.0.2

The result I am getting in its current form is not helping me at all :)

Do you have an idea for a solution?

I know I could grep the output and group it by length, but this sounds like overkill and somewhat defeats the point of using a tool dedicated to this purpose in the first place.

Before I try to modify the source code, I thought I would ask first - which is what I am doing :) My guess is that successful identification of unique vhosts should boil down to comparing responses using a combination of basic properties like HTTP status code, length, and the number of words/letters (a hash will in most cases always differ, e.g. because of the common Date: header).
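The comparison-by-coarse-properties idea above could look something like this. This is a sketch, not VHostScan code; `coarse_fingerprint` and `unique_vhosts` are names I made up for illustration.

```python
from collections import Counter

def coarse_fingerprint(status_code, body_text):
    """Fingerprint a response by properties that survive trivial noise
    (such as a changing Date header): status, length, and word count."""
    return (status_code, len(body_text), len(body_text.split()))

def unique_vhosts(responses):
    """responses: {hostname: (status_code, body_text)}.

    Return the hostnames whose coarse fingerprint appears only once --
    likely real, distinct virtual hosts -- while names that all land on
    the same catch-all default share a fingerprint and are filtered out.
    """
    fps = {name: coarse_fingerprint(status, text)
           for name, (status, text) in responses.items()}
    counts = Counter(fps.values())
    return [name for name, fp in fps.items() if counts[fp] == 1]
```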

Another method would be to pick a file that exists in the default vhost's document root (e.g. /js/jquery.js) and then keep requesting it with different Host headers. Once we hit a 404 (or anything other than 200 or 304 Not Modified), we know we have reached a different webroot (this will still leave other vhosts sharing the same webroot undetectable, but that's another story). Please let me know what you think. Thanks, Julian
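The canary-file idea above could be sketched like this. Again, this is illustrative only: `detect_webroot_change` and `fetch_status` are hypothetical names, and the /js/jquery.js canary path is just the example from the comment, not a path that universally exists.

```python
def detect_webroot_change(candidates, fetch_status, canary="/js/jquery.js"):
    """Request a path known to exist in the default vhost's webroot
    under different Host headers. A status other than 200/304 suggests
    that Host header was routed to a different webroot.

    `fetch_status` is any callable taking (hostname, path) and
    returning an HTTP status code.
    """
    return [name for name in candidates
            if fetch_status(name, canary) not in (200, 304)]
```

As the comment notes, this misses vhosts that share the default's webroot, since the canary would still resolve there with a 200.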

codingo commented 5 years ago

Hi Julian,

Have you tried using --fuzzy-logic on this dataset? This uses Levenshtein distance to measure page differences (helpful for bypassing cases where items such as the current time appear on the page) and can help in scenarios like this.
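The kind of fuzzy comparison described above can be sketched with the standard library's `difflib` (VHostScan itself uses fuzzywuzzy, as the warning in the scan output shows; the function name and the 0.9 threshold here are my own illustrative choices).

```python
from difflib import SequenceMatcher

def pages_similar(body_a, body_b, threshold=0.9):
    """Treat two response bodies as the 'same' page when their
    similarity ratio meets the threshold, tolerating small dynamic
    fragments such as timestamps embedded in otherwise identical pages."""
    return SequenceMatcher(None, body_a, body_b).ratio() >= threshold
```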

If that doesn't work but you can see another solution, we'd certainly appreciate a pull request!

Regards,

Michael

codingo commented 5 years ago

@ewilded touching base to see where we're headed with this?

ewilded commented 5 years ago

Hi, my apologies, I have been quite busy lately. I tried the --fuzzy-logic method, but it did not help produce clear results. I will poke around the code and try to get the desired results without interfering with anything that's already working well. If successful, I'll come up with a pull request. Thanks.

codingo commented 5 years ago

That would be greatly appreciated! Reach out if you have any questions (you can also DM me on Twitter at @codingo_ if that's easier).

ewilded commented 5 years ago

OK, sorry it took so long. I sat down with this today and started playing with the code. At first I thought the hash was computed over the entire response (including headers) and thus ended up different for every response. I introduced an alternative comparison method (a hash of response.content plus response.status_code), only to realize that this was not the case (response.text == response.content, i.e. the actual content without headers). The real reason I was getting results like the above was my unfortunate default virtual host: it had directory listing enabled, which reflects the provided Host header in the response:

Apache/2.4.34 (Debian) Server at something.something Port 80

Thus, requests that all hit the same (default) virtual host were generating unique content, leading to unique hashes and the results shown above. Once I added a static index.php to the default webroot and ran the tool again, I got the results I was expecting. Sorry for the commotion; I did not realize I had created quite a troublesome test case. Such cases could be avoided by introducing an alternative comparison method like word count - or an even more sophisticated mechanism like the one James Kettle implemented for his Backslash Powered Scanner, now used in many other plugins via the Burp API. Anyway, I don't think this scenario occurs often enough to really bother.
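One simple way to neutralize the reflection pitfall described above would be to strip the requested hostname out of the body before hashing, so pages that merely echo the Host header compare as equal. This is a sketch of that idea, not anything VHostScan implements; `hash_without_reflection` is a hypothetical name.

```python
import hashlib

def hash_without_reflection(hostname, body_text):
    """Remove the requested hostname from the body before hashing,
    so responses that only differ by a reflected Host header (e.g.
    Apache directory-listing footers) produce identical hashes."""
    normalized = body_text.replace(hostname, "")
    return hashlib.sha256(normalized.encode()).hexdigest()
```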

linted commented 5 years ago

@codingo, should this be marked as closed?

codingo commented 5 years ago

@linted yes, I believe so. Thank you @linted.