EFForg / https-everywhere

A browser extension that encrypts your communications with many websites that offer HTTPS but still allow unencrypted connections.
https://eff.org/https-everywhere

Using Sublist3r to find subdomains for rulesets #7277

Closed: jeremyn closed this issue 7 years ago

jeremyn commented 7 years ago

We've had discussions recently about automating ruleset work, including using a tool to find subdomains; see, for example, https://github.com/EFForg/https-everywhere/pull/4279 and https://github.com/EFForg/https-everywhere/issues/6912 .

One tool we've discussed is https://github.com/aboul3la/Sublist3r . I recently submitted, and the Sublist3r author @aboul3la merged, a pull request to make Sublist3r produce subdomains in the sort order that we prefer. @aboul3la likes HTTPS Everywhere.

My question here is whether we want to officially recommend Sublist3r in our documentation, pull requests, and issues for contributors to find new subdomains. "Officially recommend" is the difference between "One tool you might try using is Sublist3r" and "We recommend Sublist3r".

I don't mean to suggest that we require contributors to use Sublist3r.

Thoughts? Pinging @fuglede @Hainish @J0WI but anyone should feel free to comment.
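The preferred sort order mentioned above is the one visible in output [1] below: hostnames are compared label by label starting from the right, and a "www." name sorts immediately after its parent, ahead of that parent's other subdomains. A minimal Python sketch of a sort key that reproduces that order follows. The function name `subdomain_sort_key` is hypothetical, and this is an illustration of the ordering, not necessarily how Sublist3r implements it.

```python
def subdomain_sort_key(hostname):
    """Hypothetical sort key: compare labels right to left, and place
    "www.<name>" directly after "<name>" and before its other subdomains."""
    labels = hostname.lower().split(".")[::-1]  # "a.b.com" -> ["com", "b", "a"]
    if labels and labels[-1] == "www":
        # Same labels as the parent name, with tiebreak 1 so it follows the parent.
        return (labels[:-1], 1)
    return (labels, 0)


hosts = [
    "test.ekaer.nav.gov.hu",
    "abpe.nav.gov.hu",
    "www.ekaer.nav.gov.hu",
    "ekaer.nav.gov.hu",
    "www.nav.gov.hu",
    "cms.ekaer.nav.gov.hu",
    "elekafa.nav.gov.hu",
]
for host in sorted(hosts, key=subdomain_sort_key):
    print(host)
```

A shorter label list that is a prefix of a longer one compares as smaller, which is what puts a parent name ahead of all of its subdomains.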

jeremyn commented 7 years ago

If we want to recommend Sublist3r, then we will need to discuss how to configure it. Below is an example.

In PR https://github.com/EFForg/https-everywhere/pull/6888 we want to investigate https://nav.gov.hu , which is for a governmental department in Hungary. By default Sublist3r queries not just regular search engines but DNS and security sites as well. This produces many subdomains [1] that are not appropriate targets, such as https://cms-test-b.ekaer.nav.gov.hu and https://mx.ekaer.nav.gov.hu . On the other hand, if we limit the search to just Baidu, Google, and Bing by modifying the script, we get a shorter list [2], but it's missing the obvious www.nav.gov.hu.

[1]

    [-] Enumerating subdomains now for nav.gov.hu
    [-] Searching now in Baidu..
    [-] Searching now in Yahoo..
    [-] Searching now in Google..
    [-] Searching now in Bing..
    [-] Searching now in Ask..
    [-] Searching now in Netcraft..
    [-] Searching now in DNSdumpster..
    [-] Searching now in Virustotal..
    [-] Searching now in ThreatCrowd..
    [-] Searching now in SSL Certificates..
    [-] Searching now in PassiveDNS..
    [-] Total Unique Subdomains Found: 51
    www.nav.gov.hu
    abpe.nav.gov.hu
    adatbazisok.nav.gov.hu
    arveres.nav.gov.hu
    auth.nav.gov.hu
    clo.nav.gov.hu
    ebev.nav.gov.hu
    ekaer.nav.gov.hu
    www.ekaer.nav.gov.hu
    cms.ekaer.nav.gov.hu
    cms-test.ekaer.nav.gov.hu
    cms-test-b.ekaer.nav.gov.hu
    import.ekaer.nav.gov.hu
    import-test.ekaer.nav.gov.hu
    import-test-b.ekaer.nav.gov.hu
    mk.ekaer.nav.gov.hu
    mk-test.ekaer.nav.gov.hu
    mx.ekaer.nav.gov.hu
    nav.ekaer.nav.gov.hu
    nav-test.ekaer.nav.gov.hu
    nebih.ekaer.nav.gov.hu
    nebih-test.ekaer.nav.gov.hu
    nkh.ekaer.nav.gov.hu
    nkh-test.ekaer.nav.gov.hu
    test.ekaer.nav.gov.hu
    test-b.ekaer.nav.gov.hu
    ve.ekaer.nav.gov.hu
    ve-test.ekaer.nav.gov.hu
    ve-test-b.ekaer.nav.gov.hu
    elekafa.nav.gov.hu
    en.nav.gov.hu
    hirlevel.nav.gov.hu
    kkk.nav.gov.hu
    mail1.nav.gov.hu
    mail2.nav.gov.hu
    moss.nav.gov.hu
    nagios.nav.gov.hu
    ns.nav.gov.hu
    openkkk.nav.gov.hu
    opgregteszt.nav.gov.hu
    opgteszt.nav.gov.hu
    owa.nav.gov.hu
    post.nav.gov.hu
    secure.nav.gov.hu
    vpi.nav.gov.hu
    www01.nav.gov.hu
    www02.nav.gov.hu
    www03.nav.gov.hu
    www05.nav.gov.hu
    www06.nav.gov.hu
    www09.nav.gov.hu

[2]

    [-] Enumerating subdomains now for nav.gov.hu
    [-] Searching now in Baidu..
    [-] Searching now in Google..
    [-] Searching now in Bing..
    [-] Total Unique Subdomains Found: 19
    abpe.nav.gov.hu
    arveres.nav.gov.hu
    clo.nav.gov.hu
    ebev.nav.gov.hu
    ekaer.nav.gov.hu
    test.ekaer.nav.gov.hu
    elekafa.nav.gov.hu
    en.nav.gov.hu
    hirlevel.nav.gov.hu
    kkk.nav.gov.hu
    moss.nav.gov.hu
    openkkk.nav.gov.hu
    secure.nav.gov.hu
    vpi.nav.gov.hu
    www01.nav.gov.hu
    www02.nav.gov.hu
    www03.nav.gov.hu
    www06.nav.gov.hu
    www09.nav.gov.hu

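To see exactly what a reduced engine list drops, as with www.nav.gov.hu disappearing between [1] and [2], it can help to diff two runs. The sketch below is a hypothetical helper (`load_hosts` and `compare_runs` are names made up here); it assumes each run was saved with one hostname per line, for example via Sublist3r's `-o` output option, and it sorts alphabetically rather than in the style-guide order.

```python
from pathlib import Path


def load_hosts(path):
    """Read a saved run, assuming one hostname per line; blank lines are skipped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def compare_runs(full_run, reduced_run):
    """Return (only_in_full, only_in_reduced), each an alphabetically sorted list."""
    full, reduced = set(full_run), set(reduced_run)
    return sorted(full - reduced), sorted(reduced - full)


# A few hostnames taken from outputs [1] and [2].
full_run = ["www.nav.gov.hu", "abpe.nav.gov.hu", "mx.ekaer.nav.gov.hu", "test.ekaer.nav.gov.hu"]
reduced_run = ["abpe.nav.gov.hu", "test.ekaer.nav.gov.hu"]
only_full, only_reduced = compare_runs(full_run, reduced_run)
print("only in full run:", only_full)
print("only in reduced run:", only_reduced)
```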
aboul3la commented 7 years ago

@jeremyn I think the only sources in Sublist3r that could generate invalid or non-web subdomains are "PassiveDNS", "DNSdumpster" and "ThreatCrowd". Other sources, such as the search engines, Netcraft, CrtSearch and VirusTotal, should give you working, valid web subdomains.

So you can change line 1052 of sublist3r.py from:

    enums = [enum(domain, verbose, q=subdomains_queue) for enum in BaiduEnum, YahooEnum, GoogleEnum, BingEnum, AskEnum, NetcraftEnum, DNSdumpster, Virustotal, ThreatCrowd, CrtSearch, PassiveDNS]

To the following:

    enums = [enum(domain, verbose, q=subdomains_queue) for enum in (BaiduEnum, YahooEnum, GoogleEnum, BingEnum, AskEnum, NetcraftEnum, Virustotal, CrtSearch)]

Then try again and compare the results. It should be OK.
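Since the DNS-derived sources can return names that are stale or invalid, one cheap first pass is to drop names that no longer resolve. This is a minimal sketch under stated assumptions: `split_by_resolution` is a hypothetical helper, a successful lookup says nothing about whether the host serves HTTPS (that still needs the manual check), and non-web names such as mail or DNS servers will usually still resolve.

```python
import socket


def split_by_resolution(hosts, resolve=socket.gethostbyname):
    """Hypothetical helper: split hosts into (resolving, not_resolving) lists.

    `resolve` defaults to a real DNS lookup; pass a stub to test offline.
    """
    resolving, not_resolving = [], []
    for host in hosts:
        try:
            resolve(host)
        except (OSError, UnicodeError):
            # socket.gaierror and timeouts are OSError subclasses; some
            # malformed names fail during IDNA encoding with UnicodeError.
            not_resolving.append(host)
        else:
            resolving.append(host)
    return resolving, not_resolving
```

Passing the resolver in as a parameter keeps the network out of the logic, so the filtering can be checked with a stub before running it against a real enumeration result.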

jeremyn commented 7 years ago

Thanks @aboul3la . Below [3] is what I get with your reduced enum list. It's better than [1] above, because it has only 27 subdomains instead of 51 and includes https://www.nav.gov.hu . It still has some unwanted subdomains, though, like https://import-test-b.ekaer.nav.gov.hu , which doesn't appear in Google. There's also an error from the Baidu enumerator process (an UnboundLocalError), which is probably not related to the change but which I left in.

Ideally, GoogleEnum should return everything we can find using the manual strategy described in our style guide, such as https://www.nav.gov.hu . Do you know why Sublist3r is missing www?

[3]

    [-] Enumerating subdomains now for nav.gov.hu
    [-] Searching now in Baidu..
    [-] Searching now in Yahoo..
    [-] Searching now in Google..
    [-] Searching now in Bing..
    [-] Searching now in Ask..
    [-] Searching now in Netcraft..
    [-] Searching now in Virustotal..
    [-] Searching now in SSL Certificates..
    Process BaiduEnum-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "sublist3r.py", line 245, in run
        domain_list = self.enumerate()
      File "sublist3r.py", line 218, in enumerate
        links = self.extract_domains(resp)
      File "sublist3r.py", line 465, in extract_domains
        return links
    UnboundLocalError: local variable 'links' referenced before assignment
    [-] Total Unique Subdomains Found: 27
    www.nav.gov.hu
    abpe.nav.gov.hu
    adatbazisok.nav.gov.hu
    arveres.nav.gov.hu
    clo.nav.gov.hu
    ebev.nav.gov.hu
    ekaer.nav.gov.hu
    www.ekaer.nav.gov.hu
    import.ekaer.nav.gov.hu
    import-test.ekaer.nav.gov.hu
    import-test-b.ekaer.nav.gov.hu
    test-b.ekaer.nav.gov.hu
    elekafa.nav.gov.hu
    en.nav.gov.hu
    hirlevel.nav.gov.hu
    kkk.nav.gov.hu
    moss.nav.gov.hu
    openkkk.nav.gov.hu
    owa.nav.gov.hu
    secure.nav.gov.hu
    vpi.nav.gov.hu
    www01.nav.gov.hu
    www02.nav.gov.hu
    www03.nav.gov.hu
    www05.nav.gov.hu
    www06.nav.gov.hu
    www09.nav.gov.hu

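The source of Sublist3r's `extract_domains` isn't quoted in this thread, but the traceback in [3] shows the usual cause of this UnboundLocalError: a local variable is assigned only on a code path that didn't run (for example inside a `try` whose exception was swallowed) and is then returned. The sketch below is a hypothetical reproduction with made-up function names, not Sublist3r's code, along with the usual fix of binding the name before the `try`.

```python
import re

LINK_RE = re.compile(r'href="(https?://[^"]+)"')


def extract_links_buggy(resp):
    try:
        links = LINK_RE.findall(resp)
    except TypeError:
        pass  # e.g. resp is None because the request failed
    return links  # UnboundLocalError when the except branch ran


def extract_links_fixed(resp):
    links = []  # bind the name first so every path can return it
    try:
        links = LINK_RE.findall(resp)
    except TypeError:
        pass
    return links
```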
Foorack commented 7 years ago

"Do you know why Sublist3r is missing www?" If it relies purely on search engines, then I have a guess why. There is a setting in the Google webmaster control panel to make it show example.com instead of www.example.com. I first thought other engines might have similar settings or even similar automatic behavior, but I can't find anything about this for the other engines, hm...

https://support.google.com/webmasters/answer/44231

Hainish commented 7 years ago

Sublist3r looks like a nice tool for the job. I would definitely recommend against the -b (bruteforce) option, as depending on your locale, the brute-force tool it enables may be of ambiguous legality.

I generally think it's a good idea to use this tool, and I've created a pull request so that we can recommend the optimal configuration without requiring contributors to modify the source themselves: https://github.com/aboul3la/Sublist3r/pull/50

If this is pulled, the suggested configuration would be:

    python sublist3r.py -d example.com -e Baidu -e Yahoo -e Google -e Bing -e Ask -e Netcraft -e Virustotal -e "SSL Certificates"

numismatika commented 7 years ago

Wouldn't it also make sense to extend this tool to grab additional domain candidates from the Subject Alternative Name (SAN) field of the certificate? Some companies split their content between a main site and, say, a CDN domain.
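As a rough illustration of the SAN idea, Python's standard `ssl` module exposes a verified peer certificate as a dict whose `subjectAltName` entry is a tuple of `(type, value)` pairs. The sketch below collects the DNS names from it; `san_dns_names` and `fetch_san_dns_names` are hypothetical helpers, not part of Sublist3r, and the result is only a list of candidates that still need the usual manual checks.

```python
import socket
import ssl


def san_dns_names(cert):
    """Return the DNS names in a getpeercert() dict's subjectAltName field.

    Non-DNS entries (such as IP addresses) are skipped; duplicates are dropped
    while keeping the certificate's order.
    """
    names = []
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value not in names:
            names.append(value)
    return names


def fetch_san_dns_names(host, port=443, timeout=10.0):
    """Connect over TLS with certificate verification and read the SAN DNS names."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return san_dns_names(tls.getpeercert())
```

For example, `fetch_san_dns_names("www.eff.org")` would need network access and would raise an `ssl` error if the certificate doesn't verify for that name. Wildcard entries like `*.example.com` are returned as-is.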

Hainish commented 7 years ago

@numismatika it would! I just tested and indeed, it does not check the SAN field.

Hainish commented 7 years ago

@numismatika I suggest opening an issue in the Sublist3r repository.

For now,

    python sublist3r.py -d example.com -e Baidu,Yahoo,Google,Bing,Ask,Netcraft,Virustotal,SSL

works for our enumeration purposes.

Once we add documentation for this enumeration technique, I'll consider this issue closed.
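To run the recommended configuration the same way every time, a small wrapper can build the command line. This is a hypothetical sketch: `sublist3r_command` and `run_sublist3r` are names made up here, the engine names are copied verbatim from Hainish's command, `-b` is deliberately left out, and it assumes `sublist3r.py` is in the current directory and runs under the `python` on your PATH. `subprocess.run` with `capture_output` and `text` needs Python 3.7 or later.

```python
import subprocess

# Engine names copied verbatim from the recommended command; no -b (bruteforce).
ENGINES = ("Baidu", "Yahoo", "Google", "Bing", "Ask", "Netcraft", "Virustotal", "SSL")


def sublist3r_command(domain, script="sublist3r.py", engines=ENGINES):
    """Build the argv list for the recommended Sublist3r run."""
    return ["python", script, "-d", domain, "-e", ",".join(engines)]


def run_sublist3r(domain, script="sublist3r.py"):
    """Run Sublist3r and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        sublist3r_command(domain, script),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Passing an argv list instead of a shell string avoids quoting problems with the domain argument.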

Hainish commented 7 years ago

https://github.com/EFForg/https-everywhere/pull/7683 closes this issue

jeremyn commented 7 years ago

Thanks @Hainish .