jeremyn closed this issue 7 years ago.
If we want to recommend Sublist3r, then we will need to discuss how to configure it. Below is an example.
In PR https://github.com/EFForg/https-everywhere/pull/6888 we want to investigate https://nav.gov.hu , which belongs to a government department in Hungary. By default, Sublist3r queries not just regular search engines but also DNS and security sites. This produces a lot of hosts [1] that are not appropriate targets, such as https://cms-test-b.ekaer.nav.gov.hu and https://mx.ekaer.nav.gov.hu . On the other hand, if we modify the script to search only Baidu, Google, and Bing, we get a shorter list [2], but it's missing the obvious www.nav.gov.hu.
@jeremyn I think the only sources in Sublist3r that could produce invalid or non-web subdomains are "PassiveDNS", "DnsDumpster" and "ThreatCrowd". The other sources, such as the search engines, Netcraft, CrtSearch and VirusTotal, should give you valid, working web subdomains.
So you can change line 1052 in Sublist3r from:
enums = [enum(domain, verbose, q=subdomains_queue) for enum in BaiduEnum, YahooEnum, GoogleEnum, BingEnum, AskEnum, NetcraftEnum, DNSdumpster, Virustotal, ThreatCrowd, CrtSearch, PassiveDNS]
To the following:
enums = [enum(domain, verbose, q=subdomains_queue) for enum in BaiduEnum, YahooEnum, GoogleEnum, BingEnum, AskEnum, NetcraftEnum, Virustotal, CrtSearch]
Then run it again and compare the results. It should be OK.
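Comparing two runs, such as the full-source run and the reduced-source run, comes down to a set difference of the two hostname lists. The sketch below is a hypothetical helper, not part of Sublist3r; it assumes each run was saved to a text file with one hostname per line (for example via Sublist3r's `-o` option).

```python
from pathlib import Path


def read_hosts(path):
    """Read hostnames from a text file, assuming one hostname per line."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def compare_results(before, after):
    """Return the hostnames dropped, added, and kept between two runs."""
    # Normalize: strip surrounding whitespace, lowercase, and skip blank lines.
    old = {h.strip().lower() for h in before if h.strip()}
    new = {h.strip().lower() for h in after if h.strip()}
    return {
        "dropped": sorted(old - new),
        "added": sorted(new - old),
        "kept": sorted(old & new),
    }
```

For example, `compare_results(read_hosts("full.txt"), read_hosts("reduced.txt"))`, where both file names are placeholders, lists which hosts the reduced source set removed and whether anything new (such as a `www` host) appeared.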
Thanks @aboul3la. Below [3] is what I get with your reduced enum list. It's better than [1] above, because it has only 27 subdomains instead of 51 and includes https://www.nav.gov.hu . It still has some unwanted domains, though, such as https://import-test-b.ekaer.nav.gov.hu , which doesn't appear in Google. There's also what looks like a threading error, which is probably unrelated to the change, but I left it in.
Ideally, GoogleEnum should return everything we can find using the manual strategy described in our style guide, such as https://www.nav.gov.hu . Do you know why Sublist3r is missing www?
"Do you know why Sublist3r is missing www?"
If it relies purely on search engines, then I have a guess why. There is a setting in the Google webmaster control panel that makes Google show example.com instead of www.example.com. ~Other engines may have similar settings or even similar automatic behavior.~ I can't find anything about this for the other engines, hm...
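If search engines can hide `www.` behind the bare domain, one workaround is to always add `www.<domain>` as a candidate and check it by hand like any other result. The helper below is a hypothetical sketch of that idea, not something Sublist3r does.

```python
def with_www_candidate(domain, subdomains):
    """Return the found subdomains plus www.<domain>, deduplicated and sorted.

    Hostnames are lowercased and a trailing dot is removed, so
    "WWW.example.com." and "www.example.com" count as the same host.
    """
    candidates = {s.strip().lower().rstrip(".") for s in subdomains if s.strip()}
    # Search results may show only the bare domain, so add www explicitly.
    candidates.add("www." + domain.strip().lower().rstrip("."))
    return sorted(candidates)
```

The added host is only a candidate; it still needs the same manual HTTPS check as everything else on the list.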
Sublist3r looks like a nice tool for the job. I would definitely recommend against the -b option, which enables the subbrute brute-force module, since, depending on your locale, using that tool may be of ambiguous legality.
I generally think it's a good idea to use this tool, and I've created a pull request so that we can recommend the optimal configuration without requiring contributors to modify the source themselves: https://github.com/aboul3la/Sublist3r/pull/50
If this is merged, the suggested configuration would be:
python sublist3r.py -d example.com -e Baidu -e Yahoo -e Google -e Bing -e Ask -e Netcraft -e Virustotal -e "SSL Certificates"
Wouldn't it also make sense to extend this tool to grab additional domain candidates from the Subject Alternative Name (SAN) field of the certificate? Some companies split their content between a main site and, say, a CDN domain.
@numismatika It would! I just tested it, and indeed it does not check the SAN field.
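Until such a feature exists, SAN names can be pulled separately with Python's standard `ssl` module. This is a minimal sketch, not a Sublist3r feature; the function names are hypothetical. `fetch_peer_cert` needs network access and a certificate that passes verification, while `san_dns_names` only parses the dictionary that `getpeercert()` returns.

```python
import socket
import ssl


def fetch_peer_cert(host, port=443, timeout=10):
    """Connect over TLS and return the server certificate as a dict.

    Uses the default context, so the certificate must validate for `host`;
    otherwise the handshake raises ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def san_dns_names(cert):
    """Return the sorted, lowercased DNS entries of a getpeercert() dict.

    Non-DNS entries (e.g. IP addresses) are ignored. Wildcards such as
    "*.example.com" are kept as-is and still need manual follow-up.
    """
    entries = cert.get("subjectAltName", ())
    return sorted({value.lower() for kind, value in entries if kind == "DNS"})
```

Usage would be `san_dns_names(fetch_peer_cert("www.example.com"))`; any names outside the target domain (such as a separate CDN domain) are candidates for the same ruleset, not confirmed ones.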
@numismatika I suggest opening an issue in the Sublist3r repository.
For now, the following works for our enumeration purposes:
python sublist3r.py -d example.com -e Baidu,Yahoo,Google,Bing,Ask,Netcraft,Virustotal,SSL
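For scripted use, the same invocation can be assembled programmatically. This is a hypothetical wrapper, not part of Sublist3r: it assumes a local checkout with `sublist3r.py` in the working directory and a `python` executable on the PATH. The engine list mirrors the command above and leaves out the DNS and threat-intelligence sources (PassiveDNS, DNSdumpster, ThreatCrowd) discussed earlier.

```python
import shlex
import subprocess

# Engines from the recommended command; check the accepted names
# against `python sublist3r.py -h` for the version you use.
RECOMMENDED_ENGINES = ("Baidu", "Yahoo", "Google", "Bing", "Ask",
                       "Netcraft", "Virustotal", "SSL")


def sublist3r_command(domain, engines=RECOMMENDED_ENGINES, script="sublist3r.py"):
    """Build the argument list for a Sublist3r run limited to `engines`."""
    return ["python", script, "-d", domain, "-e", ",".join(engines)]


def run_sublist3r(domain, **kwargs):
    """Run Sublist3r and return its stdout (needs the checkout and network)."""
    cmd = sublist3r_command(domain, **kwargs)
    print("Running:", " ".join(shlex.quote(arg) for arg in cmd))
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Passing an argument list instead of a shell string avoids quoting problems with the engine list, and `check=True` makes a failed run raise instead of silently returning partial output.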
I'll consider this issue closed once we add documentation for this enumeration technique.
https://github.com/EFForg/https-everywhere/pull/7683 closes this issue
Thanks @Hainish .
We've had discussions recently on automating ruleset work, including using a tool to find subdomains; for example, see https://github.com/EFForg/https-everywhere/pull/4279 and https://github.com/EFForg/https-everywhere/issues/6912 .
One tool we've discussed is https://github.com/aboul3la/Sublist3r . I recently submitted, and the Sublist3r author @aboul3la merged, a pull request to make Sublist3r produce subdomains in the sort order that we prefer. @aboul3la likes HTTPS Everywhere.
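The thread doesn't spell out the preferred order. Assuming it means reading hostnames label by label from the TLD leftward, with `www.X` listed directly after `X`, a sort key could look like the sketch below. The function name is hypothetical, and this is an illustration of that assumed order, not necessarily the code that was merged into Sublist3r.

```python
def subdomain_sort_key(hostname):
    """Sort key: compare labels right to left; put www.X right after X."""
    labels = hostname.lower().split(".")[::-1]
    if len(labels) > 1 and labels[-1] == "www":
        # Group www.X with X; the 1 places it after the bare X (which gets 0).
        return labels[:-1], 1
    return labels, 0


hosts = ["b.example.com", "example.net", "www.example.com", "example.com", "a.example.com"]
print(sorted(hosts, key=subdomain_sort_key))
```

Because Python compares lists element by element and a shorter prefix sorts first, each parent domain comes before its subdomains, which keeps related hosts together.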
My question here is whether we want to officially recommend Sublist3r in our documentation, pull requests, and issues for contributors to find new subdomains. "Officially recommend" is the difference between "One tool you might try using is Sublist3r" and "We recommend Sublist3r".
I don't mean to suggest that we require contributors to use Sublist3r.
Thoughts? Pinging @fuglede, @Hainish, and @J0WI, but anyone should feel free to comment.