Open CarsonHrusovsky opened 3 months ago
Hey @CarsonHrusovsky, maybe there is some info in #1245 that helps.
@TheTechromancer There is a repo which is an updated version of wappalyzer: https://github.com/enthec/webappanalyzer
https://github.com/rverton/webanalyze also uses their data
Yes, this is on our TODO list. Our current wappalyzer module uses an out-of-date library and needs to be updated.
Most likely the new wappalyzer will be built on top of @liquidsec's excavate rework, which uses yara rules instead of python regexes.
Also @CarsonHrusovsky, you may be seeing different results because wappalyzer looks at the JavaScript files, etc. in addition to the main HTTP response. If you want this functionality in BBOT, you can enable the web spider.
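To illustrate the point about JavaScript files, here is a minimal sketch of why scanning linked resources in addition to the main response can surface more technologies. The fingerprint patterns and the two response bodies are made up for illustration; they are not Wappalyzer's real rules or BBOT's implementation.

```python
import re

# Hypothetical fingerprints for illustration only; not Wappalyzer's real rules.
FINGERPRINTS = {
    "nginx": re.compile(r"server:\s*nginx", re.I),
    "Polyfill": re.compile(r"polyfill(\.min)?\.js", re.I),
    "PHP": re.compile(r"x-powered-by:\s*php", re.I),
}


def detect(text):
    """Return the set of technology names whose pattern matches the text."""
    return {name for name, pattern in FINGERPRINTS.items() if pattern.search(text)}


# Made-up responses: the main page, plus one script a spider might follow.
main_response = "Server: nginx\n\n<html><script src='/static/app.js'></script></html>"
linked_script = "/* app.js */ import 'polyfill.min.js';"

# Scanning only the main response vs. also scanning the linked script.
main_only = detect(main_response)
with_spider = main_only | detect(linked_script)

print(sorted(main_only))
print(sorted(with_spider))
```

In this toy case the second set is a superset of the first, which is the kind of difference you would expect between a single-response scan and one that also follows linked resources.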
Describe the bug
For findings from the wappalyzer module, we have this:
{"host": "website.com", "technology": "nginx", "url": "https://website.com/"} httpx->wappalyzer (in-scope)
{"host": "website.com", "technology": "varnish", "url": "https://website.com/"} httpx->wappalyzer (in-scope)
In contrast, when running the wappalyzer Python module manually against the same target, we get more robust findings. Here is a snippet:
{'Apache', 'Amazon Web Services', 'Cloud Platform', 'PHP', 'Cloudflare', 'Amazon EC2', 'Varnish', 'Polyfill'}
Here is the code I used to generate these findings:
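from Wappalyzer import Wappalyzer, WebPage

# Load the latest fingerprint definitions and fetch the page
wappalyzer = Wappalyzer.latest()
webpage = WebPage.new_from_url('https://website.com')
# Print the detected technologies with their versions and categories
print(wappalyzer.analyze_with_versions_and_categories(webpage))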
After looking at the wappalyzer module, the code is extremely similar to what I used in my test. My assumption is that we are using an out-of-date version of wappalyzer, as I can't imagine what else would cause these discrepancies. I am happy to supply more information if needed.
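To make the discrepancy concrete, here is a minimal sketch that diffs the two outputs quoted in this issue. It assumes the BBOT events are available as JSON lines (with the trailing `httpx->wappalyzer (in-scope)` annotation dropped) and compares names case-insensitively. The manual set is only the snippet shown here, so the real difference may be larger.

```python
import json

# BBOT events copied from this issue (trailing module/scope annotation dropped).
bbot_lines = [
    '{"host": "website.com", "technology": "nginx", "url": "https://website.com/"}',
    '{"host": "website.com", "technology": "varnish", "url": "https://website.com/"}',
]

# Manual python-Wappalyzer result copied from this issue (a snippet, not the full set).
manual = {"Apache", "Amazon Web Services", "Cloud Platform", "PHP",
          "Cloudflare", "Amazon EC2", "Varnish", "Polyfill"}

# Normalize both sides to lowercase so "Varnish" and "varnish" compare equal.
bbot = {json.loads(line)["technology"].lower() for line in bbot_lines}
manual_lower = {name.lower() for name in manual}

only_manual = sorted(manual_lower - bbot)
only_bbot = sorted(bbot - manual_lower)

print("only in manual run:", only_manual)
print("only in BBOT:", only_bbot)
```

With the data above, varnish is the only technology both runs agree on, and nginx shows up only on the BBOT side.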