blacklanternsecurity / bbot

A recursive internet scanner for hackers.
https://www.blacklanternsecurity.com/bbot/
GNU General Public License v3.0
4.55k stars 411 forks

Discrepancies in wappalyzer findings. #1453

Open CarsonHrusovsky opened 3 months ago

CarsonHrusovsky commented 3 months ago

Describe the bug

For findings within the Wappalyzer module, we have this:

{"host": "website.com", "technology": "nginx", "url": "https://website.com/"} httpx->wappalyzer (in-scope)
{"host": "website.com", "technology": "varnish", "url": "https://website.com/"} httpx->wappalyzer (in-scope)

In contrast, when running the wappalyzer Python module manually against the same target, we get more extensive findings. Here is a snippet:

{'Apache', 'Amazon Web Services', 'Cloud Platform', 'PHP', 'Cloudflare', 'Amazon EC2', 'Varnish', 'Polyfill'}

Here is the code I used to generate these findings:

from Wappalyzer import Wappalyzer, WebPage
wappalyzer = Wappalyzer.latest()
webpage = WebPage.new_from_url('https://website.com')
wappalyzer.analyze_with_versions_and_categories(webpage)

After looking at BBOT's wappalyzer module, the code is extremely similar to what I used in my test. My assumption is that BBOT is using an out-of-date version of wappalyzer, as I can't imagine what else would cause these discrepancies. I am happy to supply more information if needed.
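To make the discrepancy concrete, here is a minimal sketch that diffs the two result sets quoted above. The helper name `diff_technologies` is hypothetical (it is not part of BBOT or python-Wappalyzer), and the comparison is case-insensitive on the assumption that "varnish" and "Varnish" refer to the same technology. Note that the manual set is only the snippet shown above, not the full output.

```python
import json

def diff_technologies(bbot_events, manual_findings):
    """Hypothetical helper: compare technology names case-insensitively."""
    bbot = {event["technology"].lower() for event in bbot_events}
    manual = {name.lower() for name in manual_findings}
    return {
        "both": sorted(bbot & manual),
        "only_bbot": sorted(bbot - manual),
        "only_manual": sorted(manual - bbot),
    }

# The two BBOT events from the report, as JSON strings.
bbot_lines = [
    '{"host": "website.com", "technology": "nginx", "url": "https://website.com/"}',
    '{"host": "website.com", "technology": "varnish", "url": "https://website.com/"}',
]

# The snippet of manual python-Wappalyzer findings from the report.
manual = {
    "Apache", "Amazon Web Services", "Cloud Platform", "PHP",
    "Cloudflare", "Amazon EC2", "Varnish", "Polyfill",
}

result = diff_technologies([json.loads(line) for line in bbot_lines], manual)
for key, names in result.items():
    print(f"{key}: {names}")
```

With these inputs, only Varnish appears on both sides. BBOT alone reports nginx, while the manual run alone reports Apache and the remaining six entries.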

Sh4d0wHunt3rX commented 3 months ago

Hey @CarsonHrusovsky, maybe there is some info in #1245 that helps.

@TheTechromancer There is a repo which is an updated version of wappalyzer: https://github.com/enthec/webappanalyzer

https://github.com/rverton/webanalyze also uses their data

TheTechromancer commented 3 months ago

Yes, this is on our TODO list. Our current wappalyzer module is using an out-of-date library, and needs to be updated.

Most likely the new wappalyzer will be built on top of @liquidsec's excavate rework, which uses yara rules instead of python regexes.

TheTechromancer commented 3 months ago

Also @CarsonHrusovsky, it's possible you're seeing different results because wappalyzer is looking at the JavaScript files, etc. in addition to the main HTTP response. If you want this functionality in BBOT, you can enable the web spider.