chorsley / python-Wappalyzer

Python driver for Wappalyzer, a web application detection utility.
GNU General Public License v3.0

Invalid regex in Wappalyzer/data/technologies.json: Symfony: html #81

Open arielf opened 1 year ago

arielf commented 1 year ago

The following code works with python3.9, but python3.11 correctly warns about a bad regex:

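   from Wappalyzer import Wappalyzer, WebPage
   url = "http://yahoo.com"  # the URL used in this report
   WPL = Wappalyzer.latest()
   webpage = WebPage.new_from_url(url)
   # the bad-regex warning is reported at Wappalyzer.py:226 during this call
   web_record = WPL.analyze_with_versions_and_categories(webpage)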

Running this with python3.11 on "http://yahoo.com", I get:

.../python3.11/site-packages/Wappalyzer/Wappalyzer.py:226: UserWarning: Caught 'unbalanced parenthesis at position 119' compiling regex:

['(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">([\\d.])+|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)', 'version:\\1']
----------------------------------^^^ invalid?

The 'position 119' seems to be a delayed reaction to the core issue.

Indeed, the sub-regex [^]+ just before it looks invalid: ^ negates (complements) the char-class, and the char-class is empty here.
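To illustrate why the error surfaces so late: in Python's `re`, a `]` placed right after `[^` is taken as a literal member of the class, so the class keeps going until the next `]` (the one closing `[\d.]`) and swallows the `(` on the way. The `)` that follows then closes the outer `(?:` group, which leaves the final `)` with nothing to match. A minimal sketch of this, using the pattern from the warning above (the helper name `try_compile` and the sample HTML string are only for illustration, not part of the library):

```python
import re
import warnings

# Pattern exactly as reported in the warning (regex part only).
BROKEN = r'(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">([\d.])+|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)'

# The proposed one-character fix: [^]+ becomes [^<]+
FIXED = BROKEN.replace("[^]+", "[^<]+")


def try_compile(pattern):
    """Return (compiled_regex, None) on success or (None, message) on failure."""
    with warnings.catch_warnings():
        # The runaway character class contains a '[', which triggers a
        # "Possible nested set" FutureWarning; silence it to keep output clean.
        warnings.simplefilter("ignore", FutureWarning)
        try:
            return re.compile(pattern), None
        except re.error as exc:
            return None, str(exc)


broken_re, broken_err = try_compile(BROKEN)
fixed_re, fixed_err = try_compile(FIXED)

print("broken:", broken_err)
print("fixed :", fixed_err)

# Hypothetical snippet of a Symfony debug toolbar, for illustration only.
sample = '<div id="sfwdt123" class="sf-toolbar">'
print("fixed pattern matches sample:", bool(fixed_re.search(sample)))
```

The broken pattern should fail with the same "unbalanced parenthesis" message as in the warning, while the fixed one compiles.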

The problem is in the data file Wappalyzer/data/technologies.json (towards the end, since technologies are sorted alphabetically).

The "html" rule for "Symfony" should be (a one-character change):

"html": "(?:<div class=\"sf-toolbar[^>]+?>[^<]+<span class=\"sf-toolbar-value\">([\\d.])+|<div id=\"sfwdt[^\"]+\" class=\"[^\"]*sf-toolbar)\\;version:\\1",
------------------------------------------^^^^ the fix

Fixed in this PR: https://github.com/chorsley/python-Wappalyzer/pull/80
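Other rules in the data file could be checked the same way. The sketch below assumes a simplified layout (technology name mapped to an object whose `"html"` value is a string or a list of strings) and assumes the regex is everything before the first `\;` in a rule, as in the Symfony entry above. The function name `find_invalid_html_rules` and the `"Example"` entry are hypothetical; the real technologies.json may nest or name things differently.

```python
import json
import re
import warnings

# Inline sample standing in for the data file. "Symfony" is the rule from
# this issue; "Example" is a made-up, valid entry for comparison.
SAMPLE_JSON = r'''
{
  "Symfony": {"html": "(?:<div class=\"sf-toolbar[^>]+?>[^]+<span class=\"sf-toolbar-value\">([\\d.])+|<div id=\"sfwdt[^\"]+\" class=\"[^\"]*sf-toolbar)\\;version:\\1"},
  "Example": {"html": "<meta name=\"generator\" content=\"Example ([\\d.]+)\\;version:\\1"}
}
'''


def find_invalid_html_rules(technologies):
    """Return {technology name: error message} for "html" rules that fail to compile."""
    bad = {}
    for name, spec in technologies.items():
        rules = spec.get("html", [])
        if isinstance(rules, str):
            rules = [rules]
        for rule in rules:
            # Drop the trailing tags such as "\;version:\1"; keep only the regex.
            regex = rule.split("\\;")[0]
            try:
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore", FutureWarning)
                    re.compile(regex)
            except re.error as exc:
                bad[name] = str(exc)
    return bad


technologies = json.loads(SAMPLE_JSON)
invalid = find_invalid_html_rules(technologies)
for name, message in invalid.items():
    print(f"{name}: {message}")
```

With the sample above, only the "Symfony" entry should be reported.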