chorsley / python-Wappalyzer

Python driver for Wappalyzer, a web application detection utility.
GNU General Public License v3.0

unbalanced parenthesis #40

Closed dogasantos closed 3 years ago

dogasantos commented 3 years ago

Hey guys, I'm testing this library, but I'm seeing this error:

/usr/local/lib/python3.8/dist-packages/python_Wappalyzer-0.3.1-py3.8.egg/Wappalyzer/Wappalyzer.py:249: UserWarning: Caught 'unbalanced parenthesis at position 119' compiling regex: ['(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">([\\d.])+|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)', 'version:\\1']

The regex seems fine, but Python still fails to compile it. I'm testing with Python 3.8, the current library version (via git), and the most recent technologies.json.

Any thoughts? Ty!

tristanlatr commented 3 years ago

Wappalyzer is a JavaScript application, so some of its regexes won't compile in Python.

I don't think there is much we can do about that; we would need to maintain an alternative version of the technologies.json file...

We could just ignore the warning for good; see #39.
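For readers wondering why this particular pattern breaks: in JavaScript, `[^]` is a complete character class that matches any character, while Python treats a `]` placed right after `[^` as a literal member of the set. The class therefore runs on until the `]` after `\d.`, swallowing the `(` of the capture group, and the final `)` at position 119 is left unmatched. The sketch below reproduces this with the pattern from the warning. The `js_to_python` helper is hypothetical, and swapping `[^]` for `[\s\S]` is an assumption about one possible translation, not something the library is known to do; the plain string replace is also naive and would mishandle an escaped `\[^]`.

```python
import re
import warnings

# Pattern copied from the warning above (single backslash, as the regex engine sees it).
PATTERN = (
    r'(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">([\d.])+'
    r'|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)'
)


def try_compile(pattern):
    """Return (compiled, None) on success or (None, error message) on failure."""
    with warnings.catch_warnings():
        # Python may also emit a FutureWarning about a possible nested set; silence it here.
        warnings.simplefilter("ignore")
        try:
            return re.compile(pattern), None
        except re.error as exc:
            return None, str(exc)


def js_to_python(pattern):
    """Hypothetical helper: rewrite the JavaScript-only `[^]` class as `[\\s\\S]`."""
    return pattern.replace("[^]", r"[\s\S]")


compiled, error = try_compile(PATTERN)
print(error)  # the compile error reported in this issue

fixed, fixed_error = try_compile(js_to_python(PATTERN))
print(fixed is not None)
```

With the translated pattern, `fixed.search(...)` can be used on HTML containing a Symfony toolbar `<div>`, including markup that spans several lines.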

dogasantos commented 3 years ago

Yep, I'm doing that, but Wappalyzer won't run at all after that.

tristanlatr commented 3 years ago

You mean it fails?

dogasantos commented 3 years ago

Yes. I'll try to narrow it down to see why this is happening.

osean-man commented 3 years ago

I am having the same issue; it fails completely with this warning. I'm also going to look into why.

tristanlatr commented 3 years ago

Warnings can be treated as errors by Python with the -Werror switch (https://docs.python.org/3/using/cmdline.html#cmdoption-w) or directly through the warnings module (https://docs.python.org/3/library/warnings.html#warnings.filterwarnings).

It is possible that your software enables this by default. We should change the warning here, because users can't do anything about this regex error anyway. Maybe we should ignore these regex errors completely?
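As an illustration of the `warnings.filterwarnings` route, here is a minimal sketch that ignores only the regex-compilation warning while leaving every other warning untouched. The message pattern is inferred from the warning text quoted at the top of this issue, and the function names are hypothetical; `demo` simulates a program that runs with warnings promoted to errors.

```python
import warnings


def ignore_regex_compile_warnings():
    """Ignore only UserWarnings whose text starts like the one in this issue."""
    warnings.filterwarnings(
        "ignore",
        message=r"Caught '.*' compiling regex",  # matched against the start of the message
        category=UserWarning,
    )


def demo():
    with warnings.catch_warnings():
        warnings.simplefilter("error")   # simulate running with -W error
        ignore_regex_compile_warnings()  # inserted in front, so it is checked first

        # Same shape as the library's warning: silently ignored.
        warnings.warn(
            "Caught 'unbalanced parenthesis at position 119' compiling regex: [...]",
            UserWarning,
        )

        # Any unrelated warning is still raised as an error.
        try:
            warnings.warn("something else", UserWarning)
        except UserWarning:
            return "other warnings still raise"
    return "unrelated warning was not raised"


result = demo()
print(result)
```

In real code, `ignore_regex_compile_warnings()` would be called once before the library is first used.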

osean-man commented 3 years ago

That did the trick: just add warnings.simplefilter("ignore") and it runs fine.

tristanlatr commented 3 years ago

@sdiggles Your code will ignore all warnings, which is not great practice. It's better to target specific warnings. See #39.

But, as I said, we should not throw a warning at all actually.

I'll change that soon and publish a patched version to PyPI.
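One way "not throwing a warning at all" could look is sketched below: a failed compilation is logged at DEBUG level and the pattern is skipped. This is only an illustration under assumptions; the logger name and the `compile_or_none` function are hypothetical and are not taken from the library's code.

```python
import logging
import re
import warnings

logger = logging.getLogger("wappalyzer_sketch")  # hypothetical logger name


def compile_or_none(pattern):
    """Compile a pattern; on failure, log at DEBUG and return None instead of warning."""
    try:
        with warnings.catch_warnings():
            # Keep re's own FutureWarnings from surfacing to the user.
            warnings.simplefilter("ignore", FutureWarning)
            return re.compile(pattern, re.I)
    except re.error as exc:
        logger.debug("Skipping regex %r: %s", pattern, exc)
        return None


good = compile_or_none(r"a+")
bad = compile_or_none(r"(a")  # unterminated group, cannot compile
print(good is not None, bad is None)
```

Users who want to see the skipped patterns can enable them with `logging.basicConfig(level=logging.DEBUG)`.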

tristanlatr commented 3 years ago

Here is the fix I think: https://github.com/panterloons/python-Wappalyzer/commit/3d6ad31295bda3dae6eb12d9f672f5c47da2a9eb

Should we apply it programmatically?

That would mean we'd need to change the logging and add a debug mode to spot the regex errors on build.

Or maybe this fix for the regex issue is more general and should be merged into https://github.com/AliasIO/wappalyzer.

I'm not a regex expert.