chorsley / python-Wappalyzer

Python driver for Wappalyzer, a web application detection utility.
GNU General Public License v3.0

Replace beautiful soup 4 with Python standard library #58

Closed by cn-kali-team 2 years ago

cn-kali-team commented 3 years ago

Installing Beautiful Soup 4 also requires me to install lxml support. I think a simpler method should be used to parse HTML, so I submitted a branch that does not depend on beautifulsoup4.

tristanlatr commented 3 years ago

Hi, @cn-kali-team ,

Thanks for the PR. Honestly, I don't know what to think about parsing the HTML with the standard library... Beautiful Soup can be very useful when dealing with half-broken HTML. Maybe this should be the fallback behaviour when Beautiful Soup cannot be imported?

Does this fix #48?

Thanks again!

tristanlatr commented 2 years ago

We now provide a way to use python-Wappalyzer without lxml. This should only be used when lxml cannot be installed: the standard library DOM parser will fail on broken HTML, resulting in incomplete results.
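As a rough illustration of that limitation, the sketch below uses `xml.dom.minidom`. That is an assumption made for the example, since this thread does not say which standard library parser is used internally, and `try_parse` is a hypothetical helper name.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def try_parse(html):
    """Return a DOM document, or None if the strict parser rejects the markup."""
    try:
        return minidom.parseString(html)
    except ExpatError:
        # A strict XML-based parser gives up on malformed markup,
        # so nothing in the page can be inspected.
        return None


well_formed = "<html><body><p>ok</p></body></html>"
broken = "<html><body><p>unclosed</body></html>"  # <p> is never closed

print(try_parse(well_formed) is not None)
print(try_parse(broken) is None)
```

A lenient parser such as Beautiful Soup would still build a tree from the second document, which is why the lxml-free mode is best kept as a last resort.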

To use it, install python-Wappalyzer with the pip option --no-deps, then install the required packages manually (pip install requests aiohttp cached_property dom_query pytest).

Thanks @cn-kali-team