blacklanternsecurity / bbot

A recursive internet scanner for hackers.
https://www.blacklanternsecurity.com/bbot/
GNU General Public License v3.0

Beautiful Soup Warning Making it to stdout #618

Closed. liquidsec closed this issue 1 year ago

liquidsec commented 1 year ago

The following message is occasionally making it into the scan stdout:

/home/****/.cache/pypoetry/virtualenvs/bbot-IFSyk-JB-py3.10/lib/python3.10/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
  warnings.warn(

liquidsec commented 1 year ago

Also spotted, possibly related:

/opt/bbot/bbot/core/helpers/web.py:348: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.
  soup = BeautifulSoup(html, "html.parser")
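
Both messages above are ordinary Python warnings, so one way to keep them out of console output is to route them through the `logging` module. The sketch below uses only the standard library. The helper name `capture_warnings_to_log` and the choice of a stream handler are assumptions made for illustration, not bbot's actual logging setup, which is not shown in this thread.

```python
import io
import logging
import warnings


def capture_warnings_to_log(stream):
    """Route Python warnings to the 'py.warnings' logger, writing to `stream`.

    Hypothetical helper for illustration; not part of bbot.
    """
    # After this call, warnings are passed to logging instead of being
    # printed directly by the warnings module.
    logging.captureWarnings(True)
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("py.warnings")
    logger.addHandler(handler)
    # Do not hand the records on to root-logger handlers (often the console).
    logger.propagate = False
    return handler


if __name__ == "__main__":
    buffer = io.StringIO()
    capture_warnings_to_log(buffer)
    warnings.simplefilter("always")
    warnings.warn("The input looks more like a filename than markup.", UserWarning)
    # The warning text is now in `buffer` rather than on the console.
    print("captured:", "filename than markup" in buffer.getvalue())
```

In a real scanner the stream would more likely be a log file or a debug-level handler, so the warning stays available for troubleshooting without appearing in scan output.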
TheTechromancer commented 1 year ago

This is also an open issue in Wappalyzer:

https://github.com/chorsley/python-Wappalyzer/issues/85

TheTechromancer commented 1 year ago

Fixed in a90dca04a0455d73baed182f9a43eeaef4d3b2f0
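
The contents of that commit are not shown in this thread. As a general illustration only, a minimal sketch of one way to silence the two warnings is to install `ignore` filters keyed on the opening words of each message as quoted above. The names `BS4_WARNING_PREFIXES`, `silence_bs4_warnings`, and `emitted_after_silencing` are hypothetical. Filtering on the warning classes named in the output (`XMLParsedAsHTMLWarning`, `MarkupResemblesLocatorWarning`) via the `category` argument would likely be more robust against wording changes, but the message-based form needs only the standard library.

```python
import re
import warnings

# Opening words of the two messages quoted in this thread.
BS4_WARNING_PREFIXES = (
    "It looks like you're parsing an XML document using an HTML parser",
    "The input looks more like a filename than markup",
)


def silence_bs4_warnings():
    """Install 'ignore' filters for the two Beautiful Soup messages."""
    for prefix in BS4_WARNING_PREFIXES:
        # `message` is a regular expression matched against the start of
        # the warning text, so the literal prefix is escaped first.
        warnings.filterwarnings("ignore", message=re.escape(prefix))


def emitted_after_silencing(texts):
    """Emit each text as a UserWarning and return those not filtered out."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        silence_bs4_warnings()
        for text in texts:
            warnings.warn(text, UserWarning)
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    print(emitted_after_silencing([
        "The input looks more like a filename than markup. You may want to open this file.",
        "an unrelated warning",
    ]))
```

Calling `silence_bs4_warnings()` once at startup, before any parsing, applies the filters process-wide; wrapping a single parse call in `warnings.catch_warnings()` instead keeps the suppression local to that call.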