DanMcInerney / xsscrapy

XSS spider - 66/66 wavsep XSS detected
1.66k stars, 441 forks

XMLSyntaxError #40

Open: jthorpe6 opened this issue 7 years ago

jthorpe6 commented 7 years ago

What version of the lxml library is needed? On 3.8.0-2 I get the following errors, though they do not stop the script from running.

[scrapy] ERROR: Error processing
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 651, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/opt/xsscrapy/xsscrapy/pipelines.py", line 61, in process_item
    unclaimedURL = self.unclaimedURL_check(body)
  File "/opt/xsscrapy/xsscrapy/pipelines.py", line 218, in unclaimedURL_check
    tree = fromstring(body)
  File "/usr/local/lib/python2.7/dist-packages/lxml/html/__init__.py", line 876, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/usr/local/lib/python2.7/dist-packages/lxml/html/__init__.py", line 762, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "src/lxml/lxml.etree.pyx", line 3228, in lxml.etree.fromstring (src/lxml/lxml.etree.c:79609)
  File "src/lxml/parser.pxi", line 1848, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:119128)
  File "src/lxml/parser.pxi", line 1736, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:117808)
  File "src/lxml/parser.pxi", line 1102, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:112052)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105896)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:107604)
  File "src/lxml/parser.pxi", line 644, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:106661)
XMLSyntaxError: line 3661: Tag footer invalid (line 3661)

decidedlygray commented 6 years ago

Interesting, the crash is in the same spot as #37. It originates here: https://github.com/DanMcInerney/xsscrapy/blob/master/xsscrapy/pipelines.py#L218

DanMcInerney commented 6 years ago

Hmm yeah this is weird.
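
For anyone hitting the same traceback, one possible workaround is to guard the `fromstring(body)` call so that a page lxml rejects is skipped instead of logged as a pipeline error. This is only a minimal sketch, not the project's actual fix: `parse_html_or_none` is a hypothetical helper name, and it assumes the caller (for example `unclaimedURL_check`) can treat a `None` result as "nothing to check on this page".

```python
from lxml import etree
from lxml.html import fromstring


def parse_html_or_none(body):
    """Return the parsed lxml tree, or None if lxml rejects the markup.

    Hypothetical helper for illustration; it is not part of xsscrapy.
    """
    try:
        return fromstring(body)
    except (etree.XMLSyntaxError, etree.ParserError):
        # XMLSyntaxError is the exception shown in the traceback above.
        # ParserError is what lxml.html raises for an empty document.
        return None


tree = parse_html_or_none("<html><body><footer>ok</footer></body></html>")
if tree is None:
    print("skipping page: lxml could not parse it")
else:
    print("parsed root element:", tree.tag)
```

Returning `None` hides the parse failure from the log, so it may be worth logging the URL at debug level before skipping if those pages still need to be reviewed.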