algolia / docsearch-scraper

DocSearch - Scraper
https://docsearch.algolia.com/

TypeError: expected string or buffer #19

Closed dustincoates closed 8 years ago

dustincoates commented 8 years ago

I'm getting this error regardless of which config I use. (For example, I received it with both the lodash and chef configs.)

TypeError: expected string or buffer
https://docs.chef.io/nodes.html
2015-12-31 15:48:04 [scrapy] ERROR: Spider error processing <GET https://docs.chef.io/nodes.html> (referer: https://docs.chef.io/)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/crawl.py", line 67, in _parse_response
    cb_res = callback(response, **cb_kwargs) or ()
  File "/Users/dustin/Documents/code/documentation-scrapper/src/documentation_spider.py", line 51, in callback
    records = self.strategy.get_records_from_response(response)
  File "/Users/dustin/Documents/code/documentation-scrapper/src/strategies/default_strategy.py", line 26, in get_records_from_response
    records = self.get_records_from_dom()
  File "/Users/dustin/Documents/code/documentation-scrapper/src/strategies/default_strategy.py", line 51, in get_records_from_dom
    nodes_per_level[level] = self.cssselect(level_selector)
  File "/Users/dustin/Documents/code/documentation-scrapper/src/strategies/abstract_strategy.py", line 84, in cssselect
    return CSSSelector(selector)(self.dom)
  File "/usr/local/lib/python2.7/site-packages/lxml/cssselect.py", line 94, in __init__
    path = translator.css_to_xpath(css)
  File "/usr/local/lib/python2.7/site-packages/cssselect/xpath.py", line 192, in css_to_xpath
    for selector in parse(css))
  File "/usr/local/lib/python2.7/site-packages/cssselect/parser.py", line 341, in parse
    match = _el_re.match(css)
TypeError: expected string or buffer

redox commented 8 years ago

Could this be a master vs. develop branch schema mismatch? I would merge develop into master btw \o/

pixelastic commented 8 years ago

I'll move everything to master and remove the develop branch to avoid confusion.

pixelastic commented 8 years ago

@dustincoates Could you try again? On master for real this time (I just updated it) :)

dustincoates commented 8 years ago

@pixelastic @redox I think that was the problem. I'm going to try again later today. :)
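
For anyone hitting the same traceback: the last frame shows `_el_re.match(css)` receiving a value that is not a string, which would be consistent with a config written for one branch's schema being read by the other. The sketch below reproduces that class of error using only the standard library and shows a guard that fails earlier with a clearer message. Everything in it is an assumption for illustration: `ensure_selector_is_string` and `reproduce_type_error` are hypothetical helpers (not part of docsearch-scraper), the regex is a stand-in for the one inside cssselect, and the dict-shaped selector is only a guess at what a mismatched config might contain. It targets Python 3, where the message reads slightly differently from the Python 2.7 one above.

```python
import re

# Stand-in for cssselect's internal element regex; the real pattern differs.
_el_re = re.compile(r"^\s*(\w+)\s*$")


def ensure_selector_is_string(level, selector):
    """Hypothetical guard: reject non-string selectors with a readable error."""
    if not isinstance(selector, str):
        raise ValueError(
            "selector for level %r must be a string, got %s"
            % (level, type(selector).__name__)
        )
    return selector


def reproduce_type_error(selector):
    """Return the TypeError message raised by matching a non-string, or None."""
    try:
        _el_re.match(selector)
    except TypeError as exc:
        return str(exc)
    return None


# Assumed example: a dict-shaped selector where a plain string is expected.
bad_selector = {"selector": "h1"}
error_message = reproduce_type_error(bad_selector)
print(error_message)
```

Calling `ensure_selector_is_string("lvl0", bad_selector)` raises a `ValueError` that names the offending level, which may be easier to trace back to the config than a `TypeError` raised deep inside the CSS parser.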