ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

undefined response object in UrlResolver #88

Closed by petermr 8 years ago

petermr commented 8 years ago
info: Saving logs to ./test2010-03-01/quickscrape1.2016-09-11-19-16.log
info: quickscrape 0.4.7 launched with...
info: - URLs from file: undefined
info: - Scraperdir: /home/pm286/journal-scrapers/scrapers
info: - Rate limit: 10 per minute
info: - Log level: info
info: urls to scrape: 13110
info: processing URL: http://dx.doi.org/10.1088/0965-0393/18/2/025015
error: Error: ETIMEDOUT so moving on to next url in list
info: processing URL: http://dx.doi.org/10.1088/0965-0393/18/2/025016
error: Error: ETIMEDOUT so moving on to next url in list
info: processing URL: http://dx.doi.org/10.4304/jsw.5.3.304-311
error: page did not return a 200 instead returned 500 so moving on to next url in list
info: processing URL: http://dx.doi.org/10.1209/0295-5075/89/69002
error: Error: ETIMEDOUT so moving on to next url in list
info: processing URL: http://dx.doi.org/10.5373/jaram.223.092109
/home/pm286/.nvm/versions/node/v6.3.1/lib/node_modules/quickscrape/node_modules/thresher/lib/url.js:60
    callback(err, response.request.href);
                          ^

TypeError: Cannot read property 'request' of undefined
    at Request._callback (/home/pm286/.nvm/versions/node/v6.3.1/lib/node_modules/quickscrape/node_modules/thresher/lib/url.js:60:27)
    at self.callback (/home/pm286/.nvm/versions/node/v6.3.1/lib/node_modules/quickscrape/node_modules/request/request.js:368:22)
    at emitOne (events.js:96:13)
    at Request.emit (events.js:188:7)
    at Request.onRequestError (/home/pm286/.nvm/versions/node/v6.3.1/lib/node_modules/quickscrape/node_modules/request/request.js:1025:8)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
    at Socket.socketErrorListener (_http_client.js:308:9)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
finished

tarrow commented 8 years ago

This is actually a bug in thresher. See contentmine/thresher#23
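
For anyone landing here with the same trace: the stack shows the callback being invoked from `Request.onRequestError`, so `err` is set and `response` is undefined when `response.request.href` is read at `url.js:60`. Below is a minimal sketch of the usual guard for an error-first callback. It is not the actual thresher code or the fix from contentmine/thresher#23; the helper name `makeResolveHandler`, the fallback error message, and the example URL are assumptions made for illustration, and the request is simulated so the snippet runs without any dependencies.

```javascript
// Hypothetical guard for a request-style callback with signature (err, response).
// When the request itself fails (e.g. ETIMEDOUT), `response` is undefined,
// so `err` has to be checked before `response` is dereferenced.
function makeResolveHandler(callback) {
  return function onResponse(err, response) {
    if (err || !response || !response.request) {
      // Pass the original error through, or create one if the
      // response is missing without an accompanying error.
      return callback(err || new Error('no response received'));
    }
    callback(null, response.request.href);
  };
}

// Collect what the handler reports so both paths can be inspected.
const results = [];
const handler = makeResolveHandler((err, href) => results.push({ err, href }));

// Simulated failure: error set, response undefined (the case in the trace above).
const timeout = new Error('ETIMEDOUT');
handler(timeout, undefined);

// Simulated success: the resolved URL is read from response.request.href.
handler(null, { request: { href: 'http://example.org/article' } });
```

With a guard like this, the caller receives the error and can log it and move on to the next URL, as the earlier ETIMEDOUT lines in the log do, instead of the process crashing with a TypeError.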