ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License
259 stars, 43 forks

Running test on OpenSuse 13.1 64bit #25

Closed by Mec-iS 10 years ago

Mec-iS commented 10 years ago

I get this error while running a test on a single URL:

quickscrape \
  --url https://peerj.com/articles/384 \
  --scraper journal-scrapers/scrapers/peerj.json \
  --output peerj-384 -l debug

info:    quickscrape launched with...
info:    - URL: https://peerj.com/articles/384
info:    - Scraper: journal-scrapers/scrapers/peerj.json   
info:    - Rate limit: 3 per minute
info:    - Log level: debug
info:    urls to scrape: 1

/usr/lib/node_modules/quickscrape/bin/quickscrape.js:97 
var scrapers = new ScraperBox(program.scraperdir);
             ^
TypeError: undefined is not a function
    at Object.<anonymous> (/usr/lib/node_modules/quickscrape/bin/quickscrape.js:97:16)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:901:3

The Node version is v0.10.5. The dependencies commander, thresher, which and winston are installed in node_modules inside the global install (/usr/lib/node_modules/quickscrape/). I installed everything correctly via npm. Am I missing some system library?


INSTALLATION LOGS

npm http GET https://registry.npmjs.org/quickscrape
npm http 304 https://registry.npmjs.org/quickscrape
npm http GET https://registry.npmjs.org/commander
npm http GET https://registry.npmjs.org/which
npm http GET https://registry.npmjs.org/winston
npm http GET https://registry.npmjs.org/thresher
npm http 304 https://registry.npmjs.org/which
npm http 304 https://registry.npmjs.org/commander
npm http 304 https://registry.npmjs.org/thresher
npm http 304 https://registry.npmjs.org/winston
npm http GET https://registry.npmjs.org/casperjs
npm http GET https://registry.npmjs.org/download
npm http GET https://registry.npmjs.org/jsdom
npm http GET https://registry.npmjs.org/request
npm http GET https://registry.npmjs.org/phantomjs
npm http GET https://registry.npmjs.org/shelljs
npm http GET https://registry.npmjs.org/spooky
npm http GET https://registry.npmjs.org/xpath/0.0.6
npm http GET https://registry.npmjs.org/jsdom-little
npm http GET https://registry.npmjs.org/async
npm http GET https://registry.npmjs.org/colors
npm http GET https://registry.npmjs.org/cycle
npm http GET https://registry.npmjs.org/eyes
npm http GET https://registry.npmjs.org/request
npm http GET https://registry.npmjs.org/pkginfo
npm http GET https://registry.npmjs.org/stack-trace
npm http 304 https://registry.npmjs.org/request
npm http 304 https://registry.npmjs.org/phantomjs
npm http 304 https://registry.npmjs.org/jsdom
npm http 304 https://registry.npmjs.org/shelljs
npm http 304 https://registry.npmjs.org/jsdom-little
npm http 304 https://registry.npmjs.org/async
npm http 304 https://registry.npmjs.org/download
npm http 304 https://registry.npmjs.org/colors
npm http 304 https://registry.npmjs.org/spooky
npm http 304 https://registry.npmjs.org/eyes
npm http 304 https://registry.npmjs.org/xpath/0.0.6
npm http 304 https://registry.npmjs.org/request
npm http 304 https://registry.npmjs.org/casperjs
npm http 304 https://registry.npmjs.org/stack-trace
npm http GET https://registry.npmjs.org/decompress
npm http GET https://registry.npmjs.org/each-async
npm http GET https://registry.npmjs.org/get-stdin
npm http GET https://registry.npmjs.org/get-urls
npm http 304 https://registry.npmjs.org/pkginfo
npm http GET https://registry.npmjs.org/mkdirp
npm http GET https://registry.npmjs.org/nopt
npm http GET https://registry.npmjs.org/through2
npm http GET https://registry.npmjs.org/adm-zip/0.2.1
npm http GET https://registry.npmjs.org/kew
npm http GET https://registry.npmjs.org/ncp/0.4.2
npm http GET https://registry.npmjs.org/npmconf/0.0.24
npm http GET https://registry.npmjs.org/mkdirp/0.3.5
npm http GET https://registry.npmjs.org/progress
npm http GET https://registry.npmjs.org/request/2.36.0
npm http GET https://registry.npmjs.org/request-progress
npm http GET https://registry.npmjs.org/rimraf
npm http GET https://registry.npmjs.org/json-stringify-safe
npm http GET https://registry.npmjs.org/mime-types
npm http GET https://registry.npmjs.org/qs
npm http GET https://registry.npmjs.org/forever-agent
npm http GET https://registry.npmjs.org/node-uuid
npm http GET https://registry.npmjs.org/tough-cookie
npm http GET https://registry.npmjs.org/form-data
npm http GET https://registry.npmjs.org/tunnel-agent
npm http GET https://registry.npmjs.org/http-signature
npm http GET https://registry.npmjs.org/oauth-sign
npm http GET https://registry.npmjs.org/hawk/1.1.1
npm http GET https://registry.npmjs.org/aws-sign2
npm http GET https://registry.npmjs.org/underscore
npm http GET https://registry.npmjs.org/tiny-jsonrpc
npm http GET https://registry.npmjs.org/carrier
npm http GET https://registry.npmjs.org/duplexer
npm http GET https://registry.npmjs.org/readable-stream
npm http 304 https://registry.npmjs.org/cycle
npm http GET https://registry.npmjs.org/htmlparser2
npm http GET https://registry.npmjs.org/nwmatcher
npm http GET https://registry.npmjs.org/cssom
npm http GET https://registry.npmjs.org/cssstyle
npm http GET https://registry.npmjs.org/xmlhttprequest
npm http GET https://registry.npmjs.org/contextify
npm http 304 https://registry.npmjs.org/mkdirp
npm http GET https://registry.npmjs.org/form-data
npm http GET https://registry.npmjs.org/mime
npm http GET https://registry.npmjs.org/hawk
npm http GET https://registry.npmjs.org/cookie-jar
npm http GET https://registry.npmjs.org/oauth-sign
npm http GET https://registry.npmjs.org/aws-sign
npm http GET https://registry.npmjs.org/forever-agent
npm http GET https://registry.npmjs.org/tunnel-agent
npm http GET https://registry.npmjs.org/json-stringify-safe
npm http GET https://registry.npmjs.org/qs
npm http 304 https://registry.npmjs.org/nopt
npm http 304 https://registry.npmjs.org/get-urls
npm http 304 https://registry.npmjs.org/through2
npm http 304 https://registry.npmjs.org/adm-zip/0.2.1
npm http 304 https://registry.npmjs.org/kew
npm http 304 https://registry.npmjs.org/get-stdin
npm http 304 https://registry.npmjs.org/decompress
npm http 304 https://registry.npmjs.org/ncp/0.4.2
npm http 304 https://registry.npmjs.org/npmconf/0.0.24
npm http 304 https://registry.npmjs.org/mkdirp/0.3.5
npm http 304 https://registry.npmjs.org/progress
npm http 304 https://registry.npmjs.org/request/2.36.0
npm http 304 https://registry.npmjs.org/json-stringify-safe
npm http 304 https://registry.npmjs.org/request-progress
npm http 304 https://registry.npmjs.org/rimraf
npm http GET https://registry.npmjs.org/throttleit
npm http GET https://registry.npmjs.org/config-chain
npm http GET https://registry.npmjs.org/inherits
npm http GET https://registry.npmjs.org/osenv/0.0.3
npm http GET https://registry.npmjs.org/once
npm http GET https://registry.npmjs.org/semver
npm http GET https://registry.npmjs.org/ini
npm http 304 https://registry.npmjs.org/forever-agent
npm http 304 https://registry.npmjs.org/qs
npm http 304 https://registry.npmjs.org/mime-types
npm http 304 https://registry.npmjs.org/node-uuid
npm http GET https://registry.npmjs.org/mime
npm http GET https://registry.npmjs.org/hawk
npm http 304 https://registry.npmjs.org/http-signature
npm http 304 https://registry.npmjs.org/tough-cookie
npm http 304 https://registry.npmjs.org/form-data
npm http 304 https://registry.npmjs.org/tunnel-agent
npm http 304 https://registry.npmjs.org/each-async
npm http GET https://registry.npmjs.org/adm-zip
npm http GET https://registry.npmjs.org/extname
npm http GET https://registry.npmjs.org/map-key
npm http GET https://registry.npmjs.org/stream-combiner
npm http GET https://registry.npmjs.org/tar
npm http GET https://registry.npmjs.org/tempfile
npm http GET https://registry.npmjs.org/readable-stream
npm http GET https://registry.npmjs.org/xtend
npm http GET https://registry.npmjs.org/abbrev
npm http 304 https://registry.npmjs.org/oauth-sign
npm http 200 https://registry.npmjs.org/hawk/1.1.1
npm http 304 https://registry.npmjs.org/aws-sign2
npm http GET https://registry.npmjs.org/hawk/-/hawk-1.1.1.tgz
npm http 200 https://registry.npmjs.org/underscore
npm http 304 https://registry.npmjs.org/duplexer
npm http 304 https://registry.npmjs.org/carrier
npm http 304 https://registry.npmjs.org/htmlparser2
npm http 304 https://registry.npmjs.org/tiny-jsonrpc
npm http 304 https://registry.npmjs.org/cssom
npm http 304 https://registry.npmjs.org/cssstyle
npm http 304 https://registry.npmjs.org/readable-stream
npm http 200 https://registry.npmjs.org/hawk/-/hawk-1.1.1.tgz
npm http 304 https://registry.npmjs.org/form-data
npm http 304 https://registry.npmjs.org/contextify
npm http 304 https://registry.npmjs.org/xmlhttprequest
npm http GET https://registry.npmjs.org/inherits
npm http GET https://registry.npmjs.org/string_decoder
npm http GET https://registry.npmjs.org/core-util-is
npm http GET https://registry.npmjs.org/isarray/0.0.1
npm http 304 https://registry.npmjs.org/mime
npm http 304 https://registry.npmjs.org/nwmatcher
npm http GET https://registry.npmjs.org/bindings
npm http GET https://registry.npmjs.org/nan
npm http 304 https://registry.npmjs.org/cookie-jar
npm http 304 https://registry.npmjs.org/oauth-sign
npm http 304 https://registry.npmjs.org/forever-agent
npm http 304 https://registry.npmjs.org/aws-sign
npm http GET https://registry.npmjs.org/domhandler
npm http GET https://registry.npmjs.org/domutils
npm http GET https://registry.npmjs.org/domelementtype
npm http 304 https://registry.npmjs.org/tunnel-agent
npm http GET https://registry.npmjs.org/entities
npm http 304 https://registry.npmjs.org/json-stringify-safe
npm http 304 https://registry.npmjs.org/qs
npm http 304 https://registry.npmjs.org/throttleit
npm http 200 https://registry.npmjs.org/hawk
npm WARN engine hawk@0.10.2: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.5","npm":"1.4.6"})
npm http 304 https://registry.npmjs.org/config-chain
npm http 304 https://registry.npmjs.org/once
npm http 304 https://registry.npmjs.org/osenv/0.0.3
npm http 304 https://registry.npmjs.org/inherits
npm http GET https://registry.npmjs.org/combined-stream
npm http 304 https://registry.npmjs.org/semver
npm http GET https://registry.npmjs.org/hoek
npm http GET https://registry.npmjs.org/boom
npm http GET https://registry.npmjs.org/cryptiles
npm http GET https://registry.npmjs.org/sntp
npm http 304 https://registry.npmjs.org/mime
npm http 304 https://registry.npmjs.org/ini
npm http 304 https://registry.npmjs.org/adm-zip
npm http GET https://registry.npmjs.org/proto-list
npm http 304 https://registry.npmjs.org/stream-combiner
npm http 304 https://registry.npmjs.org/tar
npm http 304 https://registry.npmjs.org/extname
npm http GET https://registry.npmjs.org/asn1/0.1.11
npm http GET https://registry.npmjs.org/assert-plus/0.1.2
npm http GET https://registry.npmjs.org/ctype/0.5.2
npm http 200 https://registry.npmjs.org/hawk
npm http 304 https://registry.npmjs.org/readable-stream
npm http 304 https://registry.npmjs.org/xtend
npm http 304 https://registry.npmjs.org/tempfile
npm http 304 https://registry.npmjs.org/abbrev
npm http 304 https://registry.npmjs.org/inherits
npm http 200 https://registry.npmjs.org/string_decoder
npm http GET https://registry.npmjs.org/object-keys
npm http GET https://registry.npmjs.org/punycode
npm http 304 https://registry.npmjs.org/core-util-is
npm http GET https://registry.npmjs.org/hoek
npm http GET https://registry.npmjs.org/boom
npm http GET https://registry.npmjs.org/cryptiles
npm http 304 https://registry.npmjs.org/isarray/0.0.1
npm http GET https://registry.npmjs.org/sntp
npm http 304 https://registry.npmjs.org/nan
npm http 304 https://registry.npmjs.org/bindings

contextify@0.1.8 install /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/jsdom/node_modules/contextify
node-gyp rebuild

gyp WARN EACCES user "root" does not have permission to access the dev dir "/root/.node-gyp/0.10.5"
gyp WARN EACCES attempting to reinstall using temporary dev dir "/usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/jsdom/node_modules/contextify/.node-gyp"
gyp http GET http://nodejs.org/dist/v0.10.5/node-v0.10.5.tar.gz
gyp http 200 http://nodejs.org/dist/v0.10.5/node-v0.10.5.tar.gz
gyp http GET http://nodejs.org/dist/v0.10.5/SHASUMS.txt
gyp http GET http://nodejs.org/dist/v0.10.5/SHASUMS.txt
gyp http 200 http://nodejs.org/dist/v0.10.5/SHASUMS.txt
gyp http 200 http://nodejs.org/dist/v0.10.5/SHASUMS.txt
make: Entering directory `/usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/jsdom/node_modules/contextify/build'
  CXX(target) Release/obj.target/contextify/src/contextify.o
  SOLINK_MODULE(target) Release/obj.target/contextify.node
  SOLINK_MODULE(target) Release/obj.target/contextify.node: Finished
  COPY Release/contextify.node
make: Leaving directory `/usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/jsdom/node_modules/contextify/build'
npm http 304 https://registry.npmjs.org/entities
npm http 304 https://registry.npmjs.org/domhandler
npm http 304 https://registry.npmjs.org/domutils
npm http 304 https://registry.npmjs.org/combined-stream
npm http 304 https://registry.npmjs.org/map-key
npm http 304 https://registry.npmjs.org/domelementtype
npm http GET https://registry.npmjs.org/uuid
npm http GET https://registry.npmjs.org/lodash
npm http GET https://registry.npmjs.org/underscore.string
npm http GET https://registry.npmjs.org/underscore.string
npm http GET https://registry.npmjs.org/ext-list
npm http GET https://registry.npmjs.org/delayed-stream/0.0.5
npm http 200 https://registry.npmjs.org/boom
npm http 304 https://registry.npmjs.org/proto-list
npm http 200 https://registry.npmjs.org/cryptiles
npm http 200 https://registry.npmjs.org/sntp
npm http 304 https://registry.npmjs.org/asn1/0.1.11
npm http 304 https://registry.npmjs.org/assert-plus/0.1.2
npm http 200 https://registry.npmjs.org/hoek
npm http 304 https://registry.npmjs.org/ctype/0.5.2
npm http 304 https://registry.npmjs.org/object-keys
npm http 200 https://registry.npmjs.org/punycode
npm WARN engine boom@0.3.8: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.5","npm":"1.4.6"})
npm WARN engine cryptiles@0.1.3: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.5","npm":"1.4.6"})
npm WARN engine sntp@0.1.4: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.5","npm":"1.4.6"})
npm WARN engine hoek@0.7.6: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.5","npm":"1.4.6"})
npm http GET https://registry.npmjs.org/block-stream
npm http GET https://registry.npmjs.org/fstream
npm http 304 https://registry.npmjs.org/uuid
npm http 200 https://registry.npmjs.org/hoek
npm http 200 https://registry.npmjs.org/boom
npm http 200 https://registry.npmjs.org/cryptiles
npm http 200 https://registry.npmjs.org/sntp
npm http 304 https://registry.npmjs.org/lodash
npm http 304 https://registry.npmjs.org/underscore.string
npm http 304 https://registry.npmjs.org/underscore.string
npm http 304 https://registry.npmjs.org/delayed-stream/0.0.5
npm http 304 https://registry.npmjs.org/block-stream
npm http 200 https://registry.npmjs.org/fstream
npm http GET https://registry.npmjs.org/graceful-fs
npm http 304 https://registry.npmjs.org/ext-list
npm http 304 https://registry.npmjs.org/graceful-fs
npm http GET https://registry.npmjs.org/minimist/0.0.8
npm http 304 https://registry.npmjs.org/minimist/0.0.8

phantomjs@1.9.7-15 install /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/phantomjs
node install.js

Downloading https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-linux-x86_64.tar.bz2
Saving to /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/phantomjs/phantomjs/phantomjs-1.9.7-linux-x86_64.tar.bz2
Receiving...
  [======================================-] 98% 0.0s
Received 12852K total.
Extracting tar contents (via spawned process)
Copying extracted folder /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/phantomjs/phantomjs/phantomjs-1.9.7-linux-x86_64.tar.bz2-extract-1406977951406/phantomjs-1.9.7-linux-x86_64 -> /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/phantomjs/lib/phantom
Writing location.js file
Done. Phantomjs binary available at /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/phantomjs/lib/phantom/bin/phantomjs
/usr/bin/quickscrape -> /usr/lib/node_modules/quickscrape/bin/quickscrape.js
quickscrape@0.2.7 /usr/lib/node_modules/quickscrape
├── which@1.0.5
├── commander@2.2.0
├── winston@0.7.3 (cycle@1.0.3, stack-trace@0.0.9, eyes@0.1.8, colors@0.6.2, async@0.2.10, pkginfo@0.3.0, request@2.16.6)
└── thresher@0.0.9 (xpath@0.0.6, shelljs@0.3.0, casperjs@1.1.0-beta3, spooky@0.2.4, jsdom-little@0.10.5, request@2.37.0, download@0.1.18, jsdom@0.11.1, phantomjs@1.9.7-15)

Mec-iS commented 10 years ago

The correct name of the property on the thresher object at line 10 is ScraperBox:

var ScraperBox = thresher.ScraperBox;
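This explains why the traceback points at line 97 rather than line 10: in JavaScript, reading a property a module does not export yields undefined without any error, and the TypeError only appears when `new` is applied to that value. The self-contained sketch below reproduces this. The `thresher` object is a stand-in for the real module, and `scraperBox`, `construct`, and `requireExport` are hypothetical names for illustration only; the thread does not say what line 10 originally read.

```javascript
'use strict';

// Stand-in for require('thresher'): a hypothetical module object exporting a
// constructor under the name ScraperBox, the name used in the fix.
var thresher = {
  ScraperBox: function ScraperBox(dir) {
    this.dir = dir;
  }
};

// Reading a property that is not exported silently gives undefined.
var Misnamed = thresher.scraperBox;   // hypothetical wrong name

// Try to construct; return the error instead of crashing.
function construct(Ctor, arg) {
  try {
    return new Ctor(arg);
  } catch (err) {
    // `new undefined(...)` throws a TypeError. Node v0.10 words it
    // "undefined is not a function"; newer versions say "... is not a constructor".
    return err;
  }
}

var failure = construct(Misnamed, 'scrapers/');

// Guard: fail where the export is read, with a descriptive message.
function requireExport(mod, name) {
  if (typeof mod[name] !== 'function') {
    throw new TypeError('expected module to export a constructor named "' + name +
      '"; available exports: ' + Object.keys(mod).join(', '));
  }
  return mod[name];
}

var ScraperBox = requireExport(thresher, 'ScraperBox');
var scrapers = new ScraperBox('scrapers/');
```

With a guard of this kind at the import, a misnamed export would be reported where it is read, together with the list of names the module really exports, instead of surfacing later at the `new` call.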

blahah commented 10 years ago

This error is fixed in your pull request #26.