ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

Running quickscrape on Ubuntu 14.04 results in "unknown error" #20

Closed: emanuil-tolev closed this issue 10 years ago

emanuil-tolev commented 10 years ago
vagrant@vagrant-ubuntu-trusty-32:~$ quickscrape --url https://peerj.com/articles/384 --scraper /vagrant_data/step1_quickscrape-configure/peerj.json --output peerj-384
info:    quickscrape launched with...
info:    - URL: https://peerj.com/articles/384
info:    - Scraper: /vagrant_data/step1_quickscrape-configure/peerj.json
info:    - Rate limit: 3 per minute
info:    - Log level: info
info:    urls to scrape: 1
info:    processing URL: https://peerj.com/articles/384

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: Child terminated with non-zero exit code 127
    at Spooky.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/spooky/lib/spooky.js:180:17)
    at ChildProcess.emit (events.js:98:17)
    at Process.ChildProcess._handle.onexit (child_process.js:809:12)

In terms of the rest of the setup - the scraper file is readable, the current directory is writeable (it's ~) and there should be plenty of space:

vagrant@vagrant-ubuntu-trusty-32:~$ ls -l /vagrant_data/step1_quickscrape-configure/peerj.json
-rwxrwx--- 1 vagrant vagrant 2218 Jul 10 08:49 /vagrant_data/step1_quickscrape-configure/peerj.json
vagrant@vagrant-ubuntu-trusty-32:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        40G  1.7G   37G   5% /
vagrant@vagrant-ubuntu-trusty-32:~$ ls -al
total 40
drwxr-xr-x 5 vagrant vagrant 4096 Jul 10 08:53 .
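The manual checks above (scraper file readable, output directory writable) can be bundled into a small pre-flight step. `preflight` is a hypothetical helper; the output directory defaults to the current directory, matching the `~` used in this run:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper mirroring the manual checks above:
# the scraper definition must be readable and the output directory writable.
preflight() {
  local scraper=$1 outdir=${2:-.} status=0
  if [ -r "$scraper" ]; then
    echo "readable: $scraper"
  else
    echo "NOT readable: $scraper"
    status=1
  fi
  if [ -d "$outdir" ] && [ -w "$outdir" ]; then
    echo "writable: $outdir"
  else
    echo "NOT writable: $outdir"
    status=1
  fi
  # Non-zero status means at least one check failed.
  return "$status"
}
```

Usage, with the paths from this report: `preflight /vagrant_data/step1_quickscrape-configure/peerj.json ~`. Free space is still best checked with `df -h`, as above.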

blahah commented 10 years ago

Can you re-run the command with --loglevel debug and paste the output, please? Can you also tell me the result of quickscrape -version?

blahah commented 10 years ago

I'm pretty sure what's happening here is that either casperjs or phantomjs isn't being installed by npm as it should be. Did you get any funky messages during install?

blahah commented 10 years ago

Aha, I just remembered: when I had this on vanilla Ubuntu, you need to:

sudo apt-get install libfontconfig1
sudo -H npm install -g quickscrape

emanuil-tolev commented 10 years ago

did you get any funky messages during install?

Yeah, actually I did: it couldn't build in the normal build directories. I'll try the commands out first and let you know, though.

emanuil-tolev commented 10 years ago

OK, I'm getting those again, and a bit more clearly now, I think. This is what I get when running sudo -H npm install -g quickscrape:

vagrant@vagrant-ubuntu-trusty-32:~$ sudo -H npm install -g quickscrape
npm WARN engine hawk@0.10.2: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.29","npm":"1.4.14"})
npm WARN engine cryptiles@0.1.3: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.29","npm":"1.4.14"})
npm WARN engine hoek@0.7.6: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.29","npm":"1.4.14"})
npm WARN engine sntp@0.1.4: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.29","npm":"1.4.14"})
npm WARN engine boom@0.3.8: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.29","npm":"1.4.14"})

> contextify@0.1.8 install /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/jsdom/node_modules/contextify
> node-gyp rebuild

gyp WARN EACCES user "root" does not have permission to access the dev dir "/root/.node-gyp/0.10.29"
gyp WARN EACCES attempting to reinstall using temporary dev dir "/usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/jsdom/node_modules/contextify/.node-gyp"
gyp WARN install got an error, rolling back install
gyp WARN install got an error, rolling back install
gyp ERR! configure error 
gyp ERR! stack Error: node-v0.10.29.tar.gz local sha1 da1bee4e7e9ee1d7d95d71f6ca584f4d813f19f4 not match remote 0d5dc62090404f7c903f29779295758935529242
gyp ERR! stack     at deref (/usr/lib/node_modules/npm/node_modules/node-gyp/lib/install.js:296:20)
gyp ERR! stack     at IncomingMessage.<anonymous> (/usr/lib/node_modules/npm/node_modules/node-gyp/lib/install.js:336:13)
gyp ERR! stack     at IncomingMessage.emit (events.js:117:20)
gyp ERR! stack     at _stream_readable.js:929:16
gyp ERR! stack     at process._tickCallback (node.js:419:13)
gyp ERR! System Linux 3.13.0-30-generic
gyp ERR! command "node" "/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/jsdom/node_modules/contextify
gyp ERR! node -v v0.10.29
gyp ERR! node-gyp -v v0.13.1
gyp ERR! not ok 

> phantomjs@1.9.7-14 install /usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/phantomjs
> node install.js

-
module.js:340
    throw err;
          ^
Error: Cannot find module './request'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/thresher/node_modules/phantomjs/node_modules/request/index.js:17:15)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:364:17)
npm ERR! contextify@0.1.8 install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the contextify@0.1.8 install script.
npm ERR! This is most likely a problem with the contextify package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node-gyp rebuild
npm ERR! You can get their info via:
npm ERR!     npm owner ls contextify
npm ERR! There is likely additional logging output above.

npm ERR! System Linux 3.13.0-30-generic
npm ERR! command "/usr/bin/node" "/usr/bin/npm" "install" "-g" "quickscrape"
npm ERR! cwd /home/vagrant
npm ERR! node -v v0.10.29
npm ERR! npm -v 1.4.14
npm ERR! code ELIFECYCLE
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR!     /home/vagrant/npm-debug.log
npm ERR! not ok code 0
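The gyp configure error above says that the node-v0.10.29.tar.gz node-gyp downloaded hashes to a different SHA-1 than the one it expected, which typically means a corrupted, truncated, or otherwise altered download (an assumption for this particular case). The comparison itself can be sketched with coreutils' `sha1sum`; `verify_sha1` is a hypothetical helper that illustrates what the check means, not how node-gyp implements it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash a file with SHA-1 and compare the digest to an
# expected value, the same kind of check node-gyp reports failing above.
verify_sha1() {
  local file=$1 expected=$2 actual
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  # sha1sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha1sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "ok: $file"
  else
    echo "mismatch: $file local sha1 $actual, expected $expected"
    return 1
  fi
}
```

For example, `verify_sha1 node-v0.10.29.tar.gz 0d5dc62090404f7c903f29779295758935529242` checks a local copy of the tarball against the remote digest quoted in the log.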

emanuil-tolev commented 10 years ago

Can you re-run the command with --loglevel debug and paste the output, please? Can you also tell me the result of quickscrape -version?

The attempt to reinstall it seems to have deleted it (and the install then failed, as per my previous comment), so this is pending... :)

emanuil-tolev commented 10 years ago

Got it, it works now. This is required on 14.04: sudo apt-get install node-gyp
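Pulling together the fixes from this thread, the Ubuntu 14.04 install sequence can be sketched as a script. `run` and `install_quickscrape` are hypothetical helper names, and installing both system packages before the npm step is an assumed ordering; with `DRY_RUN=1` the commands are only printed:

```shell
#!/usr/bin/env bash
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the commands suggested in this thread.
install_quickscrape() {
  # libfontconfig1: needed on vanilla Ubuntu (per blahah)
  # node-gyp: required on 14.04 (per the final comment)
  run sudo apt-get install libfontconfig1 node-gyp || return
  # -H sets HOME to the target user's (root's) home directory.
  run sudo -H npm install -g quickscrape
}
```

Running `DRY_RUN=1 install_quickscrape` first shows exactly what will be executed; dropping `DRY_RUN` performs the install.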