ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

No casperjs installation found #5

Closed rossmounce closed 10 years ago

rossmounce commented 10 years ago

Not sure if the installation process is actually fully working yet...

This is on Bath machine, Lubuntu 12.04 LTS 64-bit

ran curl | sudo bash installation from readme

then cloned the journal-scrapers, but peerj single article example did not work:

-SNIP-

node-gyp rebuild

gyp WARN EACCES user "root" does not have permission to access the dev dir "/home/ross/.node-gyp/0.10.28"
gyp WARN EACCES attempting to reinstall using temporary dev dir "/usr/lib/node_modules/quickscrape/node_modules/jsdom/node_modules/contextify/.node-gyp"
gyp http GET http://nodejs.org/dist/v0.10.28/node-v0.10.28.tar.gz
gyp http 200 http://nodejs.org/dist/v0.10.28/node-v0.10.28.tar.gz
gyp http GET http://nodejs.org/dist/v0.10.28/SHASUMS.txt
gyp http GET http://nodejs.org/dist/v0.10.28/SHASUMS.txt
gyp http 200 http://nodejs.org/dist/v0.10.28/SHASUMS.txt
gyp http 200 http://nodejs.org/dist/v0.10.28/SHASUMS.txt
make: Entering directory `/usr/lib/node_modules/quickscrape/node_modules/jsdom/node_modules/contextify/build'
  CXX(target) Release/obj.target/contextify/src/contextify.o
  SOLINK_MODULE(target) Release/obj.target/contextify.node
  SOLINK_MODULE(target) Release/obj.target/contextify.node: Finished
  COPY Release/contextify.node
make: Leaving directory `/usr/lib/node_modules/quickscrape/node_modules/jsdom/node_modules/contextify/build'
npm ERR! phantomjs@1.9.7-8 install: `node install.js`
npm ERR! Exit status 8
npm ERR!
npm ERR! Failed at the phantomjs@1.9.7-8 install script.
npm ERR! This is most likely a problem with the phantomjs package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node install.js
npm ERR! You can get their info via:
npm ERR!     npm owner ls phantomjs
npm ERR! There is likely additional logging output above.
npm ERR! System Linux 3.2.0-63-generic
npm ERR! command "/usr/bin/node" "/usr/bin/npm" "install" "--global" "--unsafe-perms" "casperjs" "quickscrape"
npm ERR! cwd /home/ross/Documents/corpuses/quickscrape
npm ERR! node -v v0.10.28
npm ERR! npm -v 1.4.9
npm ERR! code ELIFECYCLE
/usr/bin/quickscrape -> /usr/lib/node_modules/quickscrape/bin/quickscrape.js
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR!     /home/ross/Documents/corpuses/quickscrape/npm-debug.log
npm ERR! not ok code 0

Usage: quickscrape [options]

Options:

-h, --help              output usage information
-V, --version           output the version number
-u, --url <url>         URL to scrape
-r, --urllist <path>    path to file with list of URLs to scrape (one per line)
-s, --scraper <path>    path to scraper definition (in JSON format)
-o, --output <path>     where to output results (directory will be created if it doesn't exist
-r, --ratelimit <int>   maximum number of scrapes per minute (default 3)
-l, --loglevel <level>  amount of information to log (silent, verbose, info*, data, warn, error, or debug)

quickscrape successfully installed!
ross@ross-x3:~/Documents/corpuses/quickscrape$ git clone https://github.com/ContentMine/journal-scrapers.git
Cloning into 'journal-scrapers'...
WARNING: gnome-keyring:: couldn't connect to: /tmp/keyring-SndeC9/pkcs11: No such file or directory
remote: Reusing existing pack: 35, done.
remote: Total 35 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (35/35), done.
ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape \
  --url https://peerj.com/articles/384 \
  --scraper journal-scrapers/peerj.json \
  --output peerj-384

/usr/lib/node_modules/quickscrape/bin/quickscrape.js:64
    throw new Error(msg);
    ^
Error: Nocasperjs installation found.See installation instructions at https://github.com/ContentMine/quickscrape
    at fs.readFileSync.encoding (/usr/lib/node_modules/quickscrape/bin/quickscrape.js:64:11)
    at Array.forEach (native)
    at Object.<anonymous> (/usr/lib/node_modules/quickscrape/bin/quickscrape.js:57:27)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:906:3

blahah commented 10 years ago

Hmm, wonder why this keeps failing on lubuntu.

Can you try:

sudo chown -R `whoami` ~/.npm
sudo chown -R `whoami` ~/.node-gyp
sudo npm -H install --global casperjs quickscrape

blahah commented 10 years ago

I think it's the same issue highlighted here. Adding the -H option to npm should fix it. If it works for you I'll modify the install script and test on a fresh lubuntu VM.
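
For context: `sudo -H` asks sudo to set HOME to the target user's home directory (root's, in this case), so npm's cache and build files are not written into your own ~/.npm or ~/.node-gyp as root. The snippet below is only a rough sketch for checking whether a directory already contains files owned by someone else; the function name `foreign_owned` is made up for illustration and is not part of npm or quickscrape.

```shell
#!/bin/sh
# List entries under DIR that are NOT owned by the given user ID.
# Usage: foreign_owned DIR [UID]   (UID defaults to the current user)
# Hypothetical helper for illustration only.
foreign_owned() {
    dir=$1
    uid=${2:-$(id -u)}
    # A missing directory is not an error here: there is nothing to fix.
    [ -d "$dir" ] || return 0
    find "$dir" ! -user "$uid" -print
}
```

Running `foreign_owned ~/.npm` and getting any output would suggest the chown commands above are still needed; no output means ownership is already fine.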

rossmounce commented 10 years ago

So just to be clear, I should run:

sudo -H npm install --global --unsafe-perms casperjs quickscrape

(and I can do that on its own, rather than in / as part of the combined install script?)

rossmounce commented 10 years ago

still the same with sudo -H (perhaps I need to run it with -H inside the install script? Will that make a difference? https://gist.github.com/Blahah/827f183fb30ea5b6d571 )

make: Leaving directory `/usr/lib/node_modules/quickscrape/node_modules/jsdom/node_modules/contextify/build'
npm ERR! phantomjs@1.9.7-8 install: `node install.js`
npm ERR! Exit status 8
npm ERR!
npm ERR! Failed at the phantomjs@1.9.7-8 install script.
npm ERR! This is most likely a problem with the phantomjs package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node install.js
npm ERR! You can get their info via:
npm ERR!     npm owner ls phantomjs
npm ERR! There is likely additional logging output above.
npm ERR! System Linux 3.2.0-63-generic
npm ERR! command "/usr/bin/node" "/usr/bin/npm" "install" "--global" "--unsafe-perms" "casperjs" "quickscrape"
npm ERR! cwd /home/ross/Documents/corpuses/quickscrape
npm ERR! node -v v0.10.28
npm ERR! npm -v 1.4.9
npm ERR! code ELIFECYCLE
/usr/bin/quickscrape -> /usr/lib/node_modules/quickscrape/bin/quickscrape.js
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR!     /home/ross/Documents/corpuses/quickscrape/npm-debug.log
npm ERR! not ok code 0
ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384

/usr/lib/node_modules/quickscrape/bin/quickscrape.js:64
    throw new Error(msg);
    ^
Error: Nocasperjs installation found.See installation instructions at https://github.com/ContentMine/quickscrape
    at fs.readFileSync.encoding (/usr/lib/node_modules/quickscrape/bin/quickscrape.js:64:11)
    at Array.forEach (native)
    at Object.<anonymous> (/usr/lib/node_modules/quickscrape/bin/quickscrape.js:57:27)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:906:3

blahah commented 10 years ago

I meant to run the commands exactly as I wrote above (without --unsafe-perms). And yes, you don't need to re-run the install script, just run the commands on their own.

rossmounce commented 10 years ago

ross@ross-x3:~/Documents/corpuses/quickscrape$ sudo chown -R `whoami` ~/.npm
[sudo] password for ross:
ross@ross-x3:~/Documents/corpuses/quickscrape$ sudo chown -R `whoami` ~/.node-gyp
chown: cannot access `/home/ross/.node-gyp': No such file or directory
ross@ross-x3:~/Documents/corpuses/quickscrape$ sudo chown -R `whoami` ~/.node-gyp
chown: cannot access `/home/ross/.node-gyp': No such file or directory
ross@ross-x3:~/Documents/corpuses/quickscrape$ sudo npm -H install --global casperjs quickscrape
Top hits for "install" "casperjs" "quickscrape"
————————————————————————————————————————————————————————————————————————————
npm help install install:64
npm help scripts install:45
npm help folders install:40
npm help config install:32
npm help package.json install:29
npm help shrinkwrap install:28
npm help faq install:21
npm help npm install:18
npm help developers install:14
npm help removing-npm install:14
npm help index install:13
npm help uninstall install:10
npm apihelp install install:7
npm apihelp uninstall install:6
npm help link install:5
npm apihelp ls install:5
npm help edit install:5
npm help tag install:5
npm help publish install:5
npm help ls install:4
npm apihelp npm install:4
npm help rm install:4
npm apihelp edit install:4
npm help submodule install:4
npm apihelp submodule install:4
npm apihelp link install:3
npm help explore install:3
npm help update install:3
npm help bundle install:3
npm help cache install:2
npm help rebuild install:2
npm help disputes install:2
npm apihelp deprecate install:2
npm help outdated install:2
npm help build install:2
npm apihelp explore install:2
npm apihelp update install:1
npm apihelp publish install:1
npm help bin install:1
npm apihelp rebuild install:1
npm help pack install:1
npm apihelp bin install:1
npm apihelp tag install:1
npm help dedupe install:1
npm help deprecate install:1
npm apihelp test install:1
npm apihelp pack install:1
npm help test install:1
npm help semver install:1
————————————————————————————————————————————————————————————————————————————
(run with -l or --long to see more context)

blahah commented 10 years ago

err sorry, got it the wrong way round. -H should have been after sudo. Try:

sudo -H npm install --global casperjs quickscrape

rossmounce commented 10 years ago

Definitely getting closer to success now... :)

ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384
info: quickscrape launched with...
info: - URL: https://peerj.com/articles/384
info: - Scraper: journal-scrapers/peerj.json
info: - Rate limit: 3 per minute
info: - Log level: info
info: urls to scrape: 1
info: processing URL: https://peerj.com/articles/384

events.js:72
        throw er; // Unhandled 'error' event
        ^
Error: Child terminated with non-zero exit code 1
    at Spooky.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/spooky/lib/spooky.js:180:17)
    at ChildProcess.EventEmitter.emit (events.js:98:17)
    at Process.ChildProcess._handle.onexit (child_process.js:807:12)

rossmounce commented 10 years ago

I should also report it's created two empty nested subfolders now as part of the above terminated process (but they're completely empty)

./peerj-384/https_peerj.com_articles_384/

blahah commented 10 years ago

Great - install worked! Now the program itself is crashing. Can you re-run the quickscrape command with --loglevel debug? i.e.:

quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384 --loglevel debug

rossmounce commented 10 years ago

ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384 --loglevel debug
debug: phantomjs installation found at /usr/bin/phantomjs
debug: casperjs installation found at /usr/bin/casperjs
info: quickscrape launched with...
info: - URL: https://peerj.com/articles/384
info: - Scraper: journal-scrapers/peerj.json
info: - Rate limit: 3 per minute
info: - Log level: debug
info: urls to scrape: 1
info: processing URL: https://peerj.com/articles/384

TypeError: Cannot read property '1' of null
    at Spooky.<anonymous> (/usr/lib/node_modules/quickscrape/lib/scrape.js:87:64)
    at Spooky.EventEmitter.emit (events.js:95:17)
    at Spooky.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/spooky/lib/spooky.js:173:14)
    at FilteredStream.EventEmitter.emit (events.js:95:17)
    at FilteredStream.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:768:14)
    at FilteredStream.EventEmitter.emit (events.js:92:17)
    at emitReadable (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:430:10)
    at emitReadable (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:426:5)
    at readableAddChunk (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:187:9)
    at FilteredStream.Readable.push (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:149:10)
ross@ross-x3:~/Documents/corpuses/quickscrape$

blahah commented 10 years ago

Ok I just pushed an update that should have fixed this. Update the installation:

sudo -H npm install --global quickscrape

Then you should be able to run the original quickscrape command.

rossmounce commented 10 years ago

quickscrape$ sudo -H npm install quickscrape
npm WARN install Refusing to install quickscrape as a dependency of itself

rossmounce commented 10 years ago

ok I've done sudo -H npm uninstall quickscrape

now trying to re-install

rossmounce commented 10 years ago

did you increment the version number? when I uninstall it correctly reports that I've uninstalled 0.1.5 but when I do

quickscrape --version it reports 0.1.4 !

node install.js

PhantomJS detected, but wrong version 1.4.0 @ /usr/bin/phantomjs.
Downloading https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-linux-x86_64.tar.bz2
Saving to /home/ross/Documents/corpuses/quickscrape/node_modules/casperjs/node_modules/phantomjs/phantomjs/phantomjs-1.9.7-linux-x86_64.tar.bz2
Receiving...
Received 12852K total.
Extracting tar contents (via spawned process)
Copying extracted folder /home/ross/Documents/corpuses/quickscrape/node_modules/casperjs/node_modules/phantomjs/phantomjs/phantomjs-1.9.7-linux-x86_64.tar.bz2-extract-1401797341412/phantomjs-1.9.7-linux-x86_64 -> /home/ross/Documents/corpuses/quickscrape/node_modules/casperjs/node_modules/phantomjs/lib/phantom
Writing location.js file
Done. Phantomjs binary available at /home/ross/Documents/corpuses/quickscrape/node_modules/casperjs/node_modules/phantomjs/lib/phantom/bin/phantomjs
quickscrape@0.1.5 node_modules/quickscrape
├── which@1.0.5
├── commander@2.2.0
├── xpath@0.0.6
├── spooky@0.2.4 (duplexer@0.0.4, carrier@0.1.14, async@0.1.22, readable-stream@1.0.27-1, tiny-jsonrpc@0.2.1, underscore@1.3.3)
├── winston@0.7.3 (cycle@1.0.3, stack-trace@0.0.9, eyes@0.1.8, colors@0.6.2, async@0.2.10, pkginfo@0.3.0, request@2.16.6)
├── download@0.1.17 (get-stdin@0.1.0, each-async@0.1.3, get-urls@0.1.2, mkdirp@0.3.5, nopt@2.2.1, through2@0.4.2, request@2.36.0, decompress@0.2.4)
└── jsdom@0.10.6 (xmlhttprequest@1.6.0, cssom@0.3.0, nwmatcher@1.3.3, htmlparser2@3.7.2, request@2.36.0, cssstyle@0.2.14, contextify@0.1.8)

casperjs@1.1.0-beta3 node_modules/casperjs
└── phantomjs@1.9.7-8 (which@1.0.5, rimraf@2.2.8, kew@0.1.7, ncp@0.4.2, mkdirp@0.3.5, adm-zip@0.2.1, npmconf@0.0.24, request@2.36.0)
ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape --version
0.1.4
ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384 --loglevel debug
debug: phantomjs installation found at /usr/bin/phantomjs
debug: casperjs installation found at /usr/bin/casperjs
info: quickscrape launched with...
info: - URL: https://peerj.com/articles/384
info: - Scraper: journal-scrapers/peerj.json
info: - Rate limit: 3 per minute
info: - Log level: debug
info: urls to scrape: 1
info: processing URL: https://peerj.com/articles/384

TypeError: Cannot read property '1' of null
    at Spooky.<anonymous> (/usr/lib/node_modules/quickscrape/lib/scrape.js:87:64)
    at Spooky.EventEmitter.emit (events.js:95:17)
    at Spooky.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/spooky/lib/spooky.js:173:14)
    at FilteredStream.EventEmitter.emit (events.js:95:17)
    at FilteredStream.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:768:14)
    at FilteredStream.EventEmitter.emit (events.js:92:17)
    at emitReadable (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:430:10)
    at emitReadable (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:426:5)
    at readableAddChunk (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:187:9)
    at FilteredStream.Readable.push (/usr/lib/node_modules/quickscrape/node_modules/spooky/node_modules/readable-stream/lib/_stream_readable.js:149:10)

blahah commented 10 years ago

You don't need to uninstall first - the latest one will install over the old one. The errors now are because you need the --global in the install command (sorry - I missed it out at first and then edited it to add it back in, but you must have seen the original).

OK, so the exact command to run:

sudo -H npm install --global quickscrape

Phew! Nearly there.
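
To illustrate why the version looked wrong: without --global, npm installs into ./node_modules in the current directory, while the `quickscrape` command on your PATH still points at the older global copy under /usr/lib/node_modules. The snippet below is just a sketch for seeing which copies of a command the shell can find; `all_on_path` is a name invented here, not an npm or quickscrape feature.

```shell
#!/bin/sh
# Print every executable file called NAME found on PATH, in lookup order.
# The first line printed is the one the shell will actually run.
# Hypothetical helper for illustration only.
all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for d in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$d" ] || d=.
        if [ -x "$d/$name" ] && [ ! -d "$d/$name" ]; then
            printf '%s\n' "$d/$name"
        fi
    done
    IFS=$old_ifs
}
```

For example, `all_on_path quickscrape` would list the global binary only; a copy installed locally under ./node_modules never appears because that directory is not on PATH.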

rossmounce commented 10 years ago

ok. did the global (without uninstalling beforehand)

ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384 --loglevel debug
debug: phantomjs installation found at /usr/bin/phantomjs
debug: casperjs installation found at /usr/bin/casperjs
info: quickscrape launched with...
info: - URL: https://peerj.com/articles/384
info: - Scraper: journal-scrapers/peerj.json
info: - Rate limit: 3 per minute
info: - Log level: debug
info: urls to scrape: 1
debug: scraperJSON definition is valid
info: processing URL: https://peerj.com/articles/384
debug: needs at least PhantomJS v1.8 or later.

events.js:72
        throw er; // Unhandled 'error' event
        ^
Error: Child terminated with non-zero exit code 1
    at Spooky.<anonymous> (/usr/lib/node_modules/quickscrape/node_modules/spooky/lib/spooky.js:180:17)
    at ChildProcess.EventEmitter.emit (events.js:98:17)
    at Process.ChildProcess._handle.onexit (child_process.js:807:12)

blahah commented 10 years ago

that's odd - CasperJS should install the latest Phantom.

can you do:

phantomjs --version
sudo -H npm install --global casperjs
phantomjs --version

rossmounce commented 10 years ago

phantomjs version is reported as the same both before & after:

1.4.0

rossmounce commented 10 years ago

excerpt from terminal...

npm http 304 https://registry.npmjs.org/cryptiles

phantomjs@1.9.7-8 install /usr/lib/node_modules/casperjs/node_modules/phantomjs
node install.js

Downloading https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-linux-x86_64.tar.bz2
Saving to /usr/lib/node_modules/casperjs/node_modules/phantomjs/phantomjs/phantomjs-1.9.7-linux-x86_64.tar.bz2
Receiving...
Received 12852K total.
Extracting tar contents (via spawned process)
Copying extracted folder /usr/lib/node_modules/casperjs/node_modules/phantomjs/phantomjs/phantomjs-1.9.7-linux-x86_64.tar.bz2-extract-1401798816470/phantomjs-1.9.7-linux-x86_64 -> /usr/lib/node_modules/casperjs/node_modules/phantomjs/lib/phantom
Writing location.js file
Done. Phantomjs binary available at /usr/lib/node_modules/casperjs/node_modules/phantomjs/lib/phantom/bin/phantomjs
/usr/bin/casperjs -> /usr/lib/node_modules/casperjs/bin/casperjs
casperjs@1.1.0-beta3 /usr/lib/node_modules/casperjs
└── phantomjs@1.9.7-8 (which@1.0.5, rimraf@2.2.8, kew@0.1.7, ncp@0.4.2, mkdirp@0.3.5, adm-zip@0.2.1, npmconf@0.0.24, request@2.36.0)
ross@ross-x3:~/Documents/corpuses/quickscrape$ phantomjs --version
1.4.0

blahah commented 10 years ago

what if you do:

sudo apt-get remove phantomjs

then try to run the quickscrape?

rossmounce commented 10 years ago

ross@ross-x3:~/Documents/corpuses/quickscrape$ sudo apt-get remove phantomjs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED
  phantomjs
0 to upgrade, 0 to newly install, 1 to remove and 0 not to upgrade.
After this operation, 474 kB disk space will be freed.
Do you want to continue [Y/n]? y
(Reading database ... 803749 files and directories currently installed.)
Removing phantomjs ...
ross@ross-x3:~/Documents/corpuses/quickscrape$ quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384 --loglevel debug

/usr/lib/node_modules/quickscrape/bin/quickscrape.js:64
    throw new Error(msg);
    ^
Error: Nophantomjs installation found.See installation instructions at https://github.com/ContentMine/quickscrape
    at fs.readFileSync.encoding (/usr/lib/node_modules/quickscrape/bin/quickscrape.js:64:11)
    at Array.forEach (native)
    at Object.<anonymous> (/usr/lib/node_modules/quickscrape/bin/quickscrape.js:57:27)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:906:3

rossmounce commented 10 years ago

... and i've just tried this: sudo -H npm install --global casperjs

quickscrape$ phantomjs --version
bash: /usr/bin/phantomjs: No such file or directory

blahah commented 10 years ago

OK, I'll add a phantom install script one sec.

blahah commented 10 years ago

OK, made a quick phantom install script.

Run it like this:

curl -sSL https://gist.githubusercontent.com/Blahah/a8084c0b005e93bd77cb/raw/35587616e55c28d7427973a6565a9e54d264f9d4/install_phantomJS.sh | sudo bash

then check phantomjs is installed:

phantomjs --version

It should be 1.9.7.

Then try the quickscrape command.
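
As a quick way to check the result against the minimum reported in the earlier debug output ("needs at least PhantomJS v1.8 or later"), here is a small sketch of a version comparison. It assumes `sort -V` is available, as it is in GNU coreutils; `version_ge` is a made-up helper name.

```shell
#!/bin/sh
# Succeed (exit 0) if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) from GNU coreutils is available.
# Hypothetical helper for illustration only.
version_ge() {
    # After a version sort, the smaller version comes first; $1 >= $2
    # exactly when that first line is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

It could be used as `version_ge "$(phantomjs --version)" 1.8 && echo "phantomjs is new enough"`. With the 1.4.0 reported earlier the check fails, and with 1.9.7 it passes.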

rossmounce commented 10 years ago

Connecting to bbuseruploads.s3.amazonaws.com (bbuseruploads.s3.amazonaws.com)|176.32.99.177|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13161396 (13M) [application/x-tar]
Saving to: `phantomjs-1.9.7-linux-x86_64.tar.bz2'

100%[=========================================>] 13,161,396 3.61M/s in 4.0s

2014-06-03 14:10:52 (3.11 MB/s) - `phantomjs-1.9.7-linux-x86_64.tar.bz2' saved [13161396/13161396]

phantomjs-1.9.7-linux-x86_64/ phantomjs-1.9.7-linux-x86_64/bin/ phantomjs-1.9.7-linux-x86_64/bin/phantomjs phantomjs-1.9.7-linux-x86_64/examples/ phantomjs-1.9.7-linux-x86_64/examples/scandir.js phantomjs-1.9.7-linux-x86_64/examples/technews.coffee phantomjs-1.9.7-linux-x86_64/examples/tweets.js phantomjs-1.9.7-linux-x86_64/examples/rasterize.coffee phantomjs-1.9.7-linux-x86_64/examples/pagecallback.js phantomjs-1.9.7-linux-x86_64/examples/printheaderfooter.js phantomjs-1.9.7-linux-x86_64/examples/follow.js phantomjs-1.9.7-linux-x86_64/examples/run-jasmine.coffee phantomjs-1.9.7-linux-x86_64/examples/module.js phantomjs-1.9.7-linux-x86_64/examples/waitfor.coffee phantomjs-1.9.7-linux-x86_64/examples/stdin-stdout-stderr.coffee phantomjs-1.9.7-linux-x86_64/examples/pizza.js phantomjs-1.9.7-linux-x86_64/examples/seasonfood.coffee phantomjs-1.9.7-linux-x86_64/examples/unrandomize.js phantomjs-1.9.7-linux-x86_64/examples/modernizr.js phantomjs-1.9.7-linux-x86_64/examples/waitfor.js phantomjs-1.9.7-linux-x86_64/examples/direction.js phantomjs-1.9.7-linux-x86_64/examples/arguments.coffee phantomjs-1.9.7-linux-x86_64/examples/render_multi_url.js phantomjs-1.9.7-linux-x86_64/examples/run-qunit.js phantomjs-1.9.7-linux-x86_64/examples/printheaderfooter.coffee phantomjs-1.9.7-linux-x86_64/examples/ipgeocode.js phantomjs-1.9.7-linux-x86_64/examples/ipgeocode.coffee phantomjs-1.9.7-linux-x86_64/examples/version.js phantomjs-1.9.7-linux-x86_64/examples/movies.js phantomjs-1.9.7-linux-x86_64/examples/child_process-examples.js phantomjs-1.9.7-linux-x86_64/examples/loadurlwithoutcss.coffee phantomjs-1.9.7-linux-x86_64/examples/version.coffee phantomjs-1.9.7-linux-x86_64/examples/seasonfood.js phantomjs-1.9.7-linux-x86_64/examples/server.js phantomjs-1.9.7-linux-x86_64/examples/countdown.js phantomjs-1.9.7-linux-x86_64/examples/rasterize.js phantomjs-1.9.7-linux-x86_64/examples/injectme.js phantomjs-1.9.7-linux-x86_64/examples/run-jasmine.js 
phantomjs-1.9.7-linux-x86_64/examples/post.js phantomjs-1.9.7-linux-x86_64/examples/imagebin.coffee phantomjs-1.9.7-linux-x86_64/examples/pizza.coffee phantomjs-1.9.7-linux-x86_64/examples/hello.coffee phantomjs-1.9.7-linux-x86_64/examples/features.js phantomjs-1.9.7-linux-x86_64/examples/movies.coffee phantomjs-1.9.7-linux-x86_64/examples/tweets.coffee phantomjs-1.9.7-linux-x86_64/examples/injectme.coffee phantomjs-1.9.7-linux-x86_64/examples/features.coffee phantomjs-1.9.7-linux-x86_64/examples/colorwheel.coffee phantomjs-1.9.7-linux-x86_64/examples/walk_through_frames.js phantomjs-1.9.7-linux-x86_64/examples/printmargins.coffee phantomjs-1.9.7-linux-x86_64/examples/printmargins.js phantomjs-1.9.7-linux-x86_64/examples/scandir.coffee phantomjs-1.9.7-linux-x86_64/examples/loadspeed.coffee phantomjs-1.9.7-linux-x86_64/examples/printenv.js phantomjs-1.9.7-linux-x86_64/examples/serverkeepalive.coffee phantomjs-1.9.7-linux-x86_64/examples/fibo.coffee phantomjs-1.9.7-linux-x86_64/examples/echoToFile.coffee phantomjs-1.9.7-linux-x86_64/examples/netlog.js phantomjs-1.9.7-linux-x86_64/examples/useragent.coffee phantomjs-1.9.7-linux-x86_64/examples/child_process-examples.coffee phantomjs-1.9.7-linux-x86_64/examples/weather.coffee phantomjs-1.9.7-linux-x86_64/examples/direction.coffee phantomjs-1.9.7-linux-x86_64/examples/module.coffee phantomjs-1.9.7-linux-x86_64/examples/printenv.coffee phantomjs-1.9.7-linux-x86_64/examples/simpleserver.js phantomjs-1.9.7-linux-x86_64/examples/fibo.js phantomjs-1.9.7-linux-x86_64/examples/imagebin.js phantomjs-1.9.7-linux-x86_64/examples/colorwheel.js phantomjs-1.9.7-linux-x86_64/examples/technews.js phantomjs-1.9.7-linux-x86_64/examples/hello.js phantomjs-1.9.7-linux-x86_64/examples/echoToFile.js phantomjs-1.9.7-linux-x86_64/examples/postserver.coffee phantomjs-1.9.7-linux-x86_64/examples/page_events.coffee phantomjs-1.9.7-linux-x86_64/examples/postserver.js phantomjs-1.9.7-linux-x86_64/examples/weather.js 
phantomjs-1.9.7-linux-x86_64/examples/countdown.coffee phantomjs-1.9.7-linux-x86_64/examples/netsniff.coffee phantomjs-1.9.7-linux-x86_64/examples/detectsniff.js phantomjs-1.9.7-linux-x86_64/examples/render_multi_url.coffee phantomjs-1.9.7-linux-x86_64/examples/useragent.js phantomjs-1.9.7-linux-x86_64/examples/walk_through_frames.coffee phantomjs-1.9.7-linux-x86_64/examples/post.coffee phantomjs-1.9.7-linux-x86_64/examples/arguments.js phantomjs-1.9.7-linux-x86_64/examples/simpleserver.coffee phantomjs-1.9.7-linux-x86_64/examples/run-qunit.coffee phantomjs-1.9.7-linux-x86_64/examples/outputEncoding.coffee phantomjs-1.9.7-linux-x86_64/examples/phantomwebintro.js phantomjs-1.9.7-linux-x86_64/examples/follow.coffee phantomjs-1.9.7-linux-x86_64/examples/loadspeed.js phantomjs-1.9.7-linux-x86_64/examples/page_events.js phantomjs-1.9.7-linux-x86_64/examples/loadurlwithoutcss.js phantomjs-1.9.7-linux-x86_64/examples/sleepsort.js phantomjs-1.9.7-linux-x86_64/examples/sleepsort.coffee phantomjs-1.9.7-linux-x86_64/examples/netlog.coffee phantomjs-1.9.7-linux-x86_64/examples/outputEncoding.js phantomjs-1.9.7-linux-x86_64/examples/serverkeepalive.js phantomjs-1.9.7-linux-x86_64/examples/phantomwebintro.coffee phantomjs-1.9.7-linux-x86_64/examples/server.coffee phantomjs-1.9.7-linux-x86_64/examples/universe.js phantomjs-1.9.7-linux-x86_64/examples/pagecallback.coffee phantomjs-1.9.7-linux-x86_64/examples/stdin-stdout-stderr.js phantomjs-1.9.7-linux-x86_64/examples/detectsniff.coffee phantomjs-1.9.7-linux-x86_64/examples/unrandomize.coffee phantomjs-1.9.7-linux-x86_64/examples/netsniff.js phantomjs-1.9.7-linux-x86_64/ChangeLog phantomjs-1.9.7-linux-x86_64/README.md phantomjs-1.9.7-linux-x86_64/LICENSE.BSD phantomjs-1.9.7-linux-x86_64/third-party.txt ross@ross-x3:~/Documents/corpuses/quickscrape$ phantomjs --version bash: /usr/bin/phantomjs: No such file or directory

blahah commented 10 years ago

I think there's some polluted link left over from the previous install:

Let's see if we can figure out where:

sudo apt-get purge phantomjs
which phantomjs

rossmounce commented 10 years ago

quickscrape$ sudo apt-get purge phantomjs
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package phantomjs is not installed, so not removed
0 to upgrade, 0 to newly install, 0 to remove and 0 not to upgrade.
ross@ross-x3:~/Documents/corpuses/quickscrape$ which phantomjs
/usr/local/bin/phantomjs

blahah commented 10 years ago

ok... confusing... but running phantomjs --version doesn't work?

rossmounce commented 10 years ago

quickscrape$ phantomjs --version
bash: /usr/bin/phantomjs: No such file or directory
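
One possible explanation, offered as a guess rather than a confirmed diagnosis: bash remembers the full path of commands it has already run, so after /usr/bin/phantomjs was removed the same terminal session may keep trying that old path even though `which` now finds /usr/local/bin/phantomjs. Clearing that cache with `hash -r` (or opening a new terminal) forces a fresh PATH lookup. A small sketch, where `fresh_lookup` is a name invented for illustration:

```shell
#!/bin/sh
# Forget all remembered command locations, then print where NAME
# resolves on the current PATH. Returns non-zero if NAME is not found.
# Hypothetical helper for illustration only.
fresh_lookup() {
    hash -r
    command -v "$1"
}
```

If `fresh_lookup phantomjs` prints /usr/local/bin/phantomjs and `phantomjs --version` then works, the stale cache was the culprit and no extra symlink is needed.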

rossmounce commented 10 years ago

/usr/local/bin$ ls custom-catfish jsonpretty opencv_haartraining phantomjs easy_install lolcat opencv_performance easy_install-2.7 opencv_createsamples opencv_traincascade

blahah commented 10 years ago

sudo ln -s /usr/local/bin/phantomjs /usr/bin/phantomjs

rossmounce commented 10 years ago

ah ha!

sudo ln -s /usr/local/bin/phantomjs /usr/bin/phantomjs
ln: failed to create symbolic link `/usr/bin/phantomjs': File exists

rossmounce commented 10 years ago

oh, wait, i actually did that twice

blahah commented 10 years ago

lol...

sudo rm /usr/bin/phantomjs
sudo ln -s /usr/local/bin/phantomjs /usr/bin/phantomjs
phantomjs --version

rossmounce commented 10 years ago

$ phantomjs --version
1.9.7

putting in /usr/bin seems to have worked :)
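
For anyone repeating this, the rm-then-ln pair can usually be done in one step with `ln -sfn`, which replaces an existing link instead of failing with "File exists". A minimal sketch; the `relink` wrapper is hypothetical and only there to make the flags explicit.

```shell
#!/bin/sh
# Point LINK at TARGET, replacing LINK if it already exists.
#   -s  create a symbolic link
#   -f  remove an existing destination first
#   -n  if LINK is itself a symlink to a directory, replace the link
#       rather than creating a new link inside that directory
# Hypothetical helper for illustration only.
relink() {
    target=$1
    link=$2
    ln -sfn "$target" "$link"
}
```

In this thread's case that would be run with sudo as `sudo ln -sfn /usr/local/bin/phantomjs /usr/bin/phantomjs`.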

blahah commented 10 years ago

excellent - now to try the quickscrape command!

rossmounce commented 10 years ago

finally!!!

$ quickscrape --url https://peerj.com/articles/384 --scraper journal-scrapers/peerj.json --output peerj-384 --loglevel debug
debug: phantomjs installation found at /usr/local/bin/phantomjs
debug: casperjs installation found at /usr/bin/casperjs
info: quickscrape launched with...
info: - URL: https://peerj.com/articles/384
info: - Scraper: journal-scrapers/peerj.json
info: - Rate limit: 3 per minute
info: - Log level: debug
info: urls to scrape: 1
debug: scraperJSON definition is valid
info: processing URL: https://peerj.com/articles/384
debug: [phantom] Starting...
debug: [phantom] Running suite: 3 steps
debug: [phantom] opening url: https://peerj.com/articles/384, HTTP GET
debug: [phantom] Navigation requested: url=https://peerj.com/articles/384, type=Other, willNavigate=true, isMainFrame=true
debug: [phantom] Navigation requested: url=https://peerj.com/articles/384/, type=Other, willNavigate=true, isMainFrame=true
debug: [phantom] url changed to "https://peerj.com/articles/384/"
debug: [phantom] Successfully injected Casper client-side utilities
debug: [phantom] start page is loaded
debug: [phantom] Step anonymous 3/3 https://peerj.com/articles/384/ (HTTP 200)
debug: page downloaded and rendered
debug: Ticker created with length 2
debug: saving rendered HTML
debug: writing file: rendered.html
debug: scraping rendered DOM
debug: scraping element: fulltext_pdf
debug: found 1 matches
data: fulltext_pdf: https://peerj.com/articles/384.pdf
debug: downloading in background from https://peerj.com/articles/384.pdf
debug: scraping element: fulltext_html
debug: found 1 matches
data: fulltext_html: https://peerj.com/articles/384
debug: downloading in background from https://peerj.com/articles/384
debug: scraping element: supplementary_material
debug: found 0 matches
debug: scraping element: title
debug: found 1 matches
data: title: Mutation analysis of the SLC26A4, FOXI1 and KCNJ10 genes in individuals with congenital hearing loss
debug: scraping element: author
debug: found 6 matches
data: author: Lynn M. Pique
data: author: Marie-Luise Brennan
data: author: Colin J. Davidson
data: author: Frederick Schaefer
data: author: John Greinwald Jr
data: author: Iris Schrijver
debug: scraping element: date
debug: found 1 matches
data: date: 2014-05-08
debug: scraping element: doi
debug: found 1 matches
data: doi: 10.7717/peerj.384
debug: scraping element: volume
debug: found 1 matches
data: volume: 2
debug: scraping element: issue
debug: found 0 matches
debug: scraping element: firstpage
debug: found 1 matches
data: firstpage: e384
debug: scraping element: description
debug: found 1 matches
data: description: Pendred syndrome (PDS) and DFNB4 comprise a phenotypic spectrum of sensorineural hearing loss disorders that typically result from biallelic mutations of the SLC26A4 gene. Although PDS and DFNB4 are recessively inherited, sequencing of the coding regions and splice sites of SLC26A4 in individuals suspected to be affected with these conditions often fails to identify two mutations. We investigated the potential contribution of large SLC26A4 deletions and duplications to sensorineural hearing loss (SNHL) by screening 107 probands with one known SLC26A4 mutation by Multiplex Ligation-dependent Probe Amplification (MLPA). A heterozygous deletion, spanning exons 4–6, was detected in only one individual, accounting for approximately 1% of the missing mutations in our cohort. This low frequency is consistent with previously published MLPA results. We also examined the potential involvement of digenic inheritance in PDS/DFNB4 by sequencing the coding regions of FOXI1 and KCNJ10. Of the 29 probands who were sequenced, three carried nonsynonymous variants including one novel sequence change in FOXI1 and two polymorphisms in KCNJ10. We performed a review of prior studies and, in conjunction with our current data, conclude that the frequency of FOXI1 (1.4%) and KCNJ10 (3.6%) variants in PDS/DFNB4 individuals is low. Our results, in combination with previously published reports, indicate that large SLC26A4 deletions and duplications as well as mutations of FOXI1 and KCNJ10 play limited roles in the pathogenesis of SNHL and suggest that other genetic factors likely contribute to the phenotype.
info: waiting for 2 downloads to complete in background
debug: writing file: results.json
debug: [phantom] Step anonymous 3/3: done in 2869ms.
debug: [phantom] Done 3 steps in 2871ms
debug: Ticker progress: 1 of 4
debug: Ticker progress: 2 of 4
debug: download done
debug: Ticker progress: 3 of 4
debug: download done
debug: Ticker progress: 4 of 4
debug: Ticker finished
debug: changing back to top-level directory
info: all tasks completed

blahah commented 10 years ago

Well that was horrible! I'll have to make this installation process a lot nicer :(

Thanks for your patience.

petermr commented 10 years ago

Well done Spooooooky!

FWIW I have cleared my "bug" in species so I think we have a full set of tools, if a bit fragile.

On Tue, Jun 3, 2014 at 3:28 PM, Richard Smith-Unna <notifications@github.com> wrote:

Closed #5 https://github.com/ContentMine/quickscrape/issues/5.


Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069