ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

Fails on ACS #87

Open bjonnh opened 8 years ago

bjonnh commented 8 years ago

quickscrape --scraper acs.json --url http://pubs.acs.org/doi/abs/10.1021/acs.jnatprod.6b00118 --output acs.jnatprod.6b00118 --outformat bibjson

It is stuck here:

info: quickscrape 0.4.7 launched with...
info: - URL: http://pubs.acs.org/doi/abs/10.1021/acs.jnatprod.6b00118
info: - Scraper: acs.json
info: - Rate limit: 3 per minute
info: - Log level: info
info: urls to scrape: 1
info: processing URL: http://pubs.acs.org/doi/abs/10.1021/acs.jnatprod.6b00118
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL casperjs/bin/bootstrap.js. Domains, protocols and ports must match.

(I used the latest acs.json)

tarrow commented 8 years ago

Thanks so much for this bug report. I'll investigate and get back to you about this tomorrow.

blahah commented 8 years ago

In general, CasperJS and PhantomJS are no longer relevant or supported, and we should migrate to Electron via nightmarejs (http://nightmarejs.org). @tarrow, whenever you feel it is a good time to start looking at this move, ping me and I can help.

petermr commented 8 years ago

Thanks Richard,

The nightmarejs web page says:

"NightmareJS is a means to communicate between a CasperJS runtime and a Node server to allow executing of Node functions and data modification features.

NightmareJS is a means to connect CasperJS with NodeJS. There is no need to rewrite existing CasperJS code. Instead, it passes data over a socket.io connection between the Casper object and the Node server, allowing you to query specific data or execute specific functions that node offers but are unavailable in the web browser that Casper operates through."

Does this mean we get it working with the current system (Casper), then introduce Electron, and then remove Casper? What sort of effort will this be?

Thanks

Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

tarrow commented 8 years ago

This may well be a duplicate of #62.