ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License

Quickscrape hangs and emits warnings #76

Open petermr opened 8 years ago

petermr commented 8 years ago
dynamic-0-4:phyto pm286$ quickscrape -u http://pubs.acs.org/doi/full/10.1021/np200645p -s ../../workspace/journal-scrapers/scrapers/acs1.json -o jnp -i 5
info: quickscrape 0.4.7 launched with...
info: - URL: http://pubs.acs.org/doi/full/10.1021/np200645p
info: - Scraper: /Users/pm286/workspace/journal-scrapers/scrapers/acs1.json
info: - Rate limit: 5 per minute
info: - Log level: info
info: urls to scrape: 1
info: processing URL: http://pubs.acs.org/doi/full/10.1021/np200645p
[info] [phantom] Starting...
[info] [phantom] Running suite: 3 steps
[debug] [phantom] opening url: http://pubs.acs.org/doi/full/10.1021/np200645p, HTTP GET
[debug] [phantom] Navigation requested: url=http://pubs.acs.org/doi/full/10.1021/np200645p, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "http://pubs.acs.org/doi/full/10.1021/np200645p"
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=false
[debug] [phantom] Successfully injected Casper client-side utilities
[debug] [phantom] start page is loaded
[info] [phantom] Step anonymous 3/3 http://pubs.acs.org/doi/full/10.1021/np200645p (HTTP 200)
info: [scraper]. URL rendered. http://pubs.acs.org/doi/full/10.1021/np200645p.
[info] [phantom] Step anonymous 3/3: done in 6770ms.
[info] [phantom] Done 3 steps in 6772ms
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///Users/pm286/.nvm/v0.10.38/lib/node_modules/quickscrape/node_modules/thresher/node_modules/casperjs/bin/bootstrap.js. Domains, protocols and ports must match.
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///Users/pm286/.nvm/v0.10.38/lib/node_modules/quickscrape/node_modules/thresher/node_modules/casperjs/bin/bootstrap.js. Domains, protocols and ports must match.
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///Users/pm286/.nvm/v0.10.38/lib/node_modules/quickscrape/node_modules/thresher/node_modules/casperjs/bin/bootstrap.js. Domains, protocols and ports must match.
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///Users/pm286/.nvm/v0.10.38/lib/node_modules/quickscrape/node_modules/thresher/node_modules/casperjs/bin/bootstrap.js. Domains, protocols and ports must match.
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///Users/pm286/.nvm/v0.10.38/lib/node_modules/quickscrape/node_modules/thresher/node_modules/casperjs/bin/bootstrap.js. Domains, protocols and ports must match.
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///Users/pm286/.nvm/v0.10.38/lib/node_modules/quickscrape/node_modules/thresher/node_modules/casperjs/bin/bootstrap.js. Domains, protocols and ports must match.

After the last warning, quickscrape hangs at this point and never exits.
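
To make a hang like this easier to reproduce and report, one option is to run the command under a watchdog that enforces a time limit and counts the warnings. The following is only a rough sketch, not part of quickscrape: `run_with_timeout` and `WARNING_PREFIX` are hypothetical names, the timeout value is arbitrary, and the process-group kill assumes a POSIX system (such as the macOS machine above).

```python
import os
import signal
import subprocess

# Prefix of the warning lines seen in the log above.
WARNING_PREFIX = "Unsafe JavaScript attempt to access frame"


def run_with_timeout(argv, timeout_s):
    """Run argv and return (timed_out, warning_count, output).

    The child is started in its own session so that, on timeout, the whole
    process group (including any helper processes it spawned) can be killed.
    POSIX only.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        start_new_session=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
        timed_out = False
    except subprocess.TimeoutExpired:
        # Kill the whole process group, then collect whatever was printed.
        os.killpg(proc.pid, signal.SIGKILL)
        output, _ = proc.communicate()
        timed_out = True
    warning_count = sum(
        1 for line in output.splitlines() if line.startswith(WARNING_PREFIX)
    )
    return timed_out, warning_count, output
```

Passing the quickscrape command line from the report as a list of arguments (for example `["quickscrape", "-u", "http://pubs.acs.org/doi/full/10.1021/np200645p", ...]`) with a timeout of a minute or two should then show whether the run timed out and how many warnings were emitted before it stalled.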

tarrow commented 8 years ago

This may be solved by #90, or it may be another instance of #62.