ContentMine / quickscrape

A scraping command line tool for the modern web
MIT License
260 stars 43 forks

Installation via npm misses tiny-jsonrpc dependency #103

Open bcarradini opened 6 years ago

bcarradini commented 6 years ago

After installing quickscrape and journal-scrapers via npm, as instructed, and getting the basic example to work (which required a workaround, see my comment on #102), I hit another snag.

My attempts to scrape http://journals.sagepub.com/toc/bcqe/57/3 using a custom scraper kept failing without any feedback. I dug into the Thresher dependency to add more logging (side note: you should add logging to Thresher!) and found that the problem surfaced in headless.js:

"HeadlessRenderer.prototype.render: Error: Cannot find module '/Users/barbara/.nvm/versions/node/v8.9.1/lib/node_modules/quickscrape/node_modules/spooky/lib/../node_modules/tiny-jsonrpc/lib/tiny-jsonrpc'"

I was able to successfully scrape http://journals.sagepub.com/toc/bcqe/57/3 using my custom scraper once I locally installed tiny-jsonrpc under: /Users/barbara/.nvm/versions/node/v8.9.1/lib/node_modules/quickscrape/node_modules/spooky/
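For anyone who wants to check whether their install has the same problem, here is a minimal sketch of a check. It assumes, based only on the path in the error message above, that spooky looks for tiny-jsonrpc in its own nested node_modules folder. The helper name hasNestedTinyJsonrpc is hypothetical and not part of quickscrape, Thresher, or spooky.

```javascript
const path = require('path');

// Hypothetical helper (not part of quickscrape or spooky).
// Assumption: spooky loads tiny-jsonrpc from the nested path shown in the
// error message, i.e. <spookyDir>/node_modules/tiny-jsonrpc/lib/tiny-jsonrpc.
function hasNestedTinyJsonrpc(spookyDir) {
  const target = path.join(
    spookyDir, 'node_modules', 'tiny-jsonrpc', 'lib', 'tiny-jsonrpc'
  );
  try {
    // require.resolve applies Node's normal lookup rules (file extensions,
    // directories with an index file) without actually loading the module.
    require.resolve(target);
    return true;
  } catch (err) {
    if (err.code === 'MODULE_NOT_FOUND') {
      return false;
    }
    throw err; // anything else is unexpected, so surface it
  }
}

// Usage: pass the spooky directory from your own install, for example the
// one from this report:
// hasNestedTinyJsonrpc('/Users/barbara/.nvm/versions/node/v8.9.1/lib/node_modules/quickscrape/node_modules/spooky');
```

If this returns false, installing tiny-jsonrpc locally under that spooky directory (the workaround described above) should make it return true.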

blahah commented 6 years ago

Thanks for this feedback @bcarradini, and for your patience.

@tarrow is qs on your radar right now? Or should I add this to my list?

tarrow commented 6 years ago

@blahah Right now QS isn't directly on my radar. If you want to take a crack at it then please do.

If I suddenly get time to look (and you haven't assigned the issue to yourself) I'll assign it to myself.