ContentMine / journal-scrapers

Journal scraper definitions for the ContentMine framework

brill scraper no longer functioning #50

Open rossmounce opened 7 years ago

rossmounce commented 7 years ago

I tried first with a DOI URL, then with the resolved URL for the same article. Both attempts fail to extract anything about the article, not even meta elements.

$ quickscrape --url http://dx.doi.org/10.1163/15685381-00003067 --scraper journal-scrapers/scrapers/brill.json --output brill
info: quickscrape 0.4.7 launched with...
info: - URL: http://dx.doi.org/10.1163/15685381-00003067
info: - Scraper: /home/ross/Downloads/pica/journal-scrapers/scrapers/brill.json
info: - Rate limit: 3 per minute
info: - Log level: info
info: urls to scrape: 1
info: processing URL: http://dx.doi.org/10.1163/15685381-00003067
error: Error: Parse Error so moving on to next url in list
info: all tasks completed

ross@ross-x3:~/Downloads/pica$ quickscrape --url http://booksandjournals.brillonline.com/content/journals/10.1163/15685381-00003067 --scraper journal-scrapers/scrapers/brill.json --output brill
info: quickscrape 0.4.7 launched with...
info: - URL: http://booksandjournals.brillonline.com/content/journals/10.1163/15685381-00003067
info: - Scraper: /home/ross/Downloads/pica/journal-scrapers/scrapers/brill.json
info: - Rate limit: 3 per minute
info: - Log level: info
info: urls to scrape: 1
info: processing URL: http://booksandjournals.brillonline.com/content/journals/10.1163/15685381-00003067

/home/ross/.nvm/versions/node/v4.0.0/lib/node_modules/quickscrape/node_modules/thresher/node_modules/spooky/node_modules/tiny-jsonrpc/lib/tiny-jsonrpc/server.js:70
            throw 'Cannot parse function: ' + functionSnippet(fn);
            ^
Cannot parse function: function bound () { ...
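
For anyone trying to reproduce this, the two attempts above can be sketched as one small script. It uses only the `--url`, `--scraper` and `--output` flags shown in the commands above, and assumes `quickscrape` is on the PATH and that `journal-scrapers` is checked out in the current directory. The `DRY_RUN` variable and the `build_cmd` helper are hypothetical names introduced here for illustration. By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: re-run both attempts from this report (DOI URL, then resolved URL).
# Assumption: journal-scrapers is checked out in the current directory.
SCRAPER="journal-scrapers/scrapers/brill.json"
OUTPUT="brill"

# Hypothetical helper: print the quickscrape command line for one URL.
build_cmd() {
  printf 'quickscrape --url %s --scraper %s --output %s\n' "$1" "$SCRAPER" "$OUTPUT"
}

for url in \
  "http://dx.doi.org/10.1163/15685381-00003067" \
  "http://booksandjournals.brillonline.com/content/journals/10.1163/15685381-00003067"
do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only show what would be run.
    build_cmd "$url"
  else
    # Real run: report a non-zero exit status but keep going to the next URL.
    quickscrape --url "$url" --scraper "$SCRAPER" --output "$OUTPUT" \
      || echo "quickscrape exited with an error for: $url" >&2
  fi
done
```

Running both URLs in the same session makes it easier to compare the "Parse Error" from the DOI URL with the "Cannot parse function" exception from the resolved URL.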