ContentMine / journal-scrapers

Journal scraper definitions for the ContentMine framework

More major publisher scrapers #8

Open noamross opened 10 years ago

noamross commented 10 years ago

(In anticipation of open science sprint)

petermr commented 10 years ago

Noam, this is brilliant.

Some of the project proposers are talking on Thursday with Kay Thaney of Mozilla, who runs the two-day hack, and this makes a truly wonderful hack topic. It will get great exposure and we can count on Mozilla to help spread what you have done. I don't have your email, and I think it would be a good idea to email or have a Skype call in the next day or so.

My email is fairly easy to discover! peter.murray.rust at G

On Tue, Jul 8, 2014 at 1:54 AM, Noam Ross notifications@github.com wrote:

(In anticipation of open science sprint)

  • Wiley
  • Springer
  • Taylor and Francis
  • JSTOR


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

blahah commented 10 years ago

I've been saving JSTOR - it feels like a big deal to enable taming that particular beast.

robintw commented 8 years ago

Has there been any progress in creating a scraper for Taylor and Francis? I'm keen to scrape a T&F journal, and might have a go at writing a scraper, but I don't want to duplicate work.

chartgerink commented 8 years ago

It's on my list for next week! I forked the repo here

robintw commented 8 years ago

Great :-)

I actually started working on a T&F scraper tonight (I was bored...). I have a 'first go' at a scraper (see https://gist.github.com/robintw/2701f0f87b7232bd0f73e02b8333a841) which works on the few example articles I've tested it on.

(I haven't installed the proper testing framework yet, but it'll be interesting to see how it works on the list of test T&F URLs that I see you've added to your fork of the repo.)