Scrapers for pre-print services

ContentMine / journal-scrapers

Journal scraper definitions for the ContentMine framework

Scrapers for pre-print services #7

Closed. noamross closed this issue 7 years ago.

noamross commented 10 years ago

(Posting in anticipation of the Mozilla Open Science Sprint)

Scrapers for:

arXiv
bioRxiv
PeerJ Preprints (is it different from PeerJ?)
Figshare
F1000 Research

rossmounce commented 7 years ago

I've made scrapers for bioRxiv, Figshare and F1000 Research. PeerJ Preprints doesn't need a separate scraper file IMO: the peerj.json scraper does just fine.

I'm leaving arXiv because that's better catered for by getpapers.

petermr commented 7 years ago

Thanks. Note there is a problem with arXiv because trying to scrape it gets you cut off (happened to me) so we should probably remove this option.

blahah commented 7 years ago

@petermr do you mean that using getpapers with arXiv causes the problem? If so, it's because we're not respecting their rate limit and page constraints. Easily fixed, I think.
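For illustration only, here is a minimal Python sketch of what respecting a rate limit and a page-size cap could look like. It is not how getpapers is implemented. The endpoint and the `search_query`, `start` and `max_results` parameters are those of arXiv's public query API; the 3-second delay and the page size of 100 are assumptions to be checked against arXiv's current terms, and `page_urls`, `Throttle`, `fetch_all` and the injected `fetch` callable are hypothetical names.

```python
import time
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # arXiv's public query endpoint
MIN_DELAY_S = 3.0  # assumption: minimum gap between requests, in seconds
PAGE_SIZE = 100    # assumption: maximum results requested per page


def page_urls(search_query, total, page_size=PAGE_SIZE):
    """Yield one query URL per page, never asking for more than page_size results."""
    for start in range(0, total, page_size):
        params = {
            "search_query": search_query,
            "start": start,
            "max_results": min(page_size, total - start),
        }
        yield f"{ARXIV_API}?{urlencode(params)}"


class Throttle:
    """Block until at least min_delay seconds have passed since the previous call."""

    def __init__(self, min_delay=MIN_DELAY_S, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def fetch_all(search_query, total, fetch, throttle=None):
    """Fetch every page in order, pausing between requests.

    `fetch` is any callable that takes a URL and returns the response body.
    """
    throttle = throttle or Throttle()
    results = []
    for url in page_urls(search_query, total):
        throttle.wait()
        results.append(fetch(url))
    return results
```

Passing something like `lambda url: urllib.request.urlopen(url).read()` as `fetch` would issue real requests; keeping the fetcher injectable means the paging and throttling can be tried without touching the network.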

petermr commented 7 years ago

Ah! I thought they banned it. I've been having good discussions with Paul Ginsparg so should be fine. He's done some nice work.
