Grab DOIs from PDFs, look them up in a database

BeagleLab / voyage

Planning for the Beagle Project


Grab DOIs from PDFs, look them up in a database #4

Open RichardLitt opened 9 years ago

RichardLitt commented 9 years ago

First and easiest way to get metadata

RichardLitt commented 9 years ago

[x] Grab DOIs from a text glob
[x] Parse using altmetrics database
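
For readers following along, the first checklist item could be sketched roughly as below. This is a minimal illustration, not the code used in this repo: the regular expression is an assumed approximation of the usual `10.<registrant>/<suffix>` DOI shape, and `extract_dois` is a hypothetical helper name.

```python
import re

# Assumed approximation of the common DOI shape: "10.", a 4-9 digit
# registrant code, a slash, then a suffix with no whitespace or quotes.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text):
    """Return the unique DOIs found in text, in order of first appearance."""
    seen = set()
    dois = []
    for match in DOI_PATTERN.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;:)]}'")
        key = doi.lower()  # DOIs are case-insensitive, so dedupe on lowercase
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois


if __name__ == "__main__":
    sample = "See doi:10.1000/xyz123. Also https://doi.org/10.1234/abc.def-45, and (10.1000/xyz123)."
    print(extract_dois(sample))
```

Text pulled out of a PDF often breaks a DOI across lines or glues it to neighbouring words, so a pattern like this will miss or mangle some DOIs and should be treated as a first pass.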

Are there other databases I should be looking into? arXiv? PubMed? CiteSeer? Mendeley?

jbenet commented 9 years ago

awesome! screenshot:

RichardLitt commented 9 years ago

The best I can do for Google Scholar is to mimic or use this Python API: https://github.com/ckreibich/scholar.py/blob/master/scholar.py

I don't like it much, though. It only scrapes the first page, and it isn't the easiest to replicate if I wanted to do it in JS.

RichardLitt commented 9 years ago

Yeah... let's not go there. http://stackoverflow.com/a/7587994

So... what other databases should I use? We could start our own, hypothetically.

jbenet commented 9 years ago

@RichardLitt yikes, yeah let's not. Maybe the Academia.edu people or standardanalytics have something.

jbenet commented 9 years ago

Hey @sballesteros -- do you guys have an API for DOI -> paper metadata?

RichardLitt commented 9 years ago

From @adammarblestone:

http://www.ncbi.nlm.nih.gov/pmc/pmctopmid/ We can use this to extract titles from DOIs, if the article is in PubMed. It would be one of multiple ways to get title and author data from a PDF, without relying on detailed PDF parsing.
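
A rough sketch of how the DOI-to-PubMed step might look in code follows. Everything specific here is an assumption rather than something confirmed in this thread: the endpoint URL is taken to be the JSON interface of the ID converter behind the page above, the response is assumed to carry a `records` list with `doi` and `pmid` keys, and the function names are hypothetical. Check NCBI's current documentation before relying on any of it.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the NCBI ID Converter service (not confirmed in this thread).
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"


def build_idconv_url(dois):
    """Build a request URL asking the converter about a batch of DOIs."""
    query = urllib.parse.urlencode({"ids": ",".join(dois), "format": "json"})
    return f"{IDCONV_URL}?{query}"


def pmids_from_response(payload):
    """Map DOI -> PMID for every record that carries both fields.

    Assumes a payload shaped like {"records": [{"doi": ..., "pmid": ...}, ...]}.
    Records without a PMID (articles not in PubMed) are skipped.
    """
    result = {}
    for record in payload.get("records", []):
        if "doi" in record and "pmid" in record:
            result[record["doi"]] = str(record["pmid"])
    return result


def lookup_pmids(dois):
    """Fetch and parse the converter response (performs a network request)."""
    with urllib.request.urlopen(build_idconv_url(dois), timeout=10) as response:
        return pmids_from_response(json.load(response))
```

This only gets as far as a PMID. Turning that PMID into a title and author list would take a further PubMed lookup, which is not shown here.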

RichardLitt commented 9 years ago

Resources: