BeagleLab / voyage

Planning for the Beagle Project

Feature: Google Scholar "Cited By" Module #18

Open chrisgervang opened 9 years ago

chrisgervang commented 9 years ago

Search Google Scholar for the PDF loaded into Beagle, then reveal its "cited by" information.

Have:

Need:

  1. A reliable way to identify the PDF to a service such as Google Scholar.
    • Get a unique identifier for the paper from the PDF text and metadata, for example by extracting the paper title directly.
    • Grab the top text of the page.
    • Metadata could be useful.
    • Font size could help pick out the title.
    • DOI -> database query -> direct title. For example, Alt Metrics.
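
A minimal sketch of two of the ideas above: pulling a DOI out of extracted PDF text, and guessing the title from font size. The `TextItem` shape, the function names, and the "largest font is the title" rule are all assumptions made for illustration, and the DOI pattern is a rough heuristic rather than a full grammar. How the text and font sizes get out of the PDF is left to whatever PDF library ends up being used.

```typescript
// Hypothetical shape for one run of text taken from a PDF page.
// A real PDF library would supply something similar; this is an assumption.
interface TextItem {
  text: string;
  fontSize: number;
}

// Rough DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"<>]+/;

// Return the first DOI-like string in the text, or null if none is found.
function extractDoi(text: string): string | null {
  const match = DOI_PATTERN.exec(text);
  if (match === null) return null;
  // Trailing punctuation usually belongs to the sentence, not the DOI.
  return match[0].replace(/[.,;:)\]]+$/, "");
}

// Guess the title as the text set in the largest font among the given items.
function guessTitle(items: TextItem[]): string | null {
  const nonEmpty = items.filter((item) => item.text.trim().length > 0);
  if (nonEmpty.length === 0) return null;
  const largest = Math.max(...nonEmpty.map((item) => item.fontSize));
  return nonEmpty
    .filter((item) => item.fontSize === largest)
    .map((item) => item.text.trim())
    .join(" ");
}
```

Each function is small enough to live in its own npm module. A caller could try `extractDoi` first and fall back to `guessTitle` when the PDF text carries no DOI.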

Consideration: These can be organized into npm modules.

adammarblestone-zz commented 9 years ago

DOI --> PubMed: http://www.ncbi.nlm.nih.gov/pmc/pmctopmid/

We can use this to extract titles from DOIs, if the article is in PubMed.

This would be one of several ways to get title and author data from a PDF without relying on detailed PDF parsing.

adammarblestone-zz commented 9 years ago

Maybe it is possible to search Scholar by DOI:

http://academicanswers.waldenu.edu/a.php?qid=290302

adammarblestone-zz commented 9 years ago

We should have multiple ways to start a Scholar search from a PDF in the browser. These could include getting the title from the PDF, from PubMed, or from other sources, and then searching Scholar for that title. They could also include searching Scholar directly for the DOI...?
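
A rough sketch of that fallback idea: given whatever identifiers were recovered, build a Scholar search URL from the title when there is one and from the DOI otherwise. `PaperHints`, `scholarSearchUrl`, and `scholarUrlFor` are hypothetical names, and preferring the title over the DOI is an assumption, since searching Scholar by DOI has not been tested here. This only builds the ordinary search-page URL; it does not fetch or parse results.

```typescript
// Identifiers we may have recovered for the open PDF; every field is optional.
interface PaperHints {
  title?: string;
  doi?: string;
}

// Build a Google Scholar search URL for a free-text query.
function scholarSearchUrl(query: string): string {
  return "https://scholar.google.com/scholar?q=" + encodeURIComponent(query);
}

// Prefer a quoted title search; fall back to the DOI; otherwise return null.
function scholarUrlFor(hints: PaperHints): string | null {
  const title = hints.title?.trim();
  if (title) return scholarSearchUrl(`"${title}"`);
  const doi = hints.doi?.trim();
  if (doi) return scholarSearchUrl(doi);
  return null;
}
```

Where the title comes from (the PDF itself, PubMed, or another source) does not matter to this function, so new sources can be added without changing it.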

adammarblestone-zz commented 9 years ago

I am not sure the search-Scholar-by-DOI approach above really works well... need to test it.

adammarblestone-zz commented 9 years ago

http://www.crossref.org/

is probably the best solution for DOI --> metadata

adammarblestone-zz commented 9 years ago

http://search.crossref.org/?q=10.1177%2F0741713611402046
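
A sketch of the DOI --> metadata step against the Crossref REST API, which serves one work record per DOI at `https://api.crossref.org/works/{doi}`. The function and interface names are hypothetical, and only the `title` and `author` fields are read; the real record carries many more. The network call is left out so the pieces stay easy to test.

```typescript
// Subset of a Crossref work record that this sketch reads.
interface CrossrefWork {
  title?: string[];
  author?: { given?: string; family?: string }[];
}

// What the rest of the app would consume (a hypothetical shape).
interface PaperMetadata {
  title: string | null;
  authors: string[];
}

// URL of the Crossref REST API record for one DOI.
function crossrefWorkUrl(doi: string): string {
  return "https://api.crossref.org/works/" + encodeURIComponent(doi);
}

// Reduce a Crossref work record to a title and a list of author names.
function toMetadata(work: CrossrefWork): PaperMetadata {
  const title = work.title && work.title.length > 0 ? work.title[0] : null;
  const authors = (work.author ?? [])
    .map((a) => [a.given, a.family].filter(Boolean).join(" "))
    .filter((name) => name.length > 0);
  return { title, authors };
}
```

Fetching `crossrefWorkUrl(doi)` returns JSON whose `message` field holds the work record, so `toMetadata(body.message)` yields the title to hand to a Scholar search.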

adammarblestone-zz commented 9 years ago

Crossref = over 68 million journal articles