Investigate pulling data from Google Scholar or other databases to populate Wikidata

baskaufs commented 2 years ago

This is related to emails from Andy (2020-08-12) and Rebecca Jerome (2020-08-11).

It is not clear whether I ever replied to Rebecca about this.

From: Andrew Wesolek andrew.j.wesolek@vanderbilt.edu
Date: Wednesday, August 12, 2020 at 8:29 AM
To: Steve Baskauf steve.baskauf@Vanderbilt.Edu
Subject: FW: advice on a publication listing challenge

Hi Steve,

Were you involved in the first iteration of this project? I don’t believe I was.

Andy

From: "Jerome, Rebecca" rebecca.jerome@vumc.org
Date: Tuesday, August 11, 2020 at 8:56 AM
To: "Wesolek, Andrew Joseph" andrew.j.wesolek@vanderbilt.edu, "Baskauf, Steven James" steve.baskauf@Vanderbilt.Edu
Subject: RE: advice on a publication listing challenge

Hi Andrew and Steven!

We talked about a year ago about some possible solutions related to an automated aggregation approach for citations on the REDCap project, and I just had a similar question come up about the Right from the Start project, a longitudinal study run by Katherine Hartmann on the medical center side. I wondered if any new solutions have popped up recently related to Google Scholar or another database tool for doing this kind of thing, to track publications + # of citations to those publications on an ongoing basis for the project.

Let me know if you’ve heard of anything that might work, and thank you! Hope you all are well.

Becky

Rebecca Jerome, MLIS, MPH
Manager, Translational Research
Vanderbilt Institute for Clinical and Translational Research
Vanderbilt University Medical Center
2525 West End Avenue, 6th floor
Nashville, TN 37203-8820
(615)343-1267
rebecca.jerome@vumc.org

From: Wesolek, Andrew Joseph andrew.j.wesolek@vanderbilt.edu
Sent: Monday, March 25, 2019 2:53 PM
To: Jerome, Rebecca rebecca.jerome@vumc.org
Cc: Baskauf, Steven James steve.baskauf@Vanderbilt.Edu; Shook, Elisabeth R elisabeth.r.shook@vanderbilt.edu
Subject: Re: advice on a publication listing challenge

Hi Rebecca,

Thanks for reaching out! This might be something we can help with, but let’s sit down to chat about it. I’m looping in our data and scholarly communications gurus, Steve and Elisabeth, to help. How does Thursday before 3:00 look for everyone?

Andy

Andrew Wesolek
Director Digital Scholarship and Scholarly Communications
Vanderbilt University Libraries
Office: 615.343.1075
https://orcid.org/0000-0002-0061-5182

From: "Jerome, Rebecca" rebecca.jerome@vumc.org
Date: Monday, March 25, 2019 at 12:52 PM
To: "Wesolek, Andrew Joseph" andrew.j.wesolek@vanderbilt.edu
Subject: advice on a publication listing challenge

Hi Andrew!

I’m a former EBL librarian, now with the Vanderbilt Institute for Clinical and Translational Research, and I received a request for help from Paul Harris on behalf of his REDCap team -- I’m stumped about how to figure out how to automate a process they have been using.

Their goal is an ongoing list of publications that cite use of REDCap, to show how widely the tool is used plus the wide range of topics etc. We periodically use the data in grants and presentations too.

They currently use Google Scholar notifications to populate a REDCap database, with manual curation to map to PubMed IDs to enable linking out to PubMed for each citation that has a PubMed record corresponding with the paper identified in Google Scholar – with the end product being this periodically updated list of publications resulting from use of REDCap - https://projectredcap.org/resources/citations/

So in general it looks kind of like this:
1) Google Scholar alert triggers with new papers mentioning REDCap
2) These get manually imported into a REDCap publications project
3) Someone adds PMIDs to those new records
4) The project REDCap webpage draws from that REDCap project to list publications

Paul shared re: where he’d love to land with a revised process: “2+3 are a manual process where we’re simply adding relevant information into a REDCap database that feeds 4. I’d like to replace 1 with a batch process and automate 2+3.”

I chatted with Philip Walker about this challenge and we looked at doing an RSS feed from Web of Science, but since it’s a proprietary resource it seemed like it was lacking some of the functionality we’d want, though maybe we could use Zotero to build our own feed somehow. This is clearly not my area of expertise 😊 He mentioned that perhaps your team may have advice or ideas about how to streamline the approach or even replace it with a different method.

If you or your team have any thoughts or recommendations I would appreciate them so much, and happy to talk further if that might be an option.

Thanks for considering! Becky

Rebecca Jerome, MLIS, MPH
Manager, Translational Research
Vanderbilt Institute for Clinical and Translational Research
Vanderbilt University Medical Center
2525 West End Avenue, 6th floor
Nashville, TN 37203-8820
(615)343-1267
rebecca.jerome@vumc.org
Please note that my email address has changed and update your records if needed.

Refer to Sarah's text mining libguide: https://researchguides.library.vanderbilt.edu/c.php?g=1112228&p=8109407&preview=f3722ebe16c010a87d13a131f795afc3 for a list of possible APIs

Investigated the Elsevier APIs, which cover Scopus, ScienceDirect, and SciVal. Notes are in the DiSC OneDrive.
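
For steps 2 and 3 of the workflow in Rebecca's 2019 email (importing new papers and adding PMIDs), one possible starting point is sketched below. It is not something anyone in this thread has tested. It assumes NCBI's E-utilities ESearch endpoint for the title-to-PMID lookup and the standard REDCap record-import API parameters for the upload. The function names, the REDCap field names (`record_id`, `title`, `pmid`), and the "accept only a single hit" rule are hypothetical choices made for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

# NCBI E-utilities ESearch endpoint (public; an API key is optional for low volumes).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(title: str) -> str:
    """Build an ESearch URL that searches PubMed for a paper by its title."""
    params = {
        "db": "pubmed",
        "term": f"{title}[Title]",  # restrict the match to the title field
        "retmode": "json",
        "retmax": "5",
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_pmids(esearch_json: str) -> List[str]:
    """Extract the list of PMIDs from an ESearch JSON response body."""
    data = json.loads(esearch_json)
    return data.get("esearchresult", {}).get("idlist", [])


def lookup_pmid(title: str) -> Optional[str]:
    """Return a PMID only when the title search is unambiguous (assumed rule)."""
    with urllib.request.urlopen(build_esearch_url(title), timeout=30) as resp:
        pmids = parse_pmids(resp.read().decode("utf-8"))
    # Zero or multiple hits are left for manual curation.
    return pmids[0] if len(pmids) == 1 else None


def redcap_import_payload(token: str, records: List[Dict[str, str]]) -> Dict[str, str]:
    """Build the form fields for a REDCap API record import (flat JSON format)."""
    return {
        "token": token,
        "content": "record",
        "format": "json",
        "type": "flat",
        "data": json.dumps(records),
    }
```

The payload would be sent as a form-encoded POST to the project's REDCap API URL. Titles coming from Google Scholar alerts often differ slightly from the PubMed title, so an exact title search will probably miss some papers and a manual review step would still be needed.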

baskaufs commented 2 years ago

Here are links for Internet Archive:

https://blog.archive.org/2021/10/19/internet-archive-releases-refcat-the-ia-scholar-index-of-over-1-3-billion-scholarly-citations/

https://blog.archive.org/2021/03/09/search-scholarly-materials-preserved-in-the-internet-archive/

baskaufs commented 2 years ago

LENS database (public) http://dx.doi.org/10.5195/jmla.2020.918

However, it does sound like they charge for access: https://www.lens.org/lens/user/subscriptions#scholar

baskaufs commented 2 years ago

List of public APIs from Francisco: https://github.com/public-apis/public-apis

HeardLibrary / vandycite

Investigate pulling data from Google Scholar or other databases to populate Wikidata #48