CDLUC3 / dmsp_aws_prototype

Sceptre CloudFormation templates for DMPHub v2
MIT License

COKI Observatory Network integration #77

Open briri opened 6 months ago

briri commented 6 months ago

Build a Lambda that calls out to the COKI Observatory Network's BigQuery system to search for related works.

Most of the tables are updated weekly, so the system should not query them more often than once a week for a given DMP. One thought is to pick the best entry point for finding related works (e.g. ORCIDs), have the Lambda gather all of the relevant ORCID iDs, call BigQuery once with the full set, and then process the results using your algorithm to determine whether each record matches one of the DMPs. This keeps the number of BigQuery calls to a minimum.
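A minimal sketch of the batching idea above, in Python. The record shape (`dmp_id`, `orcids`, `last_searched`) and both function names are hypothetical, chosen only for illustration; the real DMP records will look different. It skips DMPs that were searched within the last week and builds a reverse index from ORCID iD to DMP ids, so one query can cover every due DMP and each returned row can be traced back to the DMPs it might match.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the source tables refresh roughly weekly, so one search per
# DMP per week is enough.
MIN_INTERVAL = timedelta(days=7)


def dmps_due_for_search(dmps, now=None):
    """Return the DMPs that have never been searched or were last searched
    at least MIN_INTERVAL ago. `last_searched` is a hypothetical field."""
    now = now or datetime.now(timezone.utc)
    return [
        dmp for dmp in dmps
        if dmp.get("last_searched") is None
        or now - dmp["last_searched"] >= MIN_INTERVAL
    ]


def orcid_index(dmps):
    """Map each ORCID iD to the set of DMP ids that list it, so a single
    query result row can be routed back to every candidate DMP."""
    index = {}
    for dmp in dmps:
        for orcid in dmp.get("orcids", []):
            index.setdefault(orcid, set()).add(dmp["dmp_id"])
    return index
```

With this shape, `sorted(orcid_index(dmps_due_for_search(dmps)))` would be the full ORCID list for a single BigQuery call, and the index itself is what the matching step would consult for each returned record.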

Sample queries that search various tables for a set of ORCIDs:

# OpenAlex Query:
# --------------------------------------------
SELECT doi, title, type, publication_date, authorships, ids, grants
FROM `academic-observatory.observatory_intermediate.openalex20231119`
WHERE EXISTS(SELECT 1 FROM UNNEST(grants) WHERE funder = 'https://openalex.org/F4320332161')
AND EXISTS(SELECT 1 FROM UNNEST(authorships) WHERE author.orcid IN ('ORCID 1', 'ORCID 2'))
AND publication_year > 2020;

# OpenAIRE Query:
# --------------------------------------------
SELECT pid, type, maintitle, publicationdate, publisher, description, subjects, author
FROM `academic-observatory.openaire.dataset20230817`
WHERE publicationdate >= '2022-01-01'
AND EXISTS (SELECT 1 FROM UNNEST(author) WHERE pid.id.value IN ('ORCID 1', 'ORCID 2'));

# PubMed Article Query (not sure how to get to related datasets):
# ---------------------------------------------
SELECT MedlineCitation.PMID, MedlineCitation.Article.ArticleTitle, MedlineCitation.Article.ArticleDate, pubdate, MedlineCitation.Article.Abstract, authors, MedlineCitation.Article.GrantList, MedlineCitation.Article.Journal, MedlineCitation.Article.DataBankList
FROM `academic-observatory.pubmed.pubmed`,
UNNEST(MedlineCitation.Article.ArticleDate) as pubdate,
UNNEST(MedlineCitation.Article.AuthorList) as authors
WHERE pubdate.Year > 2022
AND EXISTS (SELECT 1 FROM UNNEST(authors.Identifier) WHERE value IN ('ORCID 1', 'ORCID 2'));

# ORCID Works Query:
# ----------------------------------------------
SELECT orcid_identifier.uri,
work_group.work_summary
FROM `academic-observatory.orcid.orcid`,
UNNEST(activities_summary.works.group) as work_group
WHERE orcid_identifier.path IN ('ORCID 1', 'ORCID 2');
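The `'ORCID 1', 'ORCID 2'` placeholders above would need real values at run time. A hedged Python sketch of one way to prepare them: normalize each identifier to a bare iD (or a full `https://orcid.org/` URL) and pass the list as a BigQuery named array parameter via `IN UNNEST(@orcids)` rather than splicing strings into the SQL. The helper names are hypothetical. Which form each table stores (bare iD vs. URL) is an assumption to verify against the actual data; `orcid_identifier.path` is treated here as the bare iD.

```python
import re

# An ORCID iD is four hyphen-separated groups of four characters; the last
# character may be the checksum letter X.
ORCID_RE = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def bare_orcid(value):
    """Return the bare iD from either a bare iD or an orcid.org URL,
    or None if the value does not end in something shaped like an iD."""
    match = ORCID_RE.search(value.strip().upper())
    return match.group(1) if match else None


def orcid_url(value):
    """Return the https://orcid.org/ URL form, or None if invalid."""
    bare = bare_orcid(value)
    return f"https://orcid.org/{bare}" if bare else None


def orcid_params(values):
    """Deduplicated, sorted bare iDs to bind to the @orcids parameter."""
    return sorted({b for b in map(bare_orcid, values) if b})


# Same ORCID works query as above, with the literal list replaced by a
# named array parameter.
ORCID_WORKS_SQL = """
SELECT orcid_identifier.uri, work_group.work_summary
FROM `academic-observatory.orcid.orcid`,
UNNEST(activities_summary.works.group) AS work_group
WHERE orcid_identifier.path IN UNNEST(@orcids)
"""
```

With the `google-cloud-bigquery` client, the list would typically be bound through `bigquery.ArrayQueryParameter("orcids", "STRING", orcid_params(values))` inside a `bigquery.QueryJobConfig(query_parameters=[...])`. That wiring is left out here so the sketch runs without credentials.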

briri commented 2 months ago

OpenAlex has few datasets, so consider working with OX or OpenAIRE instead.

mariapraetzellis commented 1 week ago

@briri will write up a description of the work COKI could do to help create these tables for us. Brian will create additional tickets once he has written up that description. It will cover both the work he would have to do and the work we are hoping to get from COKI.