demand-driven-open-data / ddod-intake

"DDOD Intake" tracks DDOD Use Cases using GitHub issues. View the main DDOD site here
http://ddod.us

Mapping of NPI number to PubMed ID #24

Closed: yashgad closed this issue 8 years ago

yashgad commented 9 years ago

Context: In order to relate the academic publications of a physician or other healthcare provider to all the other metadata available through other HHS offerings, it is necessary to have a convenient way to map the PubMed ID (PMID) of an article, or some other NLM author ID, back to the physician. This information would give broader insight into the publications and subject areas that a doctor or healthcare provider is involved in.

Issue: Develop a database that lists each physician's or healthcare provider's NPI number and correlates it with a comma-delimited list of PubMed IDs. This list should be generated dynamically from PubMed to provide the most up-to-date information possible.
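One way such a database could be laid out, sketched under assumptions: store one row per (NPI, PMID) pair and render the comma-delimited list on demand, so that a dynamic refresh from PubMed only has to replace one NPI's rows. The table name `npi_pmid`, the helper functions and the sample identifiers are hypothetical, not an existing HHS or NLM schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create (if needed) a normalized NPI-to-PMID link table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS npi_pmid ("
        " npi TEXT NOT NULL,"
        " pmid TEXT NOT NULL,"
        " PRIMARY KEY (npi, pmid))"
    )
    return conn


def refresh_npi(conn, npi, pmids):
    """Replace all PMIDs for one NPI, e.g. after re-querying PubMed."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM npi_pmid WHERE npi = ?", (npi,))
        conn.executemany(
            "INSERT OR IGNORE INTO npi_pmid (npi, pmid) VALUES (?, ?)",
            [(npi, p) for p in pmids],
        )


def pmid_list(conn, npi):
    """Return the comma-delimited PMID list for one NPI, sorted numerically."""
    rows = conn.execute(
        "SELECT pmid FROM npi_pmid WHERE npi = ? ORDER BY CAST(pmid AS INTEGER)",
        (npi,),
    ).fetchall()
    return ",".join(r[0] for r in rows)


if __name__ == "__main__":
    db = open_db()
    # Placeholder identifiers for illustration only; the duplicate is ignored.
    refresh_npi(db, "1234567890", ["300", "100", "200", "100"])
    print(pmid_list(db, "1234567890"))  # 100,200,300
```

Keeping pairs normalized rather than storing the comma-delimited string directly makes deduplication and per-provider refreshes trivial; the delimited list is just a view.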

dportnoy commented 9 years ago

Use case specs page created: https://github.com/demand-driven-open-data/ddod-intake/wiki/Use-Case-24

Update: The page with the full use case specifications and solution has moved to: http://hhs.ddod.us/wiki/Use_Case_24

dportnoy commented 9 years ago

@yashgad, please update items with "(?)" in https://github.com/demand-driven-open-data/ddod-intake/issues/24.

ftrotter commented 9 years ago

It should be noted that this data just does not exist yet. This is asking for totally new data and it should be categorized as "very hard" as a result.

Also, there are multiple efforts to consolidate researcher IDs and to ensure that PubMed itself has unique IDs for researchers, such as http://www.researcherid.com/.

Not sure this is an HHS problem to solve. Although they should do it, since no one else can...

betshsu commented 9 years ago

Another big player in the researcher ID space is ORCID (http://orcid.org), which also offers an API to its database, with access levels ranging from the public to paying members (http://orcid.org/organizations/integrators/API). A lot of journals are encouraging, if not requiring, authors to have an ORCID when they submit a manuscript.

While NIH is linking to ORCID with the new SciENcv (http://www.nlm.nih.gov/pubs/techbull/so14/so14_sciencv_orcid.html), it is not yet requiring grantees to have an ORCID.

Mapping from a PubMed article back to an NPI without something like a researcher ID would be extremely challenging because of issues with name disambiguation.

If the physician is NIH-funded (e.g., the publications of interest are the results of NIH-funded grants), NIH has robust links between NIH-funded grants and PubMed publications (via acknowledgment of the NIH grant number), and this data is publicly available on RePORTER (http://projectreporter.nih.gov). In theory, you could search for the physician's name and, once the results are returned, click on the publications tab to see all the publications. The one caveat is that these are all the publications linked to the grant the physician is associated with, so the physician may not actually be an author on all of them.
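The caveat about grant-linked publications can be illustrated with a small filter: given publications already pulled from a grant's publication list (hard-coded here rather than fetched), keep only those whose author list contains the physician's surname and first initial. The record layout, the "Surname Initials" author format, and the sample PMIDs and names are assumptions for illustration; string matching on names inherits the disambiguation problems discussed in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    pmid: str
    authors: list[str] = field(default_factory=list)  # e.g. "Smith JA"


def authored_by(pubs, last_name, first_initial):
    """Keep publications where an author has this surname and first initial.

    Authors are assumed to be written "Surname Initials". Matching is
    case-insensitive and checks only the first initial, so "Smith J" also
    matches "Smith JA"; that looseness is deliberate and is exactly where
    false positives from name collisions creep in.
    """
    last = last_name.strip().lower()
    initial = first_initial.strip().lower()[:1]
    kept = []
    for pub in pubs:
        for author in pub.authors:
            # rpartition keeps multi-word surnames such as "van der Berg" intact.
            surname, _, initials = author.strip().rpartition(" ")
            if surname.lower() == last and initials.lower().startswith(initial):
                kept.append(pub)
                break
    return kept


if __name__ == "__main__":
    # Invented grant-linked publications for illustration.
    pubs = [
        Publication("111", ["Smith JA", "Lee K"]),
        Publication("222", ["Lee K", "Patel R"]),  # on the grant, not by Smith
        Publication("333", ["van der Berg M", "Smith J"]),
    ]
    print([p.pmid for p in authored_by(pubs, "Smith", "J")])  # ['111', '333']
```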

betshsu commented 9 years ago

@yashgad any thoughts on the workaround proposed above involving NIH RePORTER?

betshsu commented 9 years ago

Edit -- I should add that the workaround using NIH RePORTER isn't limited to physicians funded by NIH -- it will work for other HHS agencies that use the same grants management system as NIH, namely ACF, AHRQ, CDC, HRSA, and FDA (and also VA).

dportnoy commented 9 years ago

@betshsu can you confirm that matching NPI to PubMed (via RePORTER) is a technical challenge, rather than a legal, regulatory or privacy limitation?

Also, is this linking possible via ORCID and is it free to use?

betshsu commented 9 years ago

@dportnoy To clarify, my workaround does not directly match NPI to PubMed -- it is based on using names, which will have fuzzy matching issues on both ends (the NPI and the RePORTER end). I can't speak from the perspective of CMS and NPI, but there is no legal, regulatory, or privacy limitation in looking up a person in RePORTER to obtain their publication list from the NIH end. That is all public data.
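To see why name-based matching is fuzzy on both ends, consider reducing provider names to a loose author key (surname plus first initial, ignoring case and accents). The sketch below, using made-up NPIs and names, groups providers by that key; any key shared by more than one NPI cannot be resolved from names alone. The key format is an assumption chosen for illustration, not an official NLM or CMS rule.

```python
import unicodedata
from collections import defaultdict


def author_key(first_name, last_name):
    """Build a loose 'surname + first initial' key, ignoring case and accents."""

    def fold(s):
        # Strip accents (e.g. "José" -> "Jose"), collapse whitespace, lowercase.
        decomposed = unicodedata.normalize("NFKD", s)
        ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
        return " ".join(ascii_only.split()).lower()

    return f"{fold(last_name)} {fold(first_name)[:1]}"


def ambiguous_keys(providers):
    """Map each author key to the NPIs sharing it; keep only collisions."""
    by_key = defaultdict(list)
    for npi, first, last in providers:
        by_key[author_key(first, last)].append(npi)
    return {k: v for k, v in by_key.items() if len(v) > 1}


if __name__ == "__main__":
    providers = [  # hypothetical records
        ("1000000001", "John", "Smith"),
        ("1000000002", "Jane", "Smith"),
        ("1000000003", "José", "Núñez"),
    ]
    print(ambiguous_keys(providers))  # {'smith j': ['1000000001', '1000000002']}
```

Making the key stricter (full first name, middle initial) reduces collisions on the NPI side but misses articles where only initials were published, which is the trade-off behind the "fuzzy matching on both ends" remark.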

Linking via ORCID or some other researcher ID would depend on the physician having registered for one of these IDs and on PubMed (or another index, such as Scopus or Web of Science) capturing those numbers as well (I'm not positive whether that depends on the journal requiring the ID; it may, depending on how the index pulls in the information). As far as I know, these IDs are not yet universally mandated, though their use is growing. It would be the preferable way to do such matching, though, since it would remove the disambiguation concern with matching names. These IDs are free for individuals to register for and use.
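If both sides did carry a researcher ID, the match would become an exact join rather than a name comparison. The sketch below assumes a hypothetical NPI-to-ORCID crosswalk (no such public table is described in this thread) and article records that may or may not list author ORCIDs. It validates each iD's check character using the ISO 7064 MOD 11-2 scheme that ORCID uses, and leaves articles without a matching iD unlinked instead of guessing.

```python
from collections import defaultdict


def orcid_checksum_ok(orcid):
    """Validate an ORCID iD's final check character (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "").upper()
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


def link_by_orcid(npi_to_orcid, articles):
    """Return {npi: [pmid, ...]} using exact ORCID matches only.

    npi_to_orcid: hypothetical crosswalk {npi: orcid}
    articles: iterable of (pmid, [author ORCIDs]); authors without an ORCID
    contribute nothing, so no name guessing happens here.
    """
    orcid_to_npi = {
        orcid.upper(): npi
        for npi, orcid in npi_to_orcid.items()
        if orcid_checksum_ok(orcid)  # drop malformed crosswalk entries
    }
    linked = defaultdict(list)
    for pmid, author_orcids in articles:
        for orcid in author_orcids:
            npi = orcid_to_npi.get(orcid.upper())
            if npi is not None and pmid not in linked[npi]:
                linked[npi].append(pmid)
    return dict(linked)
```

For example, with a crosswalk `{"1000000001": "0000-0002-1825-0097"}` and articles `[("111", ["0000-0002-1825-0097"]), ("222", [])]`, only PMID 111 is linked; article 222 stays unmatched because it carries no iD.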

betshsu commented 8 years ago

@yashgad Having re-read your original request, it seems that while the database you'd like does not exist, what you want to achieve is already possible with existing open data sources, with the caveat that it relies on name matching across two different databases (NPI and PubMed), where we know there are issues with name disambiguation. Unless I am misunderstanding what you are after, the process would be to:

1) Pull the NPI of the physician of interest.
2) Run a PubMed author search for the physician of interest and pull back the PubMed IDs of the articles. (Regarding the earlier mentions of an NLM author ID, note that there is no such thing, though NLM does support outside author identifiers like ORCID when supplied by the journal. See https://www.nlm.nih.gov/pubs/techbull/nd10/nd10_pm_author_id.html.) The hit list is exportable as a PMID list (text file).

As mentioned, there are name disambiguation issues with both databases, but that issue will best be solved by the researcher ID efforts.
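The two steps above could be scripted against public APIs, sketched here under assumptions: the NPPES NPI Registry API (version 2.1) for step 1 and NCBI E-utilities `esearch` on the `pubmed` database for step 2. The JSON field names used (`results`, `basic.first_name`, `basic.last_name`, `esearchresult.idlist`) reflect those services' documented responses as I understand them and should be verified against current documentation; the helper names are hypothetical. Only individual providers (NPI-1) have personal names to search on.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NPPES_API = "https://npiregistry.cms.hhs.gov/api/"
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def nppes_url(npi):
    """Step 1: NPI Registry lookup URL for a single NPI."""
    return NPPES_API + "?" + urlencode({"version": "2.1", "number": npi})


def provider_name(nppes_json):
    """Extract (first, last) for an individual provider from a registry response."""
    results = nppes_json.get("results") or []
    if not results:
        raise LookupError("NPI not found")
    basic = results[0].get("basic", {})
    if "first_name" not in basic or "last_name" not in basic:
        raise LookupError("not an individual (NPI-1) provider")
    return basic["first_name"], basic["last_name"]


def pubmed_author_url(first, last, retmax=500):
    """Step 2: PubMed author search URL using the 'Last F[au]' form."""
    term = f"{last} {first[:1]}[au]"
    return ESEARCH + "?" + urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    )


def pmids(esearch_json):
    """PMIDs (as strings) from an esearch JSON response."""
    return esearch_json["esearchresult"]["idlist"]


def fetch_json(url):
    """Fetch and decode a JSON response (needs network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

With network access, chaining `fetch_json(nppes_url(npi))`, `provider_name(...)`, `fetch_json(pubmed_author_url(first, last))` and `pmids(...)` reproduces steps 1 and 2, and the IDs can be written one per line to a text file. The author query is still name-based, so the disambiguation caveats above apply unchanged.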

Please let us know if we've misinterpreted something in what you're after, otherwise we will close this issue as documented. Thanks.

yashgad commented 8 years ago

Agree that this may not be an HHS problem to solve, @betshsu. We are currently solving this problem the way you have outlined (basically mapping the NPI to whatever author or PMID we can find). But what I was hoping for was a more standardized and universal ID database, generated either by PubMed or by HHS in coordination with NIH or some other body, with an established link at some level between the NPI and whatever this ID would be. I am fine with closing this issue, but with the understanding that the hacky solution is not really a great one because of all the caveats that you all articulated.

betshsu commented 8 years ago

@yashgad Agreed that the solution is hacky; unfortunately, I think the best push toward a universal ID (at least from the author/researcher angle) is via journals that require authors to register for ORCID and similar IDs. The movement is slowly gaining momentum, and PubMed is ready to ingest the IDs, so hopefully it will lead to improvements on the author disambiguation issue in the future, though it will take time for researchers to adopt them.

I will update the wiki and document all the caveats outlined here, thanks.

betshsu commented 8 years ago

Information documented on wiki here: http://hhs.ddod.us/wiki/Use_Case_24:_Mapping_of_NPI_number_to_PubMed_ID#Solution