biocaddie / prototype_issues

Used to report and track bioCADDIE prototype issues

Follow up with Pubmed about grant citations #185

Open ianfore opened 7 years ago

ianfore commented 7 years ago

During testing of Datamed 1.5 we found that grants were listed multiple times for a study.

It turned out that this is because the grant information returned from PubMed contains these duplicates. The duplicates were fixed in Datamed prior to its 1.5 release. (How?) They are still present in what is obtained from Pubmed, but Datamed eliminates them somehow.

It seems worth following up with the Pubmed team on this. Among other things it would open a dialog about how Datamed is using Pubmed.

This is the Datamed entry for the dataset: https://datamed.org/display-item.php?repository=0002&id=57d0cffae4b070ffc1ae9786&query=4ll0

See the list at right for the Grant Support.

This is the Pubmed record: https://www.ncbi.nlm.nih.gov/pubmed/?term=24019492

This is the list of grants that Pubmed lists for that Pubmed id. Note that every grant is duplicated.

R01 CA079992/CA/NCI NIH HHS/United States
U54 CA143798/CA/NCI NIH HHS/United States
R01-CA121210/CA/NCI NIH HHS/United States
CA90949/CA/NCI NIH HHS/United States
P01 CA154303/CA/NCI NIH HHS/United States
P30-CA68485/CA/NCI NIH HHS/United States
R01 CA121210/CA/NCI NIH HHS/United States
P50 CA090949/CA/NCI NIH HHS/United States
R01 CA116020/CA/NCI NIH HHS/United States
P30 CA068485/CA/NCI NIH HHS/United States
R01-CA116020/CA/NCI NIH HHS/United States
P01-CA129243/CA/NCI NIH HHS/United States
P01-CA154303/CA/NCI NIH HHS/United States
P01 CA129243/CA/NCI NIH HHS/United States
R01-CA079992/CA/NCI NIH HHS/United States
U54-CA143798/CA/NCI NIH HHS/United States
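To check the duplication programmatically rather than by eye, the grant lines can be rebuilt from the record's XML. This is a minimal sketch, assuming the record has already been retrieved as PubMed XML and that each grant appears as a `Grant` element with `GrantID`, `Acronym`, `Agency`, and `Country` children. The `SAMPLE_XML` fragment and the `grant_lines` helper are hypothetical and written for illustration; the fragment is not copied from the live record.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only (assumption: it mirrors the
# GrantList layout of a PubMed XML record; it is not the real record).
SAMPLE_XML = """
<PubmedArticle>
  <MedlineCitation>
    <Article>
      <GrantList>
        <Grant>
          <GrantID>R01 CA079992</GrantID>
          <Acronym>CA</Acronym>
          <Agency>NCI NIH HHS</Agency>
          <Country>United States</Country>
        </Grant>
        <Grant>
          <GrantID>R01-CA079992</GrantID>
          <Acronym>CA</Acronym>
          <Agency>NCI NIH HHS</Agency>
          <Country>United States</Country>
        </Grant>
      </GrantList>
    </Article>
  </MedlineCitation>
</PubmedArticle>
"""

GRANT_FIELDS = ("GrantID", "Acronym", "Agency", "Country")


def grant_lines(xml_text):
    """Return one 'GrantID/Acronym/Agency/Country' string per Grant element."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for grant in root.iter("Grant"):
        # A missing child element becomes an empty field instead of raising.
        fields = [grant.findtext(tag, default="") for tag in GRANT_FIELDS]
        lines.append("/".join(fields))
    return lines


if __name__ == "__main__":
    for line in grant_lines(SAMPLE_XML):
        print(line)
```

Printing the lines this way makes it easy to see that the two entries differ only in the separator between the activity code and the grant number.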

naturalbeau commented 7 years ago

In DataMed, we get the list of grants from Pubmed and then remove duplicates based on the grant number.

For example, "R01 CA079992/CA/NCI NIH HHS/United States" and "R01-CA079992/CA/NCI NIH HHS/United States" have the same grant number, as do "CA90949/CA/NCI NIH HHS/United States" and "P50 CA090949/CA/NCI NIH HHS/United States". Each such grant is shown only once in DataMed.
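The approach described above could look roughly like the following. This is only a sketch and not DataMed's actual code, which is not shown in this thread. It assumes that a grant number is a two-letter institute code followed by a serial number, optionally preceded by an activity code such as "R01" or "P50", and that serial numbers differing only in leading zeros refer to the same grant. The names `GRANT_RE`, `grant_key`, and `dedupe_grants` are hypothetical.

```python
import re

# Assumed layout: optional activity code (one letter, two digits) followed by
# a space or hyphen, then a two-letter institute code and a numeric serial.
GRANT_RE = re.compile(r"^(?:[A-Z]\d{2}[ -])?([A-Z]{2})(\d+)/")


def grant_key(entry):
    """Return a normalized grant number, e.g. 'CA090949', for one entry."""
    text = entry.strip()
    match = GRANT_RE.match(text)
    if match is None:
        # Unrecognized format: fall back to the raw text so nothing is lost.
        return text
    institute, serial = match.groups()
    # Zero-pad the serial so that 'CA90949' and 'CA090949' compare equal.
    return f"{institute}{int(serial):06d}"


def dedupe_grants(entries):
    """Keep the first entry seen for each normalized grant number."""
    first_seen = {}
    for entry in entries:
        first_seen.setdefault(grant_key(entry), entry)
    return list(first_seen.values())


if __name__ == "__main__":
    grants = [
        "R01 CA079992/CA/NCI NIH HHS/United States",
        "CA90949/CA/NCI NIH HHS/United States",
        "P50 CA090949/CA/NCI NIH HHS/United States",
        "R01-CA079992/CA/NCI NIH HHS/United States",
    ]
    for grant in dedupe_grants(grants):
        print(grant)
```

Keeping the first entry seen is an arbitrary choice. A variant could instead prefer the entry that carries an activity code, so that "P50 CA090949" wins over the bare "CA90949".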

DCGenomics commented 7 years ago

Grant ids can also be extracted from BioProject through our EDirect API.

If you would like a script for that, submit an issue here: https://github.com/NCBI-Hackathons/EDirect_EUtils_API_Cookbook