bokulich-lab / q2-fondue

Functions for reproducibly Obtaining and Normalizing Data re-Used from Elsewhere
BSD 3-Clause "New" or "Revised" License

ENH: fetch publication metadata with `get-metadata` #102

Closed nbokulich closed 2 years ago

nbokulich commented 2 years ago

As a plugin user, I would like `get-metadata` to also fetch publication metadata, so that I know about any publications linked to the studies and samples. This would significantly improve traceability, i.e., citation information would be fetched and preserved alongside other (meta)data.

I would like to grab any PubMed ID(s) linked to the BioProject ID. If possible, each PubMed ID could additionally be linked to a DOI (as a separate metadata column).

I imagine that we could either (a) embed the citation directly in provenance or (b) just retrieve the PubMed ID/DOI and place it in the metadata file for easy parsing later.
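To make option (b) concrete, here is a minimal sketch of attaching PubMed IDs to metadata rows as an extra column. It is only an illustration: the function name `add_pubmed_column`, the column names `Bioproject ID` and `PubMed ID`, and the semicolon separator are all assumptions, not the plugin's actual schema or API.

```python
from typing import Dict, List


def add_pubmed_column(
    rows: List[Dict[str, str]],
    pubmed_by_bioproject: Dict[str, List[str]],
    id_column: str = "Bioproject ID",   # hypothetical column name
    new_column: str = "PubMed ID",      # hypothetical column name
) -> List[Dict[str, str]]:
    """Return copies of the metadata rows with a PubMed ID column added.

    Multiple PubMed IDs are joined with ';'. Rows whose BioProject has no
    known publication get an empty string.
    """
    annotated = []
    for row in rows:
        pmids = pubmed_by_bioproject.get(row.get(id_column, ""), [])
        # Copy the row so the caller's data is left untouched.
        annotated.append({**row, new_column: ";".join(pmids)})
    return annotated
```

Keeping the IDs in a plain column (rather than only in provenance) would let users filter or join on them with ordinary table tools.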

PubMed ID appears to be one of the optional metadata categories, and even searching BioProjects by PubMed ID is possible in SRA.

This could also be part of a separate action if PubMed ID needs to be fetched via a separate query.
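A separate query of this kind might look like the sketch below, which uses the NCBI E-utilities ELink endpoint with `dbfrom=bioproject` and `db=pubmed`. Several things here are assumptions: the helper names are hypothetical, the `id` passed in is taken to be the numeric BioProject UID rather than the `PRJ...` accession (resolving an accession to a UID is omitted), and whether any links come back depends on whether the records were actually linked at NCBI.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(bioproject_uid: str) -> str:
    """Build an ELink URL asking for PubMed records linked to a BioProject UID."""
    params = {"dbfrom": "bioproject", "db": "pubmed", "id": bioproject_uid}
    return f"{EUTILS_ELINK}?{urlencode(params)}"


def parse_pubmed_ids(elink_xml: str) -> List[str]:
    """Extract linked PubMed IDs from an ELink XML response.

    Returns an empty list when the response contains no pubmed link set,
    which is the expected outcome for unlinked BioProjects.
    """
    root = ET.fromstring(elink_xml)
    pmids: List[str] = []
    for linkset_db in root.iter("LinkSetDb"):
        if linkset_db.findtext("DbTo") == "pubmed":
            for link in linkset_db.findall("Link"):
                pmid = link.findtext("Id")
                if pmid:
                    pmids.append(pmid)
    return pmids


def fetch_pubmed_ids(bioproject_uid: str) -> List[str]:
    """Query ELink over the network and return any linked PubMed IDs."""
    with urlopen(build_elink_url(bioproject_uid)) as response:
        return parse_pubmed_ids(response.read().decode("utf-8"))
```

Splitting URL building and XML parsing from the network call keeps the parsing testable offline, and an empty result is treated as a normal case rather than an error.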

misialq commented 2 years ago

Hmmm, however exciting that sounded before, I am not sure this is possible. Even though that paper says these records are linked, I cannot find a single example where this is the case. I've tried 10 BioProjects and could not find links to PubMed for any of them (using the browser interface). For two or three examples I also tried the reverse (going to the paper and looking for a linked BioProject), which did not work either. If someone can get me a working example (again, in the browser, in any way), I can probably convert it to a CLI version. Without one, I'm not sure it's possible. Will dig a bit more though.

nbokulich commented 2 years ago

It might just be that the PubMed IDs are most often not linked? i.e., this probably needs to be updated by the authors?

misialq commented 2 years ago

that's what I thought too... I just mean that it's a bit difficult to work on a feature without a working example 😓

adamovanja commented 2 years ago

I suggest we close this issue, as fetching DOI names was addressed in #113 with the scrape-collection action.

misialq commented 2 years ago

This issue describes a slightly different scenario than the one addressed in #113. Here we would need to fetch DOIs/PubMed IDs given an existing list of BioProject IDs, without the step of going through a Zotero library (not all the IDs necessarily originate from publications). However, based on my brief investigation, I unfortunately don't think it is actually possible (see comment above), so I am indeed closing this issue.