Closed Melissa37 closed 5 years ago
On further reading of the PubMed information online, we have clarified that this is the correct way to add data citations to our PubMed deliveries:
My Question: Regarding submitting datasets, I am a bit confused by the documentation. In one place an object type value of "Dataset" is provided, but in another place dataset repository names are provided as the object type value.
Using the first example, I'd assume the tagging should be:
<Object Type="Dataset">
<Param Name="type">Dryad</Param>
<Param Name="id">10.5061/dryad.2f050</Param>
</Object>
But for the second the tagging should be:
<Object Type="Dryad">
<Param Name="id">10.5061/dryad.2f050</Param>
</Object>
</ObjectList>
PubMed response: The first example that you cite from the help would be used to create a linking pair of citations. We commonly use this structure to link comments and corrections to their original article, and the other linking pairs have been grouped in the same category. If you had a PubMed citation to an article, and a second PubMed citation describing a dataset related to the original article, you could create a link between the two. The XML would look like:
<Object Type="dataset">
<Param Name="type">pmid</Param>
<Param Name="id">25264877</Param>
</Object>
This would link the article citation to the dataset citation. (You could also make the link using the dataset citation's DOI rather than the PMID.)
The second example is the XML that you would submit to create an external link to the dataset. …
<Object Type="Dryad">
<Param Name="id">10.5061/dryad.2f050</Param>
</Object>
So, it seems we can only submit datasets from their list for Object, and just using the one example above, I have found a gap of proteomecentral.proteomexchange or PXD. The list is:
Keyword, Comment, Dataset, Erratum, Originalreport, Partialretraction, Patientsummary, Reprint, Republished, Retraction, Update, ANZCTR, BioProject, ClinicalTrials.gov, CRiS, CTRI, ChiCTR, DRKS, Dryad, EudraCT, Figshare, GDB, IRCT, ISRCTN, JapicCTI, JMACCT, JPRN, NTR, Omim, PACTR, PDB, PIR, RPCEC, ReBec, SLCTR, SwissProt, TCTR, UMINCTR, UniMES, UniParc, UniProtKB, UniRef, NCBI:dbgap, NCBI:dbvar, NCBI:genbank, NCBI:genome, NCBI:gensat, NCBI:geo, NCBI:homologene, NCBI:nucleotide, NCBI:popset, NCBI:protein, NCBI:pubchem-bioassay, NCBI:pubchem-compound, NCBI:pubchem-substance, NCBI:refseq, NCBI:snp, NCBI:sra, NCBI:structure, NCBI:taxonomy, NCBI:unigene, NCBI:unists
PubMed: Yes, you are correct, this is a controlled list of allowable values for the secondary source ID list. We are cautious in expanding the list because we are responsible for vetting, reviewing, monitoring and maintaining any links from PubMed. In this case, where the journal participates fully in PMC, the links are available from the full text there.
This task is awaiting a change in structure output from EJP.
We're ready for this @gnott and @Melissa37
What I see in the kitchen sink XML and a recent published article XML is an assigning authority for the dataset identifier, examples in the kitchen sink being
<pub-id assigning-authority="Dryad" pub-id-type="doi">10.5061/dryad.kj1f3v4</pub-id>
and
<pub-id assigning-authority="NCBI" pub-id-type="accession" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48760">GSE48760</pub-id>
The parser is not yet picking up the assigning-authority="NCBI" value, and that will be a first place to start I think.
The eLife article object Dataset is being populated by the dataset JSON data the XML parser makes available. I don't think there is a dataset property in the API schema to fit the assigning-authority value (I want to mention this to @thewilkybarkid, in case it has some implications).
The simplest way for me to include it in PubMed outputs is to just add a new property to the JSON output the parser is creating, and this new property should just be ignored by the API schema parser and will cause no harm.
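To make the idea concrete, here is a minimal sketch of reading the assigning-authority attribute from pub-id tags and exposing it as an extra assigningAuthority key in the dataset output. It is not the elife-tools parser: the function name dataset_pub_ids, the ref-list wrapper and dataset ids in the sample, and the other output keys are assumptions made for illustration, and only the two pub-id tags come from the kitchen sink examples above.

```python
import json
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes in {uri}name form
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# The wrapper element and ids are illustrative; the pub-id tags are from this thread
SAMPLE = """<ref-list xmlns:xlink="http://www.w3.org/1999/xlink">
  <element-citation id="dataset1" publication-type="data">
    <pub-id assigning-authority="Dryad" pub-id-type="doi">10.5061/dryad.kj1f3v4</pub-id>
  </element-citation>
  <element-citation id="dataset2" publication-type="data">
    <pub-id assigning-authority="NCBI" pub-id-type="accession" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48760">GSE48760</pub-id>
  </element-citation>
</ref-list>"""


def dataset_pub_ids(xml_string):
    """Return one dict per dataset citation that contains a pub-id tag."""
    root = ET.fromstring(xml_string)
    datasets = []
    for citation in root.iter("element-citation"):
        pub_id = citation.find("pub-id")
        if pub_id is None:
            continue
        record = {"id": citation.get("id")}
        authority = pub_id.get("assigning-authority")
        if authority:
            # The new property; consumers that do not know it can ignore it
            record["assigningAuthority"] = authority
        value = (pub_id.text or "").strip()
        if pub_id.get("pub-id-type") == "doi":
            record["doi"] = value
        else:
            record["dataId"] = value
        href = pub_id.get(XLINK_HREF)
        if href:
            record["uri"] = href
        datasets.append(record)
    return datasets


if __name__ == "__main__":
    print(json.dumps(dataset_pub_ids(SAMPLE), indent=2))
```

Because the new key is only added when the attribute is present, existing output for datasets without an assigning-authority would be unchanged in this sketch.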
The tentative plan I have so far is: [moved checkbox list to the first comment]
First two checkbox steps are in PR https://github.com/elifesciences/elife-tools/pull/297 awaiting review.
Next two checkboxes are completed in PR https://github.com/elifesciences/elife-article/pull/50.
Drafting the logic in the https://github.com/elifesciences/elife-pubmed-xml-generation library, I have some details and questions.
Current status: Using the latest kitchen sink XML, https://github.com/elifesciences/XML-mapping/blob/master/elife-00666.xml, as the example, here is the new XML included in the Pubmed deposit for the datasets:
<Object Type="Dryad">
<Param Name="id">10.5061/dryad.kj1f3v4</Param>
</Object>
<Object Type="NCBI">
<Param Name="id">GSE48760</Param>
</Object>
This includes only doi and accession_id values of the Dataset object.
I tried including the uri value of the third dataset, which has the assigning_authority of "other", in the XML. However, since other is not a value Pubmed will accept, I've omitted any plain uri for now, not having a good example use case. Fortunately, if Pubmed does not recognise the value in the <Object> tag's Type attribute, it just ignores it and does not cause any errors on their deposit validation tool.
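The behaviour described above could be sketched roughly as follows. This is not the code in elife-pubmed-xml-generation: plain dicts stand in for the Dataset objects, the function name build_object_list is an assumption, and the uri of the third dataset is a made-up placeholder. The sketch emits an Object tag only when a doi or accession_id is present, so a dataset with just a uri is left out.

```python
import xml.etree.ElementTree as ET


def build_object_list(datasets):
    """Build an ObjectList element with one Object per dataset that has an id."""
    object_list = ET.Element("ObjectList")
    for dataset in datasets:
        authority = dataset.get("assigning_authority")
        # Prefer the DOI, fall back to the accession id; plain uri values are skipped
        identifier = dataset.get("doi") or dataset.get("accession_id")
        if not authority or not identifier:
            continue
        obj = ET.SubElement(object_list, "Object", {"Type": authority})
        param = ET.SubElement(obj, "Param", {"Name": "id"})
        param.text = identifier
    return object_list


# Dicts standing in for Dataset objects; the third uri is a hypothetical placeholder
DATASETS = [
    {"assigning_authority": "Dryad", "doi": "10.5061/dryad.kj1f3v4"},
    {"assigning_authority": "NCBI", "accession_id": "GSE48760"},
    {"assigning_authority": "other", "uri": "https://example.org/dataset"},
]

if __name__ == "__main__":
    print(ET.tostring(build_object_list(DATASETS), encoding="unicode"))
```

Note that this sketch does not check the Type value against PubMed's controlled list, which matches the observation that unrecognised values are ignored rather than rejected.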
Issue/question 1: Regarding <Object Type="NCBI">, it is not on the accepted value list specified by Pubmed. If I change it to <Object Type="NCBI:geo"> then Pubmed does display it. @Melissa37, will you be providing more specific NCBI assigning authority values in the XML? If not, do you know how we can determine and include the :geo portion of the Type name?
Issue/question 2: The example with <Object Type="Dryad"> seems to work correctly and is displayed as expected.
Do you have examples of datasets that would be Figshare type, and will those also have a DOI value? Are there other examples of dataset <pub-id> tags you could share that are possible to add to the Pubmed deposits?
@gnott I will contact PubMed about issue 1 and cc you in. I'd rather they accept NCBI than add further tasks to our production process, but sorry for not being thorough enough in my investigations at set up!
@FAtherden-eLife would you be able to find any examples that Graham is after in the recent archive?
M
@gnott, yes the figshare citations have dois
Example
<element-citation xmlns:ali="http://www.niso.org/schemas/ali/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink" id="dataset1" publication-type="data" specific-use="isSupplementedBy">
<person-group person-group-type="author">
<name>
<surname>Kazunori</surname>
<given-names>Yoshizawa</given-names>
</name>
<name>
<surname>Yoshitaka</surname>
<given-names>Kamimura</given-names>
</name>
<name>
<surname>Rodrigo</surname>
<given-names>L Ferreira</given-names>
</name>
<name>
<surname>Charles</surname>
<given-names>Lienhard</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>Blanke</given-names>
</name>
</person-group>
<year iso-8601-date="2018">2018</year>
<data-title>Biological Switching Valve</data-title>
<source>Figshare</source>
<pub-id assigning-authority="figshare" pub-id-type="doi">10.6084/m9.figshare.6741857</pub-id>
</element-citation>
Thanks @FAtherden-eLife, it looks like the DOI value will work for Figshare, although I might need to capitalise the F. I will test it out.
Are you able to find any additional assigning-authority values in datasets we might be able to specify to PubMed?
Tested assigning-authority="figshare" and it works now. I think previously only Figshare, with a capital F, worked; now lowercase is ok.
Thanks for the additional examples in the Google sheet, @Melissa37. From the start we can support these, based on the ones you are potentially using now (looking at the uri to choose the specific NCBI assigning authority):
NCBI:geo
NCBI:dbgap
NCBI:nucleotide
NCBI:sra
If you want to add additional NCBI:xxxx values in the future, we'll need to expand the examples of uri to assigning authority mappings.
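One way such a uri to assigning authority mapping might look is sketched below. This is an illustration only, not the merged code: the function name specific_authority and the path rules are assumptions, and only the /geo/ pattern is backed by an example uri in this thread. The other prefixes are guesses that would need checking against real dataset links.

```python
from urllib.parse import urlparse

# Hypothetical path-prefix rules; only the /geo/ rule is based on a uri seen in this thread
NCBI_PATH_RULES = [
    ("/geo/", "NCBI:geo"),
    ("/gap/", "NCBI:dbgap"),
    ("/nuccore/", "NCBI:nucleotide"),
    ("/sra", "NCBI:sra"),
]


def specific_authority(assigning_authority, uri):
    """Refine a generic NCBI authority into an NCBI:xxxx value when the uri allows it."""
    if assigning_authority != "NCBI" or not uri:
        return assigning_authority
    parsed = urlparse(uri)
    if not parsed.netloc.endswith("ncbi.nlm.nih.gov"):
        return assigning_authority
    path = parsed.path.lower()
    for prefix, authority in NCBI_PATH_RULES:
        if path.startswith(prefix):
            return authority
    # No rule matched: keep the generic value rather than guess
    return assigning_authority


if __name__ == "__main__":
    print(specific_authority(
        "NCBI", "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48760"))
```

Supporting a further NCBI:xxxx value would then mean adding one more rule to the list, once an example uri for it is available.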
Code is merged into the elife-pubmed-xml-generation project, and I will go through the steps to get it deployed for eLife.
Example deposited last week showing datasets on https://www.ncbi.nlm.nih.gov/pubmed/30735131
@Melissa37 do you think we completed this issue, or is there more to do before we close it?
That's great, thanks! We can close.
In future, it would be good to take anything listed in the Major datasets generated section:
and convert, example:
However, in this section I don't think we are storing the information in a way it can be parsed for this data, WDYT @gnott? This links to the work ongoing regarding Data Availability Statements.
Plan I have as at Feb 1, 2019:
- In elife-tools, add the assigning-authority value from datasets in the article XML as a new value in the datasets JSON output named assigningAuthority
- In elife-tools, take dataId values from the pub-id tag when appropriate (currently it only takes them from object-id "art-access-id" type tags)
- In elife-article, add a new property to the Dataset object, something like assigning_authority
- In elife-article, use the assigningAuthority value to populate the Dataset object's assigning_authority value when parsing from XML to eLife article objects
- In elife-pubmed-xml-generation, add <Object> tags for the datasets, including either the DOI value or accession id / dataId value, as is specified in the original article XML
- In elife-pubmed-xml-generation, set the <Object> <Param Name="type"> value in the PubMed deposit for where the dataset is located