IQSS / dataverse

Open source research data repository software
http://dataverse.org

Allow depositors/curators to import JATS XML to more easily add related publication metadata #5815

Closed jggautier closed 1 month ago

jggautier commented 5 years ago

In a discussion about how to use Dataverse's related publication fields, MIT Press's Editorial & Production Manager recommended that Dataverse let users upload an XML file that already contains structured metadata about datasets' related publications. Dataverse would then use this structured metadata to fill in the metadata fields. Many journals, including MIT Press's, use publishing systems that export metadata in JATS (Journal Article Tag Suite), an XML schema for describing "the content and metadata of journal articles". (JATS informed Dataverse's journal metadata block. https://github.com/IQSS/dataverse/issues/1166)

What might work for us, and publishers like us, is a way for us to upload XML metadata. This would ensure that all the metadata regarding the related article is correct.

The intent is to make it faster for depositors and curators to enter metadata about related journal articles. Because articles are often published months after their datasets, depositors can't include information about the article (such as the title, author, publisher, and DOI) when the dataset is published. As a result, datasets are published with little or no metadata about related publications, and curators face a lot of work later to enter that metadata once the articles are published.

Instead of entering related publication metadata dataset by dataset, months after dataset publication, a journal dataverse administrator could upload a JATS XML file that contains metadata about multiple articles and their related datasets. Dataverse would use the dataset DOIs in the XML to determine which articles are related to which datasets (see example XML at https://jats4r.org/data-citations) and fill in the related publication metadata fields for multiple datasets in that dataverse.
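To make the matching step concrete, here is a rough Python sketch, not a proposal for how Dataverse should implement it. It assumes data citations are marked up along the lines of the JATS4R recommendation (`publication-type="data"` on `<element-citation>` or `<mixed-citation>`, with a `<pub-id pub-id-type="doi">` child). The sample XML, the DOIs, the function names, and the output keys are all made up for illustration. A real upload containing several articles would also need some wrapper element, which JATS itself doesn't define.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS article with one data citation.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.9999/example.article.1</article-id>
      <title-group><article-title>An Example Article</article-title></title-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref id="r1">
        <element-citation publication-type="data">
          <data-title>Replication data</data-title>
          <pub-id pub-id-type="doi">https://doi.org/10.7910/DVN/EXAMPLE</pub-id>
        </element-citation>
      </ref>
      <ref id="r2">
        <element-citation publication-type="journal">
          <pub-id pub-id-type="doi">10.9999/other.article</pub-id>
        </element-citation>
      </ref>
    </ref-list>
  </back>
</article>"""


def normalize_doi(raw):
    """Lowercase a DOI and drop a leading resolver URL or 'doi:' prefix."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def element_text(node):
    """Return the flattened text of an element, or None if it is missing."""
    return "".join(node.itertext()).strip() if node is not None else None


def related_publications_by_dataset(jats_xml):
    """Map each cited dataset DOI to a list of the articles that cite it."""
    root = ET.fromstring(jats_xml)
    matches = {}
    # iter() also yields the root itself, so a lone <article> root works too.
    for article in root.iter("article"):
        meta = article.find("front/article-meta")
        if meta is None:
            continue
        article_doi = None
        for article_id in meta.findall("article-id"):
            if article_id.get("pub-id-type") == "doi":
                article_doi = normalize_doi(article_id.text or "")
        publication = {
            "title": element_text(meta.find("title-group/article-title")),
            "doi": article_doi,
        }
        for tag in ("element-citation", "mixed-citation"):
            for citation in article.iter(tag):
                # Only data citations point at datasets.
                if citation.get("publication-type") != "data":
                    continue
                for pub_id in citation.iter("pub-id"):
                    if pub_id.get("pub-id-type") == "doi" and pub_id.text:
                        dataset_doi = normalize_doi(pub_id.text)
                        matches.setdefault(dataset_doi, []).append(publication)
    return matches
```

Calling `related_publications_by_dataset(SAMPLE_JATS)` gives a dictionary keyed by dataset DOI, which could then be compared against the DOIs of the datasets in the dataverse. DOIs are normalized first because they are case-insensitive and publishers write them in different forms.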

The intent is also to improve the quality of the related publication metadata. If Dataverse can use JATS, users won't have to fill in the related publication fields by hand, which can be confusing (https://github.com/IQSS/dataverse/issues/5277).

JATS also contains metadata about the journal itself (e.g., the volume and issue in which the related article was published), which Dataverse asks for in its journal metadata block. So Dataverse could also import that metadata for multiple datasets.
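As a rough illustration of where that journal-level metadata lives, the sketch below reads the journal title, volume, issue, and publication year from a trimmed, made-up JATS `<front>` section. The element paths follow common JATS usage. The function name and the output keys are hypothetical and are not Dataverse's actual field names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed JATS front matter for a single article.
SAMPLE_FRONT = """<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <pub-date><year>2019</year></pub-date>
      <volume>12</volume>
      <issue>3</issue>
    </article-meta>
  </front>
</article>"""


def journal_metadata(jats_xml):
    """Collect journal-level fields from one JATS <article> document."""
    root = ET.fromstring(jats_xml)

    def text_at(path):
        # Return the flattened text at a path, or None if it is absent.
        node = root.find(path)
        return "".join(node.itertext()).strip() if node is not None else None

    return {
        "journal": text_at("front/journal-meta/journal-title-group/journal-title"),
        "volume": text_at("front/article-meta/volume"),
        "issue": text_at("front/article-meta/issue"),
        "year": text_at("front/article-meta/pub-date/year"),
    }
```

Missing elements come back as `None` rather than raising an error, since publishers don't always supply every field.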

cmbz commented 1 month ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.