IQSS / dataverse

Open source research data repository software
http://dataverse.org

Change Dataverse / Dublin Core mapping to improve OAI-PMH harvesting #8129

Closed philippconzett closed 3 days ago

philippconzett commented 2 years ago

Note: dc:rights is being handled in #5920 and https://github.com/IQSS/dataverse/issues/4176 but the original description of this issue has been preserved.

Based on a semi-systematic survey of how DataverseNO metadata is harvested in Bielefeld Academic Search Engine (BASE; https://www.base-search.net/Search/Advanced), a major search engine for research outputs, we have noticed some issues related to the way the Dataverse software provides Dublin Core metadata for OAI-PMH harvesting.

dc:type

BASE harvests multiple types of research output, e.g. publications and datasets. When searching BASE, you can limit the search results to datasets by selecting Dataset in the Document Type section of the advanced search:

[image]

However, only very few metadata records harvested directly from DataverseNO are marked as Document Type = Dataset. It seems that in the oai_dc format, which BASE uses for harvesting, Document Type is based on the dc:type field. According to the Dataverse Metadata Crosswalk, dc:type corresponds to the Dataverse metadata field Kind of Data. But this field may contain very different values, e.g., “survey data”, “survey”, “observations” etc. Dublin Core (see https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/type) recommends “to use a controlled vocabulary such as the DCMI Type Vocabulary” for dc:type. The DCMI Type Vocabulary has “dataset” as one of its values. I therefore suggest changing the Dataverse / DC Element (oai_dc) mapping, so that dc:type is hard-coded as “dataset” for all dataset metadata in Dataverse.
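To illustrate the problem, here is a minimal sketch that checks whether the dc:type values of an oai_dc record match a DCMI Type Vocabulary term. The sample record is invented for illustration, and the case-insensitive comparison is an assumption; how BASE actually matches dc:type values is not something I have verified.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Hypothetical oai_dc record; the dc:type value mimics a free-text
# Kind of Data entry.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example dataset</dc:title>
  <dc:type>survey data</dc:type>
</oai_dc:dc>"""


def dc_types(record_xml):
    """Return all non-empty dc:type values found in the record."""
    root = ET.fromstring(record_xml)
    return [el.text.strip() for el in root.iter(f"{{{DC_NS}}}type") if el.text]


def has_dcmi_type(record_xml):
    """True if any dc:type value matches a DCMI Type term (case-insensitive; an assumption)."""
    known = {term.lower() for term in DCMI_TYPES}
    return any(value.lower() in known for value in dc_types(record_xml))


print(dc_types(SAMPLE), has_dcmi_type(SAMPLE))
```

With a free-text value such as "survey data" the check fails, whereas a record whose dc:type is "Dataset" would pass.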

dc:date

The Dataverse metadata field Publication Date is available as dcterms:issued, but it doesn’t seem to be among the oai_dc fields Dataverse exposes for OAI-PMH harvesting. According to the Dataverse Metadata Crosswalk, dc:date corresponds to the Dataverse metadata field Deposit Date, but all the random samples I tested in BASE indicate that dc:date, which BASE uses as input for its metadata field Year of Publication, corresponds to the Dataverse field Date of Production. I suggest changing the Dataverse / DC Element (oai_dc) mapping so that dc:date is mapped to Publication Date. This is also in line with citation recommendations: the publication date is the preferred date when citing research data; see, e.g., page 12 in The Tromsø Recommendations for Citation of Research Data in Linguistics; https://doi.org/10.15497/rda00040.
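The effect of the mapping on a harvester can be sketched as follows. The dataset values and the field names `productionDate` and `publicationDate` are hypothetical, and taking the leading four digits as the year is only an assumption about how a harvester such as BASE might derive Year of Publication from dc:date.

```python
import re

# Hypothetical dataset with two different dates (values invented for illustration).
dataset = {
    "productionDate": "2015-06-01",
    "publicationDate": "2021-03-15",
}


def year_of_publication(dc_date):
    """Return the leading four-digit year of a dc:date value, or None."""
    match = re.match(r"\s*(\d{4})", dc_date or "")
    return int(match.group(1)) if match else None


def oai_dc_date(ds, source_field):
    """Return the value that would be exposed as dc:date for a given source field."""
    return ds.get(source_field)


# Mapping observed in the BASE samples: dc:date taken from Date of Production.
observed = year_of_publication(oai_dc_date(dataset, "productionDate"))
# Proposed mapping: dc:date taken from Publication Date.
proposed = year_of_publication(oai_dc_date(dataset, "publicationDate"))

print(observed, proposed)  # 2015 2021
```

For a dataset produced years before it was published, the two mappings yield different years of publication in the harvested record.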

dc:rights

For some of the sources included in BASE, there is an indication of the degree of Open Access. Among them are some Dataverse-based repositories. For DataverseNO and other Dataverse-based repositories, by contrast, this information is not available / unknown (“unbekannt”):

[image]

The Open Access information in BASE is based on the Dublin Core field dc:rights. Dataverse does not provide the field dc:rights. A correct value in this field would enable BASE to indicate the degree of Open Access (see more information at https://www.base-search.net/about/en/faq_oai.php#dc-rights). For datasets without access restriction, the dc:rights field could look like this: info:eu-repo/semantics/openAccess (see more information at https://guidelines.openaire.eu/en/latest/data/field_rights.html#rightsuri-ma).
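A minimal sketch of how such a value could be chosen and serialized is shown below. The `restricted` and `embargoed` flags are hypothetical inputs, not Dataverse fields, and the priority order between them is an assumption; the three URIs come from the info:eu-repo access rights vocabulary referenced in the OpenAIRE guidelines.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def access_rights(restricted=False, embargoed=False):
    """Pick an info:eu-repo access rights URI (flags and their priority are assumptions)."""
    if embargoed:
        return "info:eu-repo/semantics/embargoedAccess"
    if restricted:
        return "info:eu-repo/semantics/restrictedAccess"
    return "info:eu-repo/semantics/openAccess"


def dc_rights_element(restricted=False, embargoed=False):
    """Serialize a dc:rights element carrying the chosen access rights URI."""
    element = ET.Element(f"{{{DC_NS}}}rights")
    element.text = access_rights(restricted, embargoed)
    return ET.tostring(element, encoding="unicode")


print(dc_rights_element())
```

A dataset without access restrictions would thus carry the openAccess value, which is the case described above.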

poikilotherm commented 2 years ago

Related:

pdurbin commented 1 year ago

I suggest changing the Dataverse / DC Element (oai_dc) mapping, so that dc:date is mapped with Publication Date.

I believe that @tcoupin fixed this in the following pull request, which we just merged and which will be available in the next version of Dataverse (5.13 as of this writing):

By the way, thank you @philippconzett for the extensive write-up! It's a lot to go through. Very thorough. 😄

jggautier commented 5 months ago

Related:

cmbz commented 4 months ago

2024/05/08

DS-INRA commented 1 month ago

Another related issue:

pdurbin commented 1 month ago

@philippconzett (and any others watching this issue), I created a pull request to address the points you made above:

Please take a look and feel free to leave comments or a review on the pull request. Thanks.

philippconzett commented 1 month ago

@pdurbin Thanks! I just left a comment on the PR.

pdurbin commented 3 days ago

This issue was just closed because we merged the following pull request:

As explained above, changes to dc:rights were not included in the scope of the pull request. Please look instead to these issues: