IQSS / dataverse

Open source research data repository software
http://dataverse.org

Adding DataCite or other PIDs at the dataverse collection level #5930

Closed. CCMumma closed this issue 2 months ago.

CCMumma commented 5 years ago

One of our TDR institutional liaisons has a researcher asking whether she can assign a DataCite DOI to a dataverse. The dataverse contains several datasets, each with its own DOI, but she would prefer to have a single citation and DOI for the dataverse as a whole.

For this particular user, I've assigned a DOI externally through the institution's own DataCite account, but I'm adding this issue here for future consideration.
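For anyone repeating this workaround outside the Fabrica web interface, DataCite also offers a REST API that accepts a JSON:API request body. The sketch below only builds such a body for a collection DOI. It assumes the DataCite mandatory properties (DOI, creators, titles, publisher, publicationYear, resourceTypeGeneral) plus a landing-page URL. The function name, DOI, and URLs are hypothetical placeholders. Sending the body (a POST to the DataCite `/dois` endpoint with repository credentials) is left out, and the required fields should be checked against the current DataCite schema version.

```python
import json


def collection_doi_payload(doi, creators, title, publisher, year, landing_url):
    """Build a DataCite REST API (JSON:API) body for a collection DOI.

    Hypothetical helper: it only assembles a dict and does not contact DataCite.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                # "publish" registers a findable DOI; omit it to create a draft
                "event": "publish",
                "creators": [{"name": name} for name in creators],
                "titles": [{"title": title}],
                "publisher": publisher,
                "publicationYear": year,
                # DataCite's resourceTypeGeneral vocabulary includes "Collection"
                "types": {"resourceTypeGeneral": "Collection"},
                # Landing page: the Dataverse collection the DOI should resolve to
                "url": landing_url,
            },
        }
    }


payload = collection_doi_payload(
    doi="10.5072/example-collection",  # hypothetical placeholder DOI
    creators=["Doe, Jane"],
    title="Example Research Group Data Collection",
    publisher="Example Dataverse",
    year=2024,
    landing_url="https://dataverse.example.org/dataverse/example-group",  # hypothetical
)
print(json.dumps(payload, indent=2))
```

Setting `resourceTypeGeneral` to "Collection" is what distinguishes this DOI from the dataset DOIs inside the dataverse.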

Link to discussion on list: https://groups.google.com/d/msgid/dataverse-community/CAKx6RmtXE-OajMU0y%2BWPkv0_%3DsPhO30pNgyLy8xtV539w%3DOHcQ%40mail.gmail.com

philippconzett commented 4 years ago

We have also had a similar request from a research center at one of our partner institutions. For now, we also plan to handle it by assigning a DOI through our DataCite account. I'd like Dataverse and the Dataverse community to consider making it possible to create a DOI for a dataverse through the Dataverse software, and also to discuss what a dataverse reference would look like; cf. this discussion in the Dataverse Google Group.

mankoff commented 4 years ago

+1 for this feature request.

In addition to automatically (or optionally?) issuing DOIs for dataverses, it would also be nice to have the citation box available when viewing the dataverse.

This issue raises some questions, such as: would the dataverse version number change whenever data below it changes? Or only when a new dataset is added? Or only when the dataverse metadata changes? Etc.

pdurbin commented 4 years ago

@mankoff these are good questions. Dataverses currently don't have versions but it was brought up for consideration at #6112.

poikilotherm commented 2 years ago

We just had a request for this from one of our institutes:

Translated from German and slightly adapted: We would like to publish the data for a paper in [our institute's] Dataverse [collection]. These are measurements of multiple samples on one instrument at ILL, and measurements on 2 lab instruments at our site.

If you create this with our online lab book iffsamples [using SampleDB integrated with automated exports to Jülich DATA], each measurement would be created individually. Each individual measurement could be exported automatically as a dataset into a [D]ataverse [collection].

But then we would have about 6 datasets for this paper. Can we combine them into a [Dataverse] collection and publish and cite the collection with [a DOI]? Is [this] collection then the same as a (sub)dataverse [collection]? Or would it make more sense to create a single dataset that contains all the data? In our case, the datasets are not that large (~MByte), so that would also be possible.

philippconzett commented 2 years ago

In the meantime, we have created two collection DOIs for DataverseNO collections through DataCite Fabrica: https://doi.org/10.18710/AJ4S-X394 and https://doi.org/10.18710/HTM6-F146

In the Description field of the Dataverse collection, the depositors/collection managers have added a citation they want others to use when they refer to the collection. Here's an example:

To refer to the whole collection, please use the following information: Oksavik, Kjellmar, 2020. The University of Bergen Global Navigation Satellite System Data Collection. DataverseNO. https://doi.org/10.18710/AJ4S-X394.

We have also received a request from the National Library of Norway, which would like to harvest metadata about our TROLLing repository through OAI-PMH, to be included in the resource catalog of The Norwegian Language Bank (https://www.nb.no/sprakbanken/en/resource-catalogue/).

Another use case I can think of is harvesting of repository (and maybe also collection) metadata by re3data (https://www.re3data.org/), FAIRsharing (https://fairsharing.org/), and other registries that describe research data repositories.

pdurbin commented 2 years ago

@philippconzett I like the initiative! If a feature doesn't exist, create a manual workaround. 😄

You seem to be following the diagram at https://dataverse.org/best-practices/data-citation (and below) with the exception that instead of "Dataset Title" you have "Collection Title" (and there's no version).

[Image: generic-citation, the Dataverse data citation diagram]
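The citation added to the DataverseNO collection description follows that pattern with the collection title in place of the dataset title and no version element. As a hedged illustration, the hypothetical helper below assembles such a citation from its parts. Joining multiple authors with "; " is an assumption; the thread does not specify it.

```python
def collection_citation(authors, year, title, publisher, doi):
    """Format a collection citation like the DataverseNO example.

    Pattern: Authors, Year. Title. Publisher. https://doi.org/DOI.
    There is no version element, since collections are not versioned.
    """
    # Assumption: multiple authors are separated by "; " (not specified in the thread)
    author_part = "; ".join(authors)
    return f"{author_part}, {year}. {title}. {publisher}. https://doi.org/{doi}."


print(collection_citation(
    ["Oksavik, Kjellmar"],
    2020,
    "The University of Bergen Global Navigation Satellite System Data Collection",
    "DataverseNO",
    "10.18710/AJ4S-X394",
))
```

With these inputs, the helper reproduces the citation quoted in the collection description. A version element could be appended later if collection versioning is ever implemented.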

philippconzett commented 2 years ago

To be honest, I completely forgot the version part... 🙊 I guess that before the collection citation and versioning feature is implemented in the Dataverse software, we should have a look at what other groups think about this. I just skimmed through the RDA groups and found at least three that could be relevant:

I think the collection citation/versioning issue is also related to the dataset versioning issue; cf. DOIs for Dataset versions #4499.

poikilotherm commented 4 months ago

This paper might provide relevant arguments for implementing this: https://doi.org/10.5334/dsj-2021-012

Principle 3: Identification of Data Collections (Granularity)

A collection of data may be the result of successively generated datasets. The full set of aggregated data (data collection) can be seen as ‘works of works’, and may be organised in a number of sub-collections to be served by a data repository or archive. The collection of works must be identified and versioned, and so shall be its constituent datasets or individual works.

philippconzett commented 4 months ago

Thanks for sharing, @poikilotherm! It seems there can be quite different types of collections, from homogeneous ones, like a time series to which new datasets are added continuously, to heterogeneous ones, like institutional collections within a national repository such as DataverseNO. I wonder whether versioning matters more for time-series collections than for institutional collections.

cmbz commented 2 months ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.