IQSS / dataverse

Open source research data repository software
http://dataverse.org

Dataset Citation: provide flexible options for information displayed in citation metadata #2297

Closed sbarbosadataverse closed 1 month ago

sbarbosadataverse commented 9 years ago

Several dataverse users have requested more flexibility in what is displayed for dataset citations. This is not so much about changing the display order of the information as about choosing what to display in their dataset citation. @mcrosas, please add additional thoughts on this. Which milestone should this go in?

mercecrosas commented 9 years ago

We should bring back a data citation "widget" or tool that allows users to configure some of the fields in the data citation, in particular:

To be reviewed with @eaquigley

eaquigley commented 9 years ago

@mcrosas would this be something we could add to the "selecting metadata fields" portion of the general information page of a dataverse or should this be on the dataset level?

posixeleni commented 9 years ago

Related or Duplicate: https://github.com/IQSS/dataverse/issues/2146

mercecrosas commented 9 years ago

More interest in this from HMS: "dataset collection year would be much more useful."

posixeleni commented 8 years ago

Met with @eaquigley @sbarbosadataverse @scolapasta to plan out this feature.

An FRD will need to be created, but at a high level we will need to be more flexible in allowing users to select a date for the citation other than the default publication year. This is especially important for historical datasets.

Admins will be able to set at their Dataverse-level how they want their data citations to display.

posixeleni commented 8 years ago

Just spoke with @scolapasta and @eaquigley: there may be a use case where someone would like to use different dates depending on the dataset in their dataverse, rather than just one kind of date across the board. For example, in 3.6 we allowed people to use either distribution or production date for the citation, so they could have two different kinds of dates in citations within a single dataverse.

scolapasta commented 8 years ago

We should also make this consistent with facets and metadatablocks and have inheritance for this. So a checkbox to say is "citation customization root" or something like that.

If this is something stored on dvobject, then it could be inherited by datasets by default, but you could override for a specific dataset (if we encounter a use case like @posixeleni described).
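The inheritance idea described above could be sketched roughly as follows. This is only an illustration, not Dataverse code: the `DvObject` class, its attributes, and the function name are hypothetical, and the field names are used purely as example values. A setting of `None` is assumed to mean "inherit from the parent".

```python
from dataclasses import dataclass
from typing import Optional

# Assumed fallback when nothing in the hierarchy sets a preference.
DEFAULT_CITATION_DATE_FIELD = ":publicationDate"


@dataclass
class DvObject:
    """Hypothetical stand-in for a dataverse or dataset in the hierarchy."""
    name: str
    parent: Optional["DvObject"] = None
    # None means "no local setting, inherit from the parent".
    citation_date_field: Optional[str] = None


def effective_citation_date_field(obj: DvObject) -> str:
    """Walk up the hierarchy and return the nearest explicit setting."""
    node: Optional[DvObject] = obj
    while node is not None:
        if node.citation_date_field is not None:
            return node.citation_date_field
        node = node.parent
    return DEFAULT_CITATION_DATE_FIELD


root = DvObject("root")
historical = DvObject("historical", parent=root, citation_date_field="productionDate")
inherits = DvObject("dataset-a", parent=historical)
overrides = DvObject("dataset-b", parent=historical, citation_date_field="distributionDate")

print(effective_citation_date_field(inherits))   # setting taken from "historical"
print(effective_citation_date_field(overrides))  # dataset-level override
```

With this shape, a dataverse acts as the "citation customization root" for everything below it, while a single dataset can still opt out.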

kcondon commented 8 years ago

OK, backend changes are in but please note that the additional fields need to be ordered and currently are not or the order is unclear. This will need to be decided both in backend (order column) and UI.

mheppler commented 7 years ago

Related to #2146.

jggautier commented 7 years ago

There's recent discussion about this issue in this Google Groups thread.

parsr commented 6 years ago

Regarding:

Date: select whether is published date or distribution date or other dates

Having the publication year in the citation based upon when the data are released in this or that Dataverse is not very accurate re: citation imho.

Main use case that is problematic: many dataverses are comprised of or start with previously published datasets that are being added to Dataverse for best practices. These would be, for example, datasets that are being moved from a website listing with zero metadata etc. So when we move a number of datasets into Dataverse for "best practices" they all get citations displayed as "current year" (but they were published / released on internet from 2012-2015!! not 2017).

Then you also have the sorting issue - we want the most recent ("newest") dataset on top but adding previous years' datasets messes with the sort order (which should be by "newest" by true publication date not "newest" in terms of being added to dataverse).

Very annoying problem - when we're trying to release something "new" while at the same time add older datasets to a dataverse. Everything appears "new" but only one dataset is published in current year.

pdurbin commented 6 years ago

@parsr is there any workaround? Do you have to hack on the database or something? I hope not!

parsr commented 6 years ago

hi @pdurbin - none that I'm aware of (also checked with our ScholarsPortal support team to confirm and they're not aware of a workaround either at this time).

would be nice if there was one.

pdurbin commented 6 years ago

@donsizemore you were talking about "Odum can use the native API to fix existing datasets" at #3369 ... is this something completely different? You and @akio-sone were talking about dates at least.

jggautier commented 6 years ago

The workaround would be for changing the date in the dataset citation, right? And changing that date wouldn't change how "Newest" sorts datasets. @parsr, if it's okay, I'm going to copy your comments about sorting in this github issue about sorting (https://github.com/IQSS/dataverse/issues/3066).

parsr commented 6 years ago

@jggautier - yes, workaround for changing the date in citation. And good to share over with (#3066)

In the Metadata tab for one of our datasets, I can see "Publication Date: 2017-10-31" but in editor I can only see inputs editable for:

Distribution date, Deposit date, Production date

and none of them have "2017" in the date data

jggautier commented 6 years ago

@scolapasta pointed me to an API command in the Dataverse native API guide that can be used to change the citation date from the system-generated publication date (the date when the dataset was first published in a Dataverse installation) to another date metadata field, like distribution date, deposit date or production date:

Sets the dataset field type to be used as the citation date for the given dataset (if the dataset does not include the dataset field type, the default logic is used). The name of the dataset field type should be sent in the body of the request. To revert to the default logic, use :publicationDate as the $datasetFieldTypeName. Note that the dataset field used has to be a date field:

PUT http://$SERVER/api/datasets/$id/citationdate?key=$apiKey
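As a rough illustration of the call quoted above, the request could be assembled in Python like this. The function name and the example values (server, dataset id, key) are made up for the sketch; only the URL shape, the `key` parameter, the body content, and `:publicationDate` come from the guide text. The request is built but not sent.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_citation_date_request(server, dataset_id, api_key,
                                field_type_name=":publicationDate"):
    """Build (but do not send) the PUT request that sets the citation date field.

    `server` is expected to include the scheme, e.g. "http://localhost:8080".
    The body is just the dataset field type name; ":publicationDate" reverts
    to the default logic, per the guide text quoted above.
    """
    path_id = quote(str(dataset_id), safe="")
    query = urlencode({"key": api_key})
    url = f"{server.rstrip('/')}/api/datasets/{path_id}/citationdate?{query}"
    return urllib.request.Request(
        url, data=field_type_name.encode("utf-8"), method="PUT"
    )


# Hypothetical values, for illustration only.
req = build_citation_date_request(
    "http://localhost:8080", 42, "my-api-key", "distributionDate"
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Calling the function without a field name produces the request that reverts a dataset to the system-generated publication date.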

pdurbin commented 6 years ago

@jggautier wow I completely forgot about that but sure enough it shipped in Dataverse 4.3 in pull request #3000 for issue #2606.

jggautier commented 5 years ago

This issue has been about the dataset citation that's displayed on the dataset page, and now the file page. But when I change the date used in that displayed dataset citation, should the date be changed in the citation files (RIS, BibTeX and EndNote XML) and in the HTML metatags (which some reference managers can use to populate metadata for creating citations)?

mercecrosas commented 5 years ago

yes, it should

RightInTwo commented 4 years ago

I'm still wondering why it's not just the value of "Distributor" showing up in the citation, but always the name of the root dataverse...

pdurbin commented 4 years ago

always the name of the root dataverse

@RightInTwo please see #2146 and #5841 for more discussion on this.

RightInTwo commented 4 years ago

@pdurbin Yes, I did. I understand the discussion in #2146 to be about using the name of a sub-dataverse instead, but since the name would be manually set, provenance info would improve and be independent of where that dataset resides (and not get lost like that issue would suggest). The default value when adding a dataset could of course still be the name of the root dataverse and would only differ if it was manually changed.

The solution for #5841 was to change the name of the root dataverse, which of course is not an option if there are various distributors to be represented, like Gesis Data Archive, Mendeley, Zenodo and the like.

Sorry for bringing this back, but as I understood, we are not the only users that would like to use dataverse for datasets published primarily in other places.

pdurbin commented 4 years ago

we are not the only users that would like to use dataverse for datasets published primarily in other places

Right, an example of this is https://dataverse.harvard.edu/dataverse/HarvardSubscriptionData which is described at https://dataverse.org/blog/harvard%E2%80%99s-subscription-data-dataverse

RightInTwo commented 4 years ago

Thanks for the pointers! Sonia describes it pretty much like we want to use it. That is a good example. The ILO page on SAKERNAS 2015 (Indonesian Labour Force Survey) mentions the producer:

Producer(s): "Central Bureau of Statistics - Government of Indonesia"

If people use that data, I would expect them to cite the data using that producer and not "Harvard Dataverse" like on the Dataverse page on SAKERNAS 2015.

pdurbin commented 4 years ago

@RightInTwo ok, so to make this a little more concrete, you think the citation for the dataset at https://doi.org/10.7910/DVN/KTNOY8 should be changed from...

Before:

[Screenshot: "Screen Shot 2019-12-05 at 10 28 52 AM"]

... to this...

After:

[Screenshot: "Screen Shot 2019-12-05 at 10 33 36 AM"]

... to better indicate that the data came from the producer indicated at https://www.ilo.org/surveydata/index.php/catalog/1565/study-description (I'm not sure how you figured that part out but I trust you 😄 ).

RightInTwo commented 4 years ago

Well, don't trust me too much - I'm just trusting that ILO page :)

Another example where it is more clear: https://doi.org/10.17632/ym23rrm63f.1

On the landing page, Mendeley prompts me to cite the data with:

van Veldhuizen, Roel (2017), “Data and Analysis Files for "Clean up your own Mess"”, Mendeley Data, v1, http://dx.doi.org/10.17632/ym23rrm63f.1

I don't think it's necessary to replicate this exactly (I would remove the "dx." and use https for the DOI link), but I think the main info (author/year/title/distributor/doi) should be what we display to our users as well.

jggautier commented 4 years ago

Just for clarification, Mendeley Data is the repository, isn't it? Is the citation above an example of how citations should look when no producer name is provided, so the repository name is used instead?

RightInTwo commented 4 years ago

@jggautier Hey Julian, nice to see you! Well yes, Mendeley Data is the repository where the data actually resides and where the doi lookup points to. But in Dataverse, I didn't know a field "repository" exists in the metadata. Aren't we talking about "distributor"?

In any case, the root name can be used as the default, but if I explicitly provide a different producer/distributor/repository/which-ever-field-is-correct, I want the citation to reflect that.

Another example: The field $.publisher at https://api.datacite.org/dois/application/vnd.datacite.datacite+json/10.7802/1.2121 is what I would expect in the citation when I use it to populate the respective field in the dataverse metadata.
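To make the `$.publisher` idea concrete, here is a minimal sketch of picking the publisher out of an already-parsed DataCite JSON record and falling back to the installation name when none is given. The function name, the fallback behaviour, and the sample record are all hypothetical. Handling a nested `{"name": ...}` object is a defensive assumption, in case a representation returns the publisher as an object rather than a plain string.

```python
def publisher_for_citation(record: dict, fallback: str) -> str:
    """Return the publisher from a parsed DataCite JSON record, else `fallback`.

    `record` is the decoded JSON document; the top-level "publisher" key
    corresponds to the `$.publisher` path mentioned above.
    """
    publisher = record.get("publisher")
    # Assumption: tolerate an object form such as {"name": "..."}.
    if isinstance(publisher, dict):
        publisher = publisher.get("name")
    if isinstance(publisher, str) and publisher.strip():
        return publisher.strip()
    return fallback


# Made-up sample record, not the real metadata of any DOI.
sample = {"doi": "10.1234/example", "publisher": "Example Data Archive"}

print(publisher_for_citation(sample, fallback="Root Dataverse"))
print(publisher_for_citation({"doi": "10.1234/other"}, fallback="Root Dataverse"))
```

This mirrors the behaviour asked for in the thread: the root name stays the default, and an explicitly provided publisher takes precedence.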

pdurbin commented 4 years ago

@RightInTwo here's a thought. What if you create a dataset in https://github.com/IQSS/dataverse-sample-data that illustrates which Dataverse metadata fields you'd use? You could create the dataset using https://demo.dataverse.org and then I could help you export the dataset as JSON and get it into that "sample data" repo. Actually, a good first step would probably be for you to create an issue at https://github.com/IQSS/dataverse-sample-data/issues to explain how the dataset comes from somewhere else, etc.

jggautier commented 4 years ago

Hey @RightInTwo. No there's no metadata field called "repository", as you've probably already confirmed :)

I think I was confused because I forgot that you'd like to index the metadata of datasets that will continue to live outside of dataverse (similar to OAI-PMH harvesting, but you can't use that as you've written elsewhere). So I agree that showing the root repository's name in the citation in the search results would be wrong when the data is actually in another repository.

I agree with @pdurbin about seeing which metadata fields you'd use.

RightInTwo commented 4 years ago

@pdurbin @jggautier It is just "publisher" (same in dublin core, datacite, native dvn) that would need to be accepted by dataverse on the ddi import (which is afaik still the only way to get existing dois into the system). That would actually be enough for our purpose, but it might make sense to also make the field editable in the gui and other apis (which might accept existing dois in the future?) for more diverse use cases.

pdurbin commented 4 years ago

ddi import (which is afaik still the only way to get existing dois into the system)

In addition to DDI, you can also get existing DOIs into Dataverse with JSON: http://guides.dataverse.org/en/4.18.1/api/native-api.html#import-a-dataset-into-a-dataverse

One can also get existing DOIs into Dataverse by harvesting them via OAI-PMH.
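For the JSON import route mentioned above, the request URL could be put together along these lines. This is a sketch under assumptions: the `:import` path and the `pid` and `release` query parameters reflect my reading of the linked guide section and should be checked against it, and the function name and example alias are invented. Authentication (for instance an API token sent in an `X-Dataverse-key` header) and the JSON body are left out.

```python
from urllib.parse import quote, urlencode


def build_import_url(server, dataverse_alias, pid, release=False):
    """Build the URL for importing a dataset that already has a persistent ID.

    `pid` is the existing identifier, e.g. "doi:10.17632/ym23rrm63f.1".
    `release=True` asks for the dataset to be imported as already released.
    """
    query = urlencode({"pid": pid, "release": "yes" if release else "no"})
    alias = quote(dataverse_alias, safe="")
    return f"{server.rstrip('/')}/api/dataverses/{alias}/datasets/:import?{query}"


# Hypothetical target dataverse alias "root" on the demo server.
print(build_import_url(
    "https://demo.dataverse.org", "root", "doi:10.17632/ym23rrm63f.1", release=True
))
```

The dataset JSON would then be sent as the body of a POST to this URL.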

scolapasta commented 4 years ago

In terms of harvesting (i.e. allowing for search of datasets in other repositories; no dataset page available through Dataverse*), we had always talked about not generating citations (since it's really not our responsibility) and having the citation be one of the things we actually harvest. (currently we do generate a citation using the distributor as the publisher)

(*) which is how it should be when the data is actually published somewhere else

RightInTwo commented 4 years ago

@scolapasta

having the citation be one of the things we actually harvest

Well, being able to import the whole citation would be even better for our use case! Then we could just sync the whole citation in our own format.

But there is a drawback. In the Harvard Subscription Data Dataverse (and we are planning something similar), you'd not be able to set the correct publisher for those datasets, as the Harvard Dataverse is the authority for that metadata and no citation can be imported.

@pdurbin

In addition to DDI, you can also get existing DOIs into Dataverse with JSON

Perfect! Maybe that has always worked and I just missed the &release=yes in my code :bug: Why we can't simply use OAI-PMH harvesting is discussed in #5402. Though, yes, I'm sorry for just ignoring the main way of metadata exchange between repositories :D

RightInTwo commented 4 years ago

@pdurbin @scolapasta @jggautier Thanks for the fruitful discussion! Can we maybe wrap it up in some way? I always hesitate to wake issues like this from the stale pile, because I know it takes a lot of effort from everyone involved to think about all the dependencies for such features so close to the core.

jggautier commented 4 years ago

It might be helpful to summarize the needs related to changing dataset citations discussed in this issue and related needs discussed in other issues. Please feel free to suggest edits or additions:

  1. As a researcher, I want others to cite my data in a way that acknowledges the producer that funded or otherwise supported the research (in addition to the Dataverse repository responsible for its preservation).
  2. As a researcher, I want others to cite my data in a way that acknowledges when the data was first published (as opposed to when it was first published in the Dataverse repository that it's published in now).
    • A citation's publication date is often interpreted as the date when the data was collected or when the research was done, and having the publication date be updated as it moves from one repository to another can be misleading.
  3. As a curator, I want others to cite data in a way that acknowledges who has responsibility for preserving the data files.
    • Outside of OAI-PMH harvesting, Dataverse effectively assumes that it is responsible for preserving the data files of any dataset metadata that it indexes, even when those files are preserved in another repository.

Discussed in other issues:

  1. As a researcher, I want others to be able to cite a particular version of my dataset regardless of where the data lives. https://github.com/IQSS/dataverse/issues/4570
    • If versions 1 and 2 of my dataset are published in one repository, then the dataset is moved to another repository, I need people to know that the latest version of the dataset published in the new repository is version 2. Right now, moving a dataset from one repository into a Dataverse repository effectively resets the version number displayed in the citation. This is also an issue with depositing software.
  2. As a researcher, I want the publication date in the suggested citation to reflect when the latest version (or latest major version) was published. https://github.com/IQSS/dataverse/issues/2298
    • Right now, the citation's publication date is the date when the dataset was first published in the Dataverse repository.

RightInTwo commented 4 years ago

I added a code example in #5402 to import metadata from Datacite through a python script that produces DDI-XML quick-and-dirty. When using this with &release=yes, I would like for Dataverse to just use existing fields (like <distrbtr>, <version>, <distDate> and the whole custom citation in <biblCit>) instead of populating them, which I think should just happen when Dataverse publishes data, not when it is imported as "released".

cmbz commented 1 month ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.