Closed sbarbosadataverse closed 1 month ago
We should bring back a data citation "widget" or tool that allows users to configure some of the fields in the data citation, in particular:
To be reviewed with @eaquigley
@mcrosas would this be something we could add to the "selecting metadata fields" portion of the general information page of a dataverse or should this be on the dataset level?
Related or Duplicate: https://github.com/IQSS/dataverse/issues/2146
More interest in this from HMS: "dataset collection year would be much more useful."
Met with @eaquigley @sbarbosadataverse @scolapasta to plan out this feature.
An FRD will need to be created but at a high-level we will need to be more flexible with allowing users to select a different date in the citation other than the default publication year. This is especially important for historical datasets.
Admins will be able to set at their Dataverse-level how they want their data citations to display.
Just spoke with @scolapasta and @eaquigley: there may be a use case where someone would like to use different dates depending on the dataset in their dataverse, rather than just one kind of date across the board. For example, in 3.6 we allowed people to use either distribution or production date for the citation, so they could have two different kinds of dates in citations within a single dataverse.
We should also make this consistent with facets and metadatablocks and have inheritance for this. So a checkbox to say it is a "citation customization root" or something like that.
If this is something stored on dvobject, then it could be inherited by datasets by default, but you could override for a specific dataset (if we encounter a use case like @posixeleni described).
OK, backend changes are in, but please note that the additional fields need to be ordered and currently are not (or the order is unclear). This will need to be decided in both the backend (order column) and the UI.
Related to #2146.
There's recent discussion about this issue in this Google Groups thread.
Regarding:
Date: select whether is published date or distribution date or other dates
Having the publication year in the citation based upon when the data are released in this or that Dataverse is not very accurate re: citation imho.
Main use case that is problematic: many dataverses consist of, or start with, previously published datasets that are being added to Dataverse for best practices. These would be, for example, datasets that are being moved from a website listing with zero metadata etc. So when we move a number of datasets into Dataverse for "best practices" they all get citations displayed as "current year" (but they were published / released on the internet from 2012-2015!! not 2017).
Then you also have the sorting issue - we want the most recent ("newest") dataset on top but adding previous years' datasets messes with the sort order (which should be by "newest" by true publication date, not "newest" in terms of being added to dataverse).
Very annoying problem - when we're trying to release something "new" while at the same time add older datasets to a dataverse. Everything appears "new" but only one dataset is published in current year.
@parsr is there any workaround? Do you have to hack on the database or something? I hope not!
hi @pdurbin - none that I'm aware of (also checked with our ScholarsPortal support team to confirm and they're not aware of a workaround either at this time).
would be nice if there was one.
@donsizemore you were talking about "Odum can use the native API to fix existing datasets" at #3369 ... is this something completely different? You and @akio-sone were talking about dates at least.
The workaround would be for changing the date in the dataset citation, right? And changing that date wouldn't change how "Newest" sorts datasets. @parsr, if it's okay, I'm going to copy your comments about sorting into the GitHub issue about sorting (https://github.com/IQSS/dataverse/issues/3066).
@jggautier - yes, workaround for changing the date in citation. And good to share over with (#3066)
In the Metadata tab for one of our datasets, I can see "Publication Date: 2017-10-31" but in editor I can only see inputs editable for:
Distribution date, Deposit date, Production date
and none of them have "2017" in the date data
@scolapasta pointed me to an API command in the Dataverse native API guide that can be used to change the citation date from the system-generated publication date (the date when the dataset was first published in a Dataverse installation) to another date metadata field, like distribution date, deposit date or production date:
Sets the dataset field type to be used as the citation date for the given dataset (if the dataset does not include the dataset field type, the default logic is used). The name of the dataset field type should be sent in the body of the request. To revert to the default logic, use :publicationDate as the $datasetFieldTypeName. Note that the dataset field used has to be a date field:
PUT http://$SERVER/api/datasets/$id/citationdate?key=$apiKey
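For anyone who wants to script this, here is a minimal Python sketch of the call quoted above. It only builds the request and leaves sending to the caller. The function name, the example server URL, the dataset id and the field name "distributionDate" are assumptions for illustration; please check the internal field names against the citation metadata block of your own installation.

```python
import urllib.parse
import urllib.request


def build_citation_date_request(server, dataset_id, api_key,
                                field_type_name=":publicationDate"):
    """Build (but do not send) the PUT request for the citationdate endpoint.

    The field type name goes in the request body, as the guide describes.
    The default ":publicationDate" reverts to the default citation date logic.
    """
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{server.rstrip('/')}/api/datasets/{dataset_id}/citationdate?{query}"
    return urllib.request.Request(
        url, data=field_type_name.encode("utf-8"), method="PUT"
    )


if __name__ == "__main__":
    # Hypothetical values: replace with your server, dataset id and API key.
    req = build_citation_date_request(
        "http://localhost:8080", 42, "my-api-key", "distributionDate"
    )
    print(req.get_method(), req.full_url, req.data)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping the request construction separate from sending makes it easy to inspect the URL and body before touching a production installation.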
@jggautier wow I completely forgot about that but sure enough it shipped in Dataverse 4.3 in pull request #3000 for issue #2606.
This issue has been about the dataset citation that's displayed on the dataset page, and now the file page. But when I change the date used in that displayed dataset citation, should the date be changed in the citation files (RIS, BibTeX and EndNote XML) and in the HTML metatags (which some reference managers can use to populate metadata for creating citations)?
yes, it should
I'm still wondering why it's not just the value of "Distributor" showing up in the citation, but always the name of the root dataverse...
always the name of the root dataverse
@RightInTwo please see #2146 and #5841 for more discussion on this.
@pdurbin Yes, I did. I understand the discussion in #2146 to be about using the name of a sub-dataverse instead, but since the name would be manually set, provenance info would improve and be independent of where that dataset resides (and not get lost like that issue would suggest). The default value when adding a dataset could of course still be the name of the root dataverse and would only differ if it was manually changed.
The solution for #5841 was to change the name of the root dataverse, which of course is not an option if there are various distributors to be represented, like Gesis Data Archive, Mendeley, Zenodo and the likes....
Sorry for bringing this back, but as I understood, we are not the only users that would like to use dataverse for datasets published primarily in other places.
we are not the only users that would like to use dataverse for datasets published primarily in other places
Right, an example of this is https://dataverse.harvard.edu/dataverse/HarvardSubscriptionData which is described at https://dataverse.org/blog/harvard%E2%80%99s-subscription-data-dataverse
Thanks for the pointers! Sonia describes it pretty much like we want to use it. That is a good example. The ILO page on SAKERNAS 2015 (Indonesian Labour Force Survey) mentions the producer:
Producer(s): "Central Bureau of Statistics - Government of Indonesia"
If people use that data, I would expect them to cite the data using that producer and not "Harvard Dataverse" like on the Dataverse page on SAKERNAS 2015.
@RightInTwo ok, so to make this a little more concrete, you think the citation for the dataset at https://doi.org/10.7910/DVN/KTNOY8 should be changed from...
... to this...
... to better indicate that the data came from the producer indicated at https://www.ilo.org/surveydata/index.php/catalog/1565/study-description (I'm not sure how you figured that part out but I trust you 😄 ).
Well, don't trust me too much - I'm just trusting that ILO page :)
Another example where it is more clear: https://doi.org/10.17632/ym23rrm63f.1
On the landing page, Mendeley prompts me to cite the data with:
van Veldhuizen, Roel (2017), “Data and Analysis Files for "Clean up your own Mess"”, Mendeley Data, v1, http://dx.doi.org/10.17632/ym23rrm63f.1
While I think that it's not necessary to replicate this exactly, as I would remove the "dx." and use https for the DOI link, I think the main info author/year/title/distributor/doi should be what we display to our users as well.
Just for clarification, Mendeley Data is the repository, isn't it? Is the citation above an example of how citations should look when no producer name is provided, so the repository name is used instead?
@jggautier Hey Julian, nice to see you! Well yes, Mendeley Data is the repository where the data actually resides and where the doi lookup points to. But in Dataverse, I didn't know a field "repository" exists in the metadata. Aren't we talking about "distributor"?
In any way, the root name can be used as the default, but if I explicitly provide a different producer/distributor/repository/which-ever-field-is-correct, I want the citation to reflect that.
Another example: The field $.publisher at https://api.datacite.org/dois/application/vnd.datacite.datacite+json/10.7802/1.2121 is what I would expect in the citation when I use it to populate the respective field in the dataverse metadata.
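To make the "$.publisher" idea a bit more concrete, here is a small Python sketch of reading that field from a DataCite JSON record. The function name and the sample record are hypothetical. Handling both a plain string and an object with a "name" key is an assumption on my side, since the shape of the field may differ between metadata schema versions.

```python
def publisher_name(datacite_record):
    """Return the publisher name from a DataCite JSON record, or None.

    Accepts either a plain string or an object with a "name" key
    (the second form is an assumption, see the note above).
    """
    publisher = datacite_record.get("publisher")
    if isinstance(publisher, dict):
        return publisher.get("name")
    return publisher or None


if __name__ == "__main__":
    # Hypothetical record for illustration only, not real metadata.
    sample = {"publisher": "Example Data Archive", "publicationYear": 2020}
    print(publisher_name(sample))
    # A real record could be loaded with json.load(urllib.request.urlopen(url)),
    # using the api.datacite.org URL mentioned above.
```

The returned value is what I would expect to end up in the publisher/distributor position of the citation.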
@RightInTwo here's a thought. What if you create a dataset in https://github.com/IQSS/dataverse-sample-data that illustrates which Dataverse metadata fields you'd use? You could create the dataset using https://demo.dataverse.org and then I could help you export the dataset as JSON and get it into that "sample data" repo. Actually, a good first step would probably be for you to create an issue at https://github.com/IQSS/dataverse-sample-data/issues to explain how the dataset comes from somewhere else, etc.
Hey @RightInTwo. No there's no metadata field called "repository", as you've probably already confirmed :)
I think I was confused because I forgot that you'd like to index the metadata of datasets that will continue to live outside of dataverse (similar to oai-pmh harvesting, but you can't use that as you've written elsewhere). So I agree that showing the root repository's name in the citation in the search results would be wrong when the data is actually in another repository.
I agree with @pdurbin about seeing which metadata fields you'd use.
@pdurbin @jggautier It is just "publisher" (same in dublin core, datacite, native dvn) that would need to be accepted by dataverse on the ddi import (which is afaik still the only way to get existing dois into the system). That would actually be enough for our purpose, but it might make sense to also make the field editable in the gui and other apis (which might accept existing dois in the future?) for more diverse use cases.
ddi import (which is afaik still the only way to get existing dois into the system)
In addition to DDI, you can also get existing DOIs into Dataverse with JSON: http://guides.dataverse.org/en/4.18.1/api/native-api.html#import-a-dataset-into-a-dataverse
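A rough Python sketch of what such a JSON import call could look like, for reference. The endpoint path and the pid and release parameters are written from my reading of the linked guide section, so please treat them as assumptions and verify them there. The function name, server URL, dataverse alias and DOI below are made up for illustration.

```python
import urllib.parse
import urllib.request


def build_import_request(server, dataverse_alias, pid, dataset_json,
                         api_key, release=True):
    """Build (but do not send) a POST request that imports a dataset
    with an existing persistent identifier.

    Path and parameter names are assumptions, see the note above.
    """
    params = {"pid": pid, "key": api_key}
    if release:
        # Mark the imported dataset as already released.
        params["release"] = "yes"
    query = urllib.parse.urlencode(params)
    url = (f"{server.rstrip('/')}/api/dataverses/{dataverse_alias}"
           f"/datasets/:import?{query}")
    return urllib.request.Request(
        url,
        data=dataset_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; "{}" stands in for a real dataset JSON document.
    req = build_import_request(
        "http://localhost:8080", "mydataverse",
        "doi:10.5072/FK2/EXAMPLE", "{}", "my-api-key",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```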
One can also get existing DOIs into Dataverse by harvesting them via OAI-PMH.
In terms of harvesting (i.e. allowing for search of datasets in other repositories; no dataset page available through Dataverse*), we had always talked about not generating citations (since it's really not our responsibility) and having the citation be one of the things we actually harvest. (currently we do generate a citation using the distributor as the publisher)
(*) which is how it should be when the data is actually published somewhere else
@scolapasta
having the citation be one of the things we actually harvest
Well, being able to import the whole citation would be even better for our use case! Then we could just sync the whole citation in our own format.
But there is a drawback. In the Harvard Subscription Data Dataverse (and we are planning something similar), you'd not be able to set the correct publisher for those datasets, as the Harvard Dataverse is the authority for that metadata and no citation can be imported.
@pdurbin
In addition to DDI, you can also get existing DOIs into Dataverse with JSON
Perfect! Maybe that has always worked and I just missed the &release=yes in my code :bug: Why we can't simply use OAI-PMH harvesting is discussed in #5402. Though, yes, I'm sorry for just ignoring the main way of metadata exchange between repositories :D
@pdurbin @scolapasta @jggautier Thanks for the fruitful discussion! Can we maybe wrap it up in some way? I always hesitate to wake issues like this from the stale pile, because I know it takes a lot of effort from everyone involved to think about all the dependencies for such features so close to the core.
It might be helpful to summarize the needs related to changing dataset citations discussed in this issue and related needs discussed in other issues. Please feel free to suggest edits or additions:
Discussed in other issues:
I added a code example in #5402 to import metadata from DataCite through a python script that produces DDI-XML quick-and-dirty. When using this with &release=yes, I would like for Dataverse to just use existing fields (like <distrbtr>, <version>, <distDate> and the whole custom citation in <biblCit>) instead of populating them, which I think should just happen when Dataverse publishes data, not when it is imported as "released".
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.
If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
Several Dataverse users have requested more flexibility in what is displayed for dataset citations: not so much changing the display order of the information, but actually choosing what to display in their dataset citation. @mcrosas please add additional thoughts on this. Which milestone should this go in?