Support versions before a dataset is published

mercecrosas commented 10 years ago

We want to be able to support multiple versions of a pre-published dataset.

Questions:

What happens to the pre-published versions when you publish the dataset?

posixeleni commented 10 years ago

I just assume that anything pre-published would be version 0, 0.1, 0.2, etc., and once you publish it bumps up to version 1?
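A minimal sketch of the numbering scheme suggested here, assuming pre-publication versions count up as 0.1, 0.2, ... and the first publication always becomes 1.0. The `Version` class and the `next_prepublished`/`publish_first` helpers are hypothetical and not part of Dataverse. The sketch covers only the first publication, not later minor/major bumps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    major: int
    minor: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def next_prepublished(current: Optional[Version]) -> Version:
    """Hypothetical: the next pre-publication version (0.1, 0.2, ...)."""
    if current is None:
        return Version(0, 1)
    if current.major != 0:
        raise ValueError("dataset is already published")
    return Version(0, current.minor + 1)


def publish_first(current: Optional[Version]) -> Version:
    """Hypothetical: first publication becomes 1.0, however many drafts came before."""
    if current is not None and current.major != 0:
        raise ValueError("dataset is already published")
    return Version(1, 0)
```

Under this scheme, two pre-publication versions followed by publishing give 0.1, 0.2, then 1.0.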

sbarbosadataverse commented 10 years ago

Thinking we can get rid of the pre-publication versioning, or give users the option to keep them, although I don't see the benefit of the latter at this point. The pre-publication versioning was to allow them to make their first draft "complete" before publishing. I wouldn't even consider it versioning as it relates to publishing; pre-publication is more about managing/cleaning/organizing the data for the purpose of publishing.

mercecrosas commented 10 years ago

This is 4.1+. We are discussing this with Jon (Odum) and Thomas (ICPSR).

pdurbin commented 7 years ago

@mercecrosas is this still something you're interested in?

RightInTwo commented 5 years ago

I would love to see this with regard to #5764. The use cases would be a) triggering an update of the CRIS on new versions of unpublished data and b) tracking changes when passing the data between different people in the curation process.

pdurbin commented 5 years ago

b) tracking changes when passing on the data between different people in the curation process

There is some amount of tracking when a curator returns a dataset to an author. Via the API, a curator can include a reasonForReturn as explained at http://guides.dataverse.org/en/4.15.1/api/native-api.html#return-a-dataset-to-author . The note is saved in the workflowcomment database table: http://phoenix.dataverse.org/schemaspy/latest/tables/workflowcomment.html . This was implemented in #3943, and there's a lot of conversation about it in the 'Help Wanted: stories related to "Submit for Review" and "Return to Author"' thread at https://groups.google.com/d/msg/dataverse-community/bGlCU2pbQpE/h3d9enX9AAAJ . There is demand for a GUI that lets the curator write this note, reflected in the following issue: "Return to Author: When returning datasets that were submitted for review, I want to add a feedback note." #3702
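For reference, here is a hedged sketch of that API call using only Python's standard library. The endpoint path, the `X-Dataverse-key` header, and the `reasonForReturn` JSON field follow the native API guide linked above as best I understand it. The server URL, token, and persistent ID are placeholders, `build_return_to_author_request` is a hypothetical helper, and the request is only constructed, not sent. Check the guide for your installed version before relying on it.

```python
import json
import urllib.parse
import urllib.request


def build_return_to_author_request(server_url: str, api_token: str,
                                   persistent_id: str, reason: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the returnToAuthor endpoint."""
    # The persistent ID goes in the query string; urlencode escapes ':' and '/'.
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/returnToAuthor?{query}"
    # The curator's note travels in the JSON body as reasonForReturn.
    body = json.dumps({"reasonForReturn": reason}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
    )
```

To actually send it, pass the request to `urllib.request.urlopen(req)` against a real installation with a valid curator token.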

qqmyers commented 5 years ago

FWIW: In QDR we've had some discussion on this. We're still thinking through the overall curation strategy, but if curators work in Dataverse to do things like replace files with archival versions, etc., having pre-publication versions would be a good way to track their efforts.

RightInTwo commented 5 years ago

And also, we will definitely have data that we cannot publish but still want to curate and archive. This would be a nice way to facilitate that.

janetm commented 2 years ago

Hi all, I would like to re-engage with this issue. I searched the community group for references to dataset versioning BEFORE publication. The Australian Data Archive (ADA) uses a Dataverse installation dedicated to DEPOSITs only, so we do not want to publish data received as a SIP, but we would like to mark a point in the dataset when it is accepted as a 'complete' SIP. Data at the deposit stage has not been curated and should not be published. We also have a TEST and a PRODUCTION Dataverse installation as part of our archival workflow. Our use case closely matches QDR's functionality requirements described by qqmyers above: "if curators work in Dataverse to do things like replace files with archival versions, etc., having pre-publication versions would be a good way to track their efforts."

For ADA, it would be very useful to be able to 'set' the unpublished deposit dataset as, for example, Version 1 when the deposit is accepted as a SIP and is moved to archival storage, without the requirement to 'publish'. The main reason is to enable revised data files, or new waves/releases of a dataset deposited into the same DEPOSIT dataset, to be recognised by Dataverse versioning for our automated workflow.

Does anyone else have a similar need for pre-publication versioning?

pdurbin commented 2 years ago

@janetm I don't know if this helps at all, but from the software development perspective, I'm reminded of tags in Git. GitHub has conflated tags and releases a bit, but with plain Git you can use tags to indicate the completion of some work, a milestone, or a significant event. I could make a tag called "ready-for-ui-review" or something. Git tags always have a date associated with them (as well as a commit, so you know its state exactly). Anyway, I can understand the need for this.

I guess my take is that maybe every edit within a draft is not so significant, but at some point you might want to say, "Ok, I've fiddled with this dataset enough. I'd like to flag it now." Hmm, that makes me wonder. Have you played with the Curation Status Labels added in 5.7? I don't have a screenshot of the final product, but here's one from pull request #7967 that gives the idea of how they look:

[Screenshot: Curation Status Labels, from pull request #7967]

qqmyers commented 2 years ago

FWIW: The fact that those curation status labels can be read/set via API might help an external app keep track of pre-publication state, i.e. such an app could record the state of the metadata/files when the curation state changes in order to show changes. (A nice addition would be an event trigger for such an app: currently, changes to the curation state alert curators/admins via notification/email, but apps have no notification mechanism beyond polling.)
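A minimal sketch of that external-app idea: poll the curation status and, whenever it changes, record a snapshot of the dataset's state so changes can be shown later. The `fetch_status` and `fetch_state` callables are hypothetical stand-ins for whatever API calls the app would make (no specific Dataverse endpoint is assumed), and the state is modeled, as an assumption, as a simple mapping from file name to checksum.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical state model: file name -> checksum.
State = Dict[str, str]


@dataclass
class Snapshot:
    label: str    # curation status label at the time of the snapshot
    state: State  # copy of the dataset state when the label changed


@dataclass
class CurationTracker:
    fetch_status: Callable[[], Optional[str]]  # hypothetical API wrapper
    fetch_state: Callable[[], State]           # hypothetical API wrapper
    last_status: Optional[str] = None
    history: List[Snapshot] = field(default_factory=list)

    def poll(self) -> Optional[Snapshot]:
        """Record a snapshot only when the status label has changed."""
        status = self.fetch_status()
        if status == self.last_status:
            return None
        self.last_status = status
        snap = Snapshot(status or "(no label)", dict(self.fetch_state()))
        self.history.append(snap)
        return snap


def diff(old: State, new: State) -> Dict[str, List[str]]:
    """Compare two snapshots' states by file name and checksum."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Polling is used here only because, as noted, there is no app-facing event trigger yet; an event hook would simply call the same snapshot logic instead of `poll()`.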

In contrast, managing pre-publication (or no-publication) versions within Dataverse could be more complex than it first appears. The basic idea of adding versions isn't so complex, but I'm not sure whether all the rules for versioning should apply. For example, do pre-publication versions include restriction and embargo settings that then become fixed (e.g., non-admins can't change the embargo for previously 'released' files, which currently means files in non-draft versions)? Should metadata exports be created? Do citations show any date? Is DataCite updated? What version numbers are allowed (0.1, 0.2, ...)? Do workflows run? Etc. I think these questions could probably be answered, but making all of this functionality work with real versions in Dataverse makes this route a bigger dev task than it might seem.

stevenmce commented 2 years ago

Hi folks, it would be good to draw a distinction here between the process of curation and the preservation of versions. The status tags would show where we are in the process, but the "tagged releases"/versions would preserve the content at that point.

For ADA, we are interested as well in being able to distinguish SIPs, AIPs and DIPs - this is where the tagged versions would be of benefit.
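To illustrate that distinction, here is a minimal sketch in which a status label is a mutable marker of where curation stands, while a tagged version freezes a read-only copy of the content at that moment (e.g. a SIP). The `DraftDataset` and `TaggedVersion` classes are hypothetical and only model the idea; they are not how Dataverse stores versions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class TaggedVersion:
    tag: str                  # e.g. "SIP", "AIP", "DIP" (hypothetical tag names)
    created: datetime         # when the version was frozen
    content: Mapping[str, str]  # read-only copy: file name -> file contents/checksum


@dataclass
class DraftDataset:
    files: Dict[str, str] = field(default_factory=dict)
    status_label: str = ""    # process marker: overwritten as curation moves on
    versions: List[TaggedVersion] = field(default_factory=list)

    def set_status(self, label: str) -> None:
        """Update the curation status; nothing about the content is preserved."""
        self.status_label = label

    def tag_version(self, tag: str) -> TaggedVersion:
        """Preserve the current content under a tag without publishing."""
        frozen = MappingProxyType(dict(self.files))  # copy, then wrap read-only
        version = TaggedVersion(tag, datetime.now(timezone.utc), frozen)
        self.versions.append(version)
        return version
```

Later edits to the draft change `files` and `status_label`, but a tagged version such as the SIP keeps the content it captured.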

cmbz commented 2 months ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

IQSS/dataverse, issue #1002: Support versions before a dataset is published