IQSS / dataverse

Open source research data repository software
http://dataverse.org

Edit Dataset: Investigate and correct multiple draft issue. #2132

Closed kcondon closed 8 years ago

kcondon commented 9 years ago

One user has two datasets that may have a published version mislabeled as draft in the version history.

kcondon commented 9 years ago

Steps to reproduce: In two separate browsers, open a published dataset. In one browser edit metadata. In the other browser, edit terms and add a guestbook. Save the metadata view, then save the guestbook view. See multiple drafts.

sekmiller commented 9 years ago

This has happened at least once on build. I haven't been able to recreate it locally, and Kevin was unable to get it to happen on build. He thought it might be occurring in the presence of a long-running ingest, but even that wasn't enough to trigger a repeat.

sekmiller commented 9 years ago

Tried being logged in as different users - that didn't force multiple drafts. Also tried saving in one window while an ingest was occurring in the original window - that didn't do it either. Previously tried editing the same and different sections (metadata, files, terms) - that didn't yield multiple versions either.

sekmiller commented 9 years ago

To add a unique constraint - this should prevent bad data from being added:

ALTER TABLE datasetversion ADD CONSTRAINT uq_datasetversion UNIQUE(dataset_id, versionnumber, minorversionnumber);
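A caveat with this approach: in SQL, a multi-column unique constraint does not block two rows that hold NULL in one of the constrained columns, because NULLs compare as distinct. Since draft versions have no version number yet, the constraint above may not catch duplicate drafts. Below is a minimal sketch of that behavior using SQLite (whose NULL handling matches PostgreSQL here); the schema is a made-up miniature of the datasetversion table, not actual Dataverse DDL:

```python
import sqlite3

# Toy schema illustrating why UNIQUE(dataset_id, versionnumber,
# minorversionnumber) cannot block duplicate drafts when drafts
# carry NULL version numbers (NULLs are distinct in UNIQUE checks).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE datasetversion (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER,
        versionnumber INTEGER,
        minorversionnumber INTEGER,
        versionstate TEXT,
        UNIQUE (dataset_id, versionnumber, minorversionnumber)
    )
""")

# A published version with real version numbers is protected...
conn.execute("INSERT INTO datasetversion VALUES (1, 42, 1, 0, 'RELEASED')")
dup_rejected = False
try:
    conn.execute("INSERT INTO datasetversion VALUES (2, 42, 1, 0, 'RELEASED')")
except sqlite3.IntegrityError:
    dup_rejected = True  # duplicate published version is rejected

# ...but two drafts with NULL version numbers both slip through.
conn.execute("INSERT INTO datasetversion VALUES (3, 42, NULL, NULL, 'DRAFT')")
conn.execute("INSERT INTO datasetversion VALUES (4, 42, NULL, NULL, 'DRAFT')")
drafts = conn.execute(
    "SELECT COUNT(*) FROM datasetversion "
    "WHERE dataset_id = 42 AND versionstate = 'DRAFT'"
).fetchone()[0]
print("drafts:", drafts)
```

Both draft inserts succeed, leaving two drafts for the same dataset despite the constraint.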

sekmiller commented 9 years ago

Added a unique constraint to datasetversion object so new installations will have the constraint added automatically.

eaquigley commented 9 years ago

Discovered this was happening in this dataset on beta:

https://beta.dataverse.org/dataset.xhtml?persistentId=doi%3A10.5072/FK2/XBGJGI&version=DRAFT

The second draft also shows no user assigned to its changes.

pdurbin commented 9 years ago

How is this possible if the unique constraint was added on beta?

sbarbosadataverse commented 8 years ago

https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/NGQCIPIDUK

Here is an example of the "drafts" issue. In my case, I was not able to save the draft, the blue spinner would not resolve, so I would hit save again, and apparently the drafts were being created in the back end with each "save." Once I select a draft, I can't publish it (no error message). Gustavo hasn't been able to figure out how to delete the drafts and go back to the original version.

pdurbin commented 8 years ago

Added a unique constraint to datasetversion object so new installations will have the constraint added automatically.

@sekmiller you added this constraint on the datasetversion object in 86e4236 but was the SQL update ever run in production? (More of a question for @kcondon.) When will this issue be put in QA? For now I'm going to put this in the 4.2.2 milestone so we can discuss it, since @sbarbosadataverse mentioned this to me as a critical issue. From your comment at https://github.com/IQSS/dataverse/issues/2132#issuecomment-101393393 it sounds like you were having trouble reproducing this issue on your laptop, but that was before you put the constraint in place. Now it should be impossible to reproduce this bug locally (creating multiple drafts) since I have the constraint on my laptop (but I can try if you like):

[screenshot: screen shot 2015-11-10 at 9:19:33 AM]

sbarbosadataverse commented 8 years ago

So this will prevent the multiple drafts from happening... but the bigger problem is figuring out how to delete the drafts once they have been created. Just an FYI.


pdurbin commented 8 years ago

the bigger problem was figuring out how to delete the drafts once they have been created

Right, I believe @scolapasta is still working on that? It sounds like some time will be spent on Thursdays, generally, to work on production issues that require direct calls to the database.

I discussed how to prevent multiple drafts from happening with @sekmiller this morning. In short, we think that the constraint at 86e4236 isn't working. We need a constraint that does work, and I'm stealing this issue to see what I can come up with, but ideas are certainly welcome!

@sekmiller has been unable to reproduce the creation of multiple drafts on his laptop but I'm going to give it a shot. He suggested having a file ingest running. I see that @kcondon was able to reproduce the bug with modifications to the guestbook: https://github.com/IQSS/dataverse/issues/2132#issuecomment-99151062

pdurbin commented 8 years ago

Steps to reproduce: In two separate browsers, open a published dataset. In one browser edit metadata. In the other browser, edit terms and add a guestbook. Save the metadata view, then save the guestbook view. See multiple drafts.

On my first try I wasn't able to reproduce the bug (create multiple drafts) but I did see "The dataset terms could not be updated. Please contact support" in angry red and the following exception (this is when I clicked "Save Changes" when editing terms if that isn't clear):

Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: ERROR: update or delete on table "datasetversion" violates foreign key constraint "fk_filemetadata_datasetversion_id" on table "filemetadata"
Detail: Key (id)=(4) is still referenced from table "filemetadata".
Error Code: 0
Call: DELETE FROM DATASETVERSION WHERE ((ID = ?) AND (VERSION = ?))
bind => [2 parameters bound]
Query: DeleteObjectQuery([DatasetVersion id:4])
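For anyone unfamiliar with this failure mode: the delete is rejected because filemetadata rows still reference the datasetversion row being removed. Here is a minimal sketch of the same situation in SQLite (where foreign key enforcement must be switched on explicitly); the table and column names mirror the exception above but the data is made up:

```python
import sqlite3

# Reproduce the shape of the PSQLException: deleting a datasetversion
# row that filemetadata still references is blocked by the FK.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FKs are off by default in SQLite
conn.execute("CREATE TABLE datasetversion (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE filemetadata (
        id INTEGER PRIMARY KEY,
        datasetversion_id INTEGER REFERENCES datasetversion(id)
    )
""")
conn.execute("INSERT INTO datasetversion VALUES (4)")
conn.execute("INSERT INTO filemetadata VALUES (1, 4)")

# The child row must be deleted (or repointed) first; otherwise:
deleted = False
try:
    conn.execute("DELETE FROM datasetversion WHERE id = 4")
    deleted = True
except sqlite3.IntegrityError as e:
    print("delete blocked:", e)
```

This suggests the draft-deletion cleanup has to remove the dependent filemetadata rows before the version row itself.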

[screenshot: spruce goose - spruce dataverse - 2015-11-10 13:14:45]

Maybe this is a new issue? Or maybe there's already an issue for this possibly unrelated bug?

sbarbosadataverse commented 8 years ago

Do you want me to try to reproduce it? I can go into the datasets where this started for me (the same one that is not allowing me to set permissions for a user) and see if I can make it happen again.


pdurbin commented 8 years ago

@sbarbosadataverse yes! Can you please try to reproduce it on http://pdurbin.pagekite.me ? That's my laptop.

pdurbin commented 8 years ago

I was telling @sbarbosadataverse that I'd try forcing errors deep in the system on my laptop (i.e. throwing runtime exceptions in various places) in an effort to get multiple drafts to appear, but I couldn't. I'm passing this to @scolapasta because from this morning's meeting he seemed to have some ideas about what the constraint should be, even if we're having trouble reproducing the bug. @sekmiller also mentioned that the current constraint may not work because versionnumber can be null.

pdurbin commented 8 years ago

I tried a little more to reproduce this bug but couldn't. Basically, I'm trying to throw exceptions in odd places and see if I can make it so they can't be caught, like this:

diff --git a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetCommand.java b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetCommand.java
index e967035..39a5585 100644
--- a/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetCommand.java
+++ b/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetCommand.java
@@ -190,6 +190,10 @@ public class UpdateDatasetCommand extends AbstractCommand<Dataset> {

         Dataset savedDataset = ctxt.em().merge(theDataset);
         ctxt.em().flush();
+        if (true) {
+            throw new OutOfMemoryError("fake out of memory");
+//            throw new RuntimeException("bloops! https://github.com/IQSS/dataverse/issues/2132");
+        }

@sbarbosadataverse said multiple drafts are created when she clicks "Save Changes" over and over after an error. For me, new drafts aren't created. I still have only the 1.0 published version.

Oh, and by the way, in my testing I sometimes found the "block UI" logic added in #2566 prevented me from clicking "Save Changes" multiple times.

scolapasta commented 8 years ago

Before applying to production, we need to make sure there are no current datasets with multiple drafts (currently there are a few of these, so the multiple drafts need to be removed from the db).

pdurbin commented 8 years ago

@sekmiller as discussed with @scolapasta , I'm passing this issue to you. I owe you a longer comment but for now, the approach we'd like to try next is http://stackoverflow.com/questions/16236365/postgresql-conditionally-unique-constraint . In this thread you can even see an example of a system that has drafts (though they use a boolean to indicate draft status).

Here's a SQL query to show which datasets have multiple drafts: select dataset_id, count(*) from datasetversion where versionstate='DRAFT' group by dataset_id having count(*) >1;
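The query above can be exercised against toy data to see what it returns. A minimal sketch in SQLite (the dataset ids and rows are invented for illustration):

```python
import sqlite3

# Run the duplicate-draft detection query against a toy table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasetversion "
    "(id INTEGER PRIMARY KEY, dataset_id INTEGER, versionstate TEXT)"
)
rows = [
    (1, 10, 'RELEASED'),
    (2, 10, 'DRAFT'),      # one draft is fine
    (3, 52587, 'DRAFT'),
    (4, 52587, 'DRAFT'),   # the offending extra draft
]
conn.executemany("INSERT INTO datasetversion VALUES (?, ?, ?)", rows)

dupes = conn.execute("""
    SELECT dataset_id, COUNT(*) FROM datasetversion
    WHERE versionstate = 'DRAFT'
    GROUP BY dataset_id
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)
```

Only the dataset with more than one draft shows up, which is exactly the list that needs manual cleanup before the constraint can be applied.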

As @scolapasta said above, we need to correct the production data before the code is deployed. I'm not actually sure we know how to correct the production data so any fix for this issue should probably go into a separate branch rather than 4.2.2 so we can decide when we want to deploy the fix to production and add the constraint (we'll need an official update script).

pdurbin commented 8 years ago

@sekmiller if you're interested in seeing this bug in the wild dvn-vm5 has an instance of a dataset (id 52587) with multiple drafts. Four of them. The persistent ID is hdl:1902.1/BHS-20102011

sekmiller commented 8 years ago

Here is the update for the database:

CREATE UNIQUE INDEX one_draft_version_per_dataset ON datasetversion
(dataset_id) WHERE versionstate='DRAFT';

It cannot be applied while there is invalid data, that is, while any dataset has multiple draft versions.
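This partial unique index sidesteps the NULL-versionnumber problem by keying on versionstate instead. SQLite supports the same CREATE UNIQUE INDEX ... WHERE syntax as PostgreSQL, so the behavior can be sketched with toy data (the dataset id below is invented):

```python
import sqlite3

# A partial unique index: at most one DRAFT row per dataset_id,
# while any number of released versions remains allowed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasetversion "
    "(id INTEGER PRIMARY KEY, dataset_id INTEGER, versionstate TEXT)"
)
conn.execute("""
    CREATE UNIQUE INDEX one_draft_version_per_dataset
    ON datasetversion (dataset_id) WHERE versionstate = 'DRAFT'
""")

# Multiple released versions for the same dataset: fine.
conn.execute("INSERT INTO datasetversion VALUES (1, 42, 'RELEASED')")
conn.execute("INSERT INTO datasetversion VALUES (2, 42, 'RELEASED')")
# First draft: fine.
conn.execute("INSERT INTO datasetversion VALUES (3, 42, 'DRAFT')")

# Second draft for the same dataset: rejected by the index.
second_draft_rejected = False
try:
    conn.execute("INSERT INTO datasetversion VALUES (4, 42, 'DRAFT')")
except sqlite3.IntegrityError:
    second_draft_rejected = True
```

Unlike a plain unique constraint, the WHERE clause scopes uniqueness to draft rows only, so published version history is unaffected.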

pdurbin commented 8 years ago

@sekmiller I can confirm that the new constraint you wrote (9bc9ec0) works on my (somewhat stale) copy of production data:

murphy:dataverse pdurbin$ psql dataverse_db -f scripts/issues/2132/one-draft-version-per-dataset-constraint.sql 
psql:scripts/issues/2132/one-draft-version-per-dataset-constraint.sql:1: ERROR:  could not create unique index "one_draft_version_per_dataset"
DETAIL:  Key (dataset_id)=(52587) is duplicated.

pdurbin commented 8 years ago

@sekmiller sadly, it sounds like neither one of us can figure out how to express in JPA the new constraints we're adding for this issue and #2598 so we'll both be updating the reference data script and the migration script. As of 1ccac36 I'm done editing both of those files if you'd like to add your changes. (I already pushed a migration script from 4.2.1 to 4.2.2.)

At https://twitter.com/philipdurbin/status/667772604070146048 I'm asking how to make a feature request in JPA. :)

sekmiller commented 8 years ago

Assigning to Gustavo. I've added the Unique Index sql script to the 4.2.2 migration script and reference data (run at deployment) script. It cannot be run against a database where there are multiple draft versions for a given dataset.

pdurbin commented 8 years ago

@landreev it was just pointed out to me by @scolapasta that as part of draft cleanup of hdl:1902.1/00097-8 on dvn-vm5 we are seeing a NullPointerException at for (DataVariable var : variables) { ...

... and variables gets set by List<DataVariable> variables = fileMetadata.getDataFile().getDataTable().getDataVariables();

pdurbin commented 8 years ago

In this morning's standup @scolapasta mentioned the constraint was added to production. I'm not sure why this isn't in the 4.2.2 milestone though so I'll move it there.