IQSS / dataverse

Open source research data repository software
http://dataverse.org
Other
878 stars 487 forks source link

As a researcher, I want to enter or upload the provenance for my file so that I can provide other researchers with additional information about the history of my data. #4343

Closed djbrooke closed 6 years ago

djbrooke commented 6 years ago

The mockups supporting this user story are available here:

https://drive.google.com/file/d/1AO_L0utJwjHEBxpj5b2xNiVkOjOG545D/view

This story covers the submission and validation of a JSON file, and the submission of manually entered provenance.

pdurbin commented 6 years ago

In cc09abf I stubbed out the validation that the prov file is JSON and can be parsed an a JSON object rather than a JSON array.

dlmurphy commented 6 years ago

In our usability testing of our Provenance mockups yesterday, we had one major piece of UI feedback that applied to this mockup:

prov2

When you upload a provenance file, the "Upload Provenance File" box is still displayed and active. This led the participant to think that it was possible to upload more than one provenance file. Because he is very familiar with provenance, he was startled to think that we'd messed this up and allowed multiple file upload.

An easy fix is to simply remove the file upload box once a provenance file has been uploaded, and only display it again if the user clicks the "X Remove" button.

pdurbin commented 6 years ago

As of 7948579 we are storing both types of provenance data:

To be clear, this is where the two types will go after you click "Save Changes". There's still staging work to do for the UI. Clicking "Cancel" means nothing that's been staged will be persisted at all.

dlmurphy commented 6 years ago

Usability testing yesterday also got us feedback on some of the instructional text for this popup:

prov1

Based on our participant's feedback, I recommend changing this message:

"Please add any information required to accurately document the history of this data file, including how it was created and how it was transformed."

TO

"In addition to a bundle file, or in place of one, you may add any information required to accurately document the history of your data file, including how it was created and how it was transformed."

I also recommend changing Description to Provenance Description.

dlmurphy commented 6 years ago

Lastly, @mheppler pointed out during usability testing that we need a "Cancel" button added to this popup modal: prov3

mheppler commented 6 years ago

What we are building

Include recommendations from testing as outlined above by @dlmurphy and ok'd by @taniaschlatter and @djbrooke and the following:

File Upload Pg

Provenance (Edit) Popup

Provenance (Confirmation) Popup

TaniaSchlatter commented 6 years ago

When we have a better sense of the graph, it may make more sense and streamline the flow to include the Preview in a modal.

mheppler commented 6 years ago

Mockup of the Edit dropdown button in the table header for File Upload pg, as referenced in checklist above.

This is intended to bring the functionality of the same Edit Files dropdown button on the dataset pg to the File Upload pg.

provenance-fileupload-editselected-v3

matthew-a-dunlap commented 6 years ago

Sorry, meant to delete a comment not close the story!

matthew-a-dunlap commented 6 years ago

Questions:

matthew-a-dunlap commented 6 years ago

One more:

mheppler commented 6 years ago

@matthew-a-dunlap

matthew-a-dunlap commented 6 years ago

High level To-Do:

Known bugs and issues:

mheppler commented 6 years ago

Cleaned up the layout of the new Edit button in the table header on File Upload pg.

NOTE: Discovered that the file upload does not work. Once you select a file from your computer, it does not display in the file table, and there are no errors logged by glassfish. I assume that @matthew-a-dunlap is aware of this, based on the lengthy to-do checklist in his comment above.

Also, merged develop branch and resolved conflicts in structure.css in order to get bug fixes from @sekmiller in the work on #4051.

screen shot 2017-12-20 at 1 23 46 pm

dlmurphy commented 6 years ago

I've finished the parts of the Provenance user guide that I can write from just the mockups. Once this reaches a point where I can demo the feature, I'll want to go back and make a few additions.

matthew-a-dunlap commented 6 years ago

Working on the upload functionality now and thinking about the flow. Here are some questions I have in the same vein:

TaniaSchlatter commented 6 years ago

Hi Mathew,

Comments inline below:

On Jan 9, 2018, at 7:41 PM, matthew-a-dunlap notifications@github.com wrote:

If a user has uploaded provenance data to a file the past and goes to look at provenance for the file, how should that be represented and how should the popup look?

If its an uploaded provenance file, should it be shown similar to #4343 (comment)? Does there need to be some difference to communicate that this file is already there?

It should look like the mock-up in the comment. It shows a provenance file (test-prov.json) has been uploaded and it there. The upload box should not show, however, as per Derek’s comment.

If a user goes to this popup and doesn't input any changes, should we still say "save changes" (which may confuse the user that they have made changes that they haven't)?

It should say “save changes” as shown in the mock-up in the comment. The user can cancel if they don’t put in any changes.

Thanks!

matthew-a-dunlap commented 6 years ago

@TaniaSchlatter The problem that I see is that there are two different states:

  1. A user has uploaded a provenance file but not saved it
  2. A user previously uploaded a provenance filed, saved it, and is now looking at the file through this screen

Is our intention to have the same visual state (as shown in the mockup) for both? Won't our users be confused if they say... upload the file, click save, reopen the popup to be sure and see the same UI as before they clicked save?

TaniaSchlatter commented 6 years ago

The same visual state works for both. We could say “save” on the button in both rather than “save changes,” but I don’t think it is necessary, and would not want to do that without checking for consistency of wording on save-type buttons in general.

On Jan 9, 2018, at 8:18 PM, matthew-a-dunlap notifications@github.com wrote:

@TaniaSchlatter The problem that I see is that there are two different states:

A user has uploaded a provenance file but not saved it A user previously uploaded a provenance filed, saved it, and is now looking at the file through this screen Is our intention to have the same visual state (as shown in the mockup) for both? Won't our users be confused if they upload the file, click save, reopen the popup to be sure and see the same UI as before?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

matthew-a-dunlap commented 6 years ago

I have updated my checklist above with the state of this story. All the main functionality works and is wired up. The confirm popup is not committed but mostly done on my local branch. Hopefully will be able to wrap this up on Tuesday.

mheppler commented 6 years ago

@matthew-a-dunlap in your to-do above, you ask, What should happen if either form of provenance data fails to save?

I suggest using an error message from the bundle, like we do for Solr, that reads something like, Due to an internal error, your provenance information was not saved.

matthew-a-dunlap commented 6 years ago

My most recent commit was not tagged correctly to show up here, but was committed to the branch ( https://github.com/IQSS/dataverse/compare/4343-prov-upload-valid ). The confirm popup is now functional.

I have also updated the list above with a few more bugs/issues that need to be worked on. None of them look major at all.

mheppler commented 6 years ago

Cleaned up UI of Provenance workflow on File Upload pg. Checked off the following to-do items:

Was not able to replicate the following bug:

matthew-a-dunlap commented 6 years ago

This branch is ready for code review. I decided not to expose the provenance get commands via api at this point as I think there is still some flux in that space (though likely it would be just returning the json file each time). I have added a note to the relevant provenance story.

matthew-a-dunlap commented 6 years ago

@dlmurphy was looking this over for documentation and noticed an issue where the provenance data will not save correctly when editing is accessed from the file upload and create dataset pages. I had lost track of this need when programming, so I am pulling this back into development.

This also made me realize another issue: on the edit files page: it is confusing what all the "Save Changes" buttons are doing. In the provenance popup, should clicking "save changes" here save the provenance data to the database? screen shot 2018-01-19 at 5 44 46 pm

or should it just stage the changes until save changes is then presssed again here? screen shot 2018-01-19 at 5 44 31 pm

If we want the provenance data to not be committed until the Edit Files Page save changes button is clicked, it still feels to me to be a bit ambiguous.

matthew-a-dunlap commented 6 years ago

I discussed the above issue with @mheppler , the save should happen when the page (not the popup) is saved. We will be changing the popup "save changes" to "continue" as clarification.

Digging into this deeper, because of the misunderstanding of where we were saving, the code was designed to only stage the provenance metadata for one file at a time. I will have to rewrite the code to stage multiple files provenance data and ensure the popup is refreshed correctly when switching between files. My hopes of having this done today are highly unlikely.

matthew-a-dunlap commented 6 years ago

A second clarification from @mheppler :

"When you open the Provenance popup, that form should still have a Save Changes button. The button text that I suggested changing was on the confirmation popup that warns you about previewing before publishing. That confirmation popup button text should be changed Continue."

We also discussed confusion around all the metadata editing paths saying "Save Changes" even tho these changes are not actually saved until the page is saved. This will be reviewed at a later date (5.0?).

matthew-a-dunlap commented 6 years ago

I need to tweak how one of our commands is called and do code cleanup. Otherwise its good.

I deployed the current state to https://apitest.dataverse.org , which is close enough to be looked over @dlmurphy @mheppler @TaniaSchlatter

Note: The alert is also non functional, will fix 2moro

matthew-a-dunlap commented 6 years ago

Had a discussion with the design team this morning. They've identified the need to add the popup to the file page. This is not as simple as just wiring the popup up for another page, as the popup in this page needs to trigger saving within itself. Its not that much work in the end, but I bet it won't be done today.

I pointed out a design issue I saw, that the saving of provenance changes is inconsistent depending on the page. In the file page saving happens in the popup while on the other pages users can complete the popup and leave without their work being saved. I also pointed out that our approach of wiring up these interactions in many places introduces complexity to the codebase which overall can impact the ability to work with the code.

matthew-a-dunlap commented 6 years ago

This branch is complete besides one minor issue due to how we calculate DataFile equality. We check if DataFile.id is equal, but because DataFiles can have null as their id this check will return true for different DataFiles. I could work around it but I think it is best to talk to @scolapasta on Monday if we could expand on the equals method.

mheppler commented 6 years ago

Comment by @jacksonokuhn in #4381

On the upload page, can we refer to the provenance as "Documents" rather than "Bundles"? We're storing the data as bundles on the backend, but they're gonna be uploaded as standard PROV-JSON docs. The current wording may confuse people.

dlmurphy commented 6 years ago

After discussing with Jackson, we're going with "Provenance File" as our terminology. I've updated the UI to reflect this, and I'm currently updating the documentation.

matthew-a-dunlap commented 6 years ago

There are a few "outstanding issues" related to the popup work:

These may require pulling the popup code back into development, but I think the code in https://github.com/IQSS/dataverse/pull/4422 should be reviewed as-is for other issues first.

matthew-a-dunlap commented 6 years ago

Had discussion with @scolapasta & @ekraffmiller around equality and entity objects today. Tomorrow I plan to move forward on those.

matthew-a-dunlap commented 6 years ago

I have rewritten functionality for the provenance popup to better handle issues around testing equality of datafiles. Also, provenance dropdown options and popup will only show up if a provenance address has been set via rest call. I plan to ask around tomorrow about how best to check if a datafile in a previous version had provenance and tie that into render logic.

I have turned off rendering of dropdown elements for tags/restrict/unrestrict/delete that were added to the editDataFiles.xhml on the branch. The work to make these functional is non-trivial, especially due to the sprawling nature of the file pages. Even with restrict/unrestrict/delete, where there are already methods to do an action based on selected checkboxes (and we are looking to allow a user to directly act on one file) I still feel this work should be handled in its own story. We have discussed that for #3488 we may want to refactor the tags functions, and a similar discussion should be had for restrict/unrestrict/delete.

matthew-a-dunlap commented 6 years ago

This is the current look of the provenance popup after the provenance json has been sent to CPL and can no longer be edited. If anything, we should probably use different wording for the freeform string @dlmurphy @mheppler

screen shot 2018-02-08 at 1 04 47 pm

I was unsure whether any of the prov json options should be shown. I went with no because it seemed confusing to allow the user to preview the original prov file after there have been new additions. Let me know if I should take another approach.

dlmurphy commented 6 years ago

@mheppler and I have put together some screenshots depicting new messaging for the provenance popup, along with what it should look like in the use case @matthew-a-dunlap just mentioned. Please excuse any minor layout weirdness in these mockups, the only real difference here is the text. Also, the "User Guide" text should be a hyperlink to the Provenance section of the Dataset + File Management page of the user guide.

prov_popup_1

prov_popup_2

prov_popup_3

dlmurphy commented 6 years ago

Just collaborated with @TaniaSchlatter to rewrite the text in that yellow warning popup. Here's a mockup:

prov_popup_4

And here's the text for easy copy/paste:

Once you publish this dataset, your provenance file can not be edited or replaced.

Select "Cancel" to return the previous page, where you can preview your provenance file to confirm it is correct.

jacksonokuhn commented 6 years ago

I'm realizing that we need some more text on the upload screen :/ Basically we need to tell users that the title they give their datafile should be the same as the title it has in the prov they're uploading. The other option (which is way more of a pain in the butt) is to actually add another input box where they can give us the name of their datafile in the uploaded provenance. We need this in order to connect the prov getting uploaded to the prov in our system.

matthew-a-dunlap commented 6 years ago

@jacksonokuhn We can get the name of their datafile from the db and pass it in the rest call when we send the provenance to CPL. Is there some reason we need the user to specify it instead?

sekmiller commented 6 years ago

@jacksonokuhn - instead of relying on the name being the same, which could be problematic across datasets and installations, can we use the file DOI/PID?

jacksonokuhn commented 6 years ago

either way works actually, so long as we make it clear to the user

On Mon, Feb 12, 2018 at 11:50 AM, Stephen Kraffmiller < notifications@github.com> wrote:

@jacksonokuhn https://github.com/jacksonokuhn - instead of relying on the name being the same, which could be problematic across datasets and installations, can we use the file DOI/PID?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/IQSS/dataverse/issues/4343#issuecomment-364985805, or mute the thread https://github.com/notifications/unsubscribe-auth/ANf1lMBWRoCngMBwyM5r-XFDqOPmlkcpks5tUGvxgaJpZM4Q1aPb .

matthew-a-dunlap commented 6 years ago

This is being pulled back into development as we need the user to provide us extra information for use when sending their upload prov.json to CPL. Specifically, we are unable to tell in many cases which file mentioned in the provenance json relates to the DataFile we are sending provenance for. This can often be inferred by comparing file names but if names have been changed for some reason the user should still be able to work with CPL.

To solve this, when the user uploads their provenance json we will mine the filenames out of the json and use them to help the user tell us which file relates to the DataFile they are on. This interaction will happen under the file upload dialog in the main popup. At a later date we may also show the user a preview of their provenance graph to assist them, but this enhancement can be done during/after work to interface with the visualization library in #4346.

There are some outstanding questions on this work:

matthew-a-dunlap commented 6 years ago

ToDo list to tackle the json parsing, dropdown work, and code review from before this was pulled back into develop.

UI:

Behind:

matthew-a-dunlap commented 6 years ago

The current state of affairs (before my long weekend): No code has been started for this story but after discussing the need in our meeting yesterday I investigated the use of an autocomplete dropdown for the task. The code for the autocomplete in permissions was able to copy-pasted into the popup fragment cleanly and is a good starting point. The next step is understand the w3c provenance spec a tiny bit better and parse the json. The basic need is to parse all the sub-tags to the activity tag and present those as options using the name tag inside each sub-tag if it exists ("*:name" to be specific, for example "rdt:name"). I was also going to check to see if the DataFile Name matches the sub-tag name and present it first in the suggestion list.

With this information in hand, the last piece will be to store the needed name to be sent off at pubish. I'm not quite sure how we should save this, and I half think the best option is to store it as part of the prov.json filename just to keep them bundled together.

There is still an open question of what to do if two nodes share the same name. My idea for an approach was present them both as options, but maybe try to parse out some other identifying piece of info. Doing this in a way that is useful generically is likely challenging. Aside from this edge case I do not see this work as taking that much time, tho I have not had to do much json parsing in java (I have done xml parsing and remember dealing with those libraries to be more painful than I expected).

matthew-a-dunlap commented 6 years ago

@mheppler This is ready for you to look over. There is remaining work, but its either in the backend or API work. The one exception to this is the UI issue related to upload => remove => upload again. I'm pretty stumped on that one.

matthew-a-dunlap commented 6 years ago

screen shot 2018-02-27 at 12 42 24 pm @mheppler @dlmurphy Here is the error dialog that happens when there is an error during uploading and parsing of the provenance json. Let me know what you think of the wording. We will also need another error (or warning?) for when the provenance uploaded by the user does not have any entities (files) that can be parsed out for the user to select as connected to their DataFile.

matthew-a-dunlap commented 6 years ago

screen shot 2018-02-27 at 1 46 09 pm Here is a sample of the provenance autocomplete for selecting the entity @dlmurphy @jacksonokuhn

matthew-a-dunlap commented 6 years ago

All that looks to be left is fixing the IT tests and maybe writing another test or two for the parser code!

matthew-a-dunlap commented 6 years ago

Deployment notes:

Testing notes: