IQSS / dataverse

Open source research data repository software
http://dataverse.org

As a researcher, I want to view and export the provenance for a file so that I can know additional information about the file #4346

Closed: djbrooke closed this issue 5 years ago

djbrooke commented 6 years ago

Users will be able to view a provenance graph and export the provenance. The mockups supporting this are available here:

https://drive.google.com/file/d/1vMWm78rBdSelfuSzBtbIRqf9iAqbaxji/view

matthew-a-dunlap commented 6 years ago

Note: This story includes work to allow view/export via API. Commands already exist to get the provenance data and need to be wired up.
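A minimal sketch of the API side, assuming the provenance is exposed per file at endpoints of the form `/api/files/{id}/prov-json` and `/api/files/{id}/prov-freeform` with the API token in the `X-Dataverse-key` header (check the Dataverse API guide for your version). The helper `build_prov_request` is hypothetical. It only builds the request and does not send it.

```python
from urllib.request import Request

# Assumed endpoint suffixes; not confirmed by this issue.
PROV_KINDS = ("prov-json", "prov-freeform")


def build_prov_request(server_url: str, file_id: int, api_token: str,
                       kind: str = "prov-json") -> Request:
    """Build (but do not send) a GET request for a file's provenance."""
    if kind not in PROV_KINDS:
        raise ValueError(f"unsupported provenance kind: {kind}")
    url = f"{server_url.rstrip('/')}/api/files/{file_id}/{kind}"
    # The API token travels in a request header rather than the URL.
    return Request(url, headers={"X-Dataverse-key": api_token}, method="GET")


req = build_prov_request("https://demo.dataverse.org/", 42, "my-token")
print(req.get_method(), req.get_full_url())
```

Passing `req` to `urllib.request.urlopen` would perform the call. Keeping request construction separate makes the URL and header logic easy to check without a live server.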

pdurbin commented 6 years ago

In the retrospective this afternoon I said that it might be useful to see what the end game looks like for provenance: what value is being delivered within Dataverse to a researcher who is interested in the provenance of a file. The answer is that there is a JSON file the researcher can export, plus a visualization. Here's what it looks like according to the last page of the mockup above:

[Screenshot 2018-02-14 at 4:03:58 PM]
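The exported JSON is assumed here to be W3C PROV-JSON, whose top level groups records by node kind (`entity`, `activity`, `agent`) and by relation (`used`, `wasGeneratedBy`, and so on). Counting nodes and relations is one cheap way to judge whether a file is small enough to visualize comfortably. The helper `summarize_prov` and the three-node example document are hypothetical.

```python
import json

NODE_KINDS = ("entity", "activity", "agent")
# Top-level sections that are neither nodes nor relations.
NON_RELATIONS = ("prefix", "bundle")


def summarize_prov(doc: dict) -> dict:
    """Count nodes by kind, plus all relation records, in a PROV-JSON document.

    Nested bundles are ignored.
    """
    summary = {kind: len(doc.get(kind, {})) for kind in NODE_KINDS}
    summary["relations"] = sum(
        len(records)
        for key, records in doc.items()
        if key not in NODE_KINDS and key not in NON_RELATIONS
    )
    return summary


example = json.loads("""{
  "prefix": {"ex": "http://example.org/"},
  "entity": {"ex:data.csv": {}, "ex:plot.png": {}},
  "activity": {"ex:run-script": {}},
  "used": {"ex:u1": {"prov:activity": "ex:run-script", "prov:entity": "ex:data.csv"}},
  "wasGeneratedBy": {"ex:g1": {"prov:entity": "ex:plot.png", "prov:activity": "ex:run-script"}}
}""")
print(summarize_prov(example))  # {'entity': 2, 'activity': 1, 'agent': 0, 'relations': 2}
```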

I know we plan to use a specific JavaScript library (https://github.com/CamFlow/cytoscape.js-prov, I think) for the visualization, but it would be nice to see it loaded with somewhat real provenance data so that people can get a better sense of what it might look like than the screenshot above gives, since it is pretty zoomed in (a dot with "process:rec-advanced" next to it). If there's a live version of the visualization we could play with, it would be much appreciated.
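To show roughly what loading provenance data into a Cytoscape.js graph involves, here is a hypothetical converter from PROV-JSON to the plain Cytoscape.js elements format (nodes as `{"group": "nodes", "data": {"id": ...}}`, edges with `source` and `target`). This is not how cytoscape.js-prov works internally. It handles only a subset of PROV relations, assumes each relation id maps to a single record, and uses an invented example document.

```python
import json

# Relation sections and the PROV-JSON fields naming their two ends
# (subset only, per the W3C PROV-JSON submission).
RELATION_ENDS = {
    "used": ("prov:activity", "prov:entity"),
    "wasGeneratedBy": ("prov:entity", "prov:activity"),
    "wasDerivedFrom": ("prov:generatedEntity", "prov:usedEntity"),
    "wasAssociatedWith": ("prov:activity", "prov:agent"),
    "wasAttributedTo": ("prov:entity", "prov:agent"),
    "wasInformedBy": ("prov:informed", "prov:informant"),
}


def prov_to_cytoscape(doc: dict) -> list:
    """Convert a PROV-JSON document into a Cytoscape.js elements array."""
    elements, node_ids = [], set()
    for kind in ("entity", "activity", "agent"):
        for node_id in doc.get(kind, {}):
            node_ids.add(node_id)
            elements.append({"group": "nodes", "data": {"id": node_id, "kind": kind}})
    for relation, (src_key, dst_key) in RELATION_ENDS.items():
        for rel_id, record in doc.get(relation, {}).items():
            src, dst = record.get(src_key), record.get(dst_key)
            # Cytoscape.js rejects edges whose endpoints are not nodes, so skip those.
            if src not in node_ids or dst not in node_ids:
                continue
            elements.append({"group": "edges", "data": {
                "id": rel_id, "source": src, "target": dst, "label": relation}})
    return elements


doc = {
    "entity": {"ex:data.csv": {}, "ex:plot.png": {}},
    "activity": {"ex:run-script": {}},
    "used": {"ex:u1": {"prov:activity": "ex:run-script", "prov:entity": "ex:data.csv"}},
    "wasGeneratedBy": {"ex:g1": {"prov:entity": "ex:plot.png", "prov:activity": "ex:run-script"}},
}
print(json.dumps(prov_to_cytoscape(doc), indent=2))
```

The resulting JSON array could be passed as the `elements` option when creating a Cytoscape.js instance in the browser.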

MKLau commented 6 years ago

Hi Phil, this might be useful as the Dataverse implementation is being worked up: http://camflow.org/demo

pdurbin commented 6 years ago

@MKLau thanks! It seems to choke a bit on https://github.com/MKLau/json_example/blob/master/example_json.txt (due to the size, perhaps) but this is exactly what I was looking for. It seems like a big part of the user experience is going to be the visualization.

MKLau commented 6 years ago

Glad that's useful to get a sense.

I would say that the visualization is generally not useful to the end user without significant condensation of the graph through node clustering.

As the provenance interest implementation moves forward, I think you'll get an idea of how you might want to do this.
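One simple form of the node clustering described above is a quotient graph. Assign every node a cluster label, keep one node per label, and keep one edge per pair of distinct labels. The sketch below assumes the caller supplies the labeling function. The node ids and the "text before the last hyphen" rule are hypothetical. Choosing a meaningful grouping for real provenance graphs is the hard part and is not addressed here.

```python
from collections import defaultdict


def condense(nodes, edges, cluster_of):
    """Collapse a graph into its quotient graph under cluster_of.

    Edges inside a cluster are dropped; parallel edges between two
    clusters are merged into one.
    """
    members = defaultdict(set)
    for node in nodes:
        members[cluster_of(node)].add(node)
    cluster_edges = {
        (cluster_of(src), cluster_of(dst))
        for src, dst in edges
        if cluster_of(src) != cluster_of(dst)
    }
    return dict(members), cluster_edges


# Hypothetical node ids, clustered on the text before the last "-".
nodes = ["load-1", "load-2", "clean-1", "plot-1"]
edges = [("load-1", "load-2"), ("load-2", "clean-1"), ("clean-1", "plot-1")]
clusters, cluster_edges = condense(nodes, edges, lambda n: n.rsplit("-", 1)[0])
print(sorted(cluster_edges))  # [('clean', 'plot'), ('load', 'clean')]
```

Four nodes and three edges become three clusters and two edges. On a large graph, the same idea can shrink thousands of fine-grained steps to a readable handful.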

pdurbin commented 6 years ago

@MKLau ok. Thanks. Do you have an example file I can upload to http://camflow.org/demo that would provide a better user experience? One with more condensation of the graph through node clustering or whatever it takes to make the visualization more useful? To me, the visualization is the moment when researchers consuming data will say, "I get it. This provenance feature is really useful."

MKLau commented 6 years ago

Hey Phil, I just pushed another JSON example to the json_example repo. This one is a lot simpler.

pdurbin commented 6 years ago

@MKLau thanks! I just tried simpleR.json in http://camflow.org/demo (screenshot below) and the user experience was much better. Firefox didn't offer to kill the tab for being too slow (as it did with example_json.txt, where I had to click the "Wait" button). And the graph makes sense to me:

[Screenshot 2018-02-15 at 10:40:32 AM]

MKLau commented 6 years ago

Nice work, Phil! Glad it helped.

djbrooke commented 5 years ago

Closing for now; we will reopen if we decide to take a similar approach in further provenance work.