NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0

Update prov relationships with package update #310

Open gothub opened 6 years ago

gothub commented 6 years ago

When a package that has existing prov relationships is updated, the prov relationships need to be updated as well. If a package member is being updated (and hence gets a new pid), then any prov relationship that contains the member's old pid in the subject or object of the relationship should be updated. This would be changed in DataPackage.serialize().

For an exhaustive (exhausting?) explanation of this, please see https://github.com/DataONEorg/rdataone/issues/189

mbjones commented 6 years ago

Based on https://github.com/DataONEorg/rdataone/issues/189, I see why you say that relationships should move forward on object updates. However, it's not clear to me that they should always be carried forward, because an object can be updated using different approaches. In the following two scenarios, I think the provenance relationships should be handled differently:

Scenario A: update object from new run of same script that originally created it

O1 <---derivedFrom--- O2 (using script S1 during execution E1)
O1 <---derivedFrom--- O3 (using script S1 during execution E2)
O2 <---obsoletes----- O3

Scenario B: update object by error correction script on original object

O1 <---derivedFrom--- O2 (using script S1 during execution E1)
O2 <---derivedFrom--- O3 (using script S2 during execution E3)
O2 <---obsoletes----- O3

So, even though O3 obsoletes O2 in both cases, the provenance relationships are different. There are many variants on how scenarios like these might differ. Even in Scenario A, one can't blindly copy all of the prov relationships from O2 to O3, because O3 is from a different execution even though it's using the same script.

So @gothub -- can you outline just how you would intend to 'update' the prov relationships for these two scenarios? Thanks
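The difference between the two scenarios can be made concrete with a small sketch. It assumes a hypothetical plain-object representation of the relationships (`subject`, `predicate`, `object`) and a hypothetical helper `sourcesOf`; neither is part of MetacatUI. It only shows that O3 has a different direct source in each scenario even though the `obsoletes` relationship is identical.

```javascript
// Hypothetical triple representation, for illustration only
// (not MetacatUI's internal model of prov relationships).
// "O1 <---derivedFrom--- O2" is read as: O2 wasDerivedFrom O1.
const scenarioA = [
  { subject: "O2", predicate: "wasDerivedFrom", object: "O1" },
  { subject: "O3", predicate: "wasDerivedFrom", object: "O1" },
  { subject: "O3", predicate: "obsoletes", object: "O2" },
];

const scenarioB = [
  { subject: "O2", predicate: "wasDerivedFrom", object: "O1" },
  { subject: "O3", predicate: "wasDerivedFrom", object: "O2" },
  { subject: "O3", predicate: "obsoletes", object: "O2" },
];

// Return the pids that `pid` is directly derived from.
function sourcesOf(triples, pid) {
  return triples
    .filter((t) => t.predicate === "wasDerivedFrom" && t.subject === pid)
    .map((t) => t.object);
}

// Same obsoletes relationship, different direct source for O3.
console.log("Scenario A:", sourcesOf(scenarioA, "O3")); // O1
console.log("Scenario B:", sourcesOf(scenarioB, "O3")); // O2
```

Because the `obsoletes` triple is the same in both arrays, it cannot by itself tell an updater which `wasDerivedFrom` relationships to write for O3.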

gothub commented 6 years ago

I don't fully understand these scenarios, so maybe we can discuss these when you have a minute.

However, I'm not talking about updating a package via multiple runs of scripts, but rather updating packages via the metacatui editor. Say for example that I have 'Package A' that has already been saved to the MN:

Package A (resmap pid = 'A')

  • member with pid '111'
  • member with pid '222'
  • a prov relationship: 222 wasDerivedFrom 111

Now I open Package A in the editor and decide to replace the member with pid '222' with a newer version of the file, so the member pid would be updated to '333'. (Maybe the editor doesn't provide this functionality of replacing members now, or will not in the near future, in which case I should close this issue.)

I'll call this updated package 'A1'. The new package, after it has been saved to the MN would be:

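Package A1 (resmap pid = 'A1', obsoletes resmap pid 'A')

  • member with pid '111'
  • member with pid '333' (obsoletes pid '222')
  • the updated prov relationship: 333 wasDerivedFrom 111 (the editor would have to update this prov relationship with the updated pid)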
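A minimal sketch of the pid substitution described here, assuming the prov relationships are available as plain `{ subject, predicate, object }` objects. The function name `replacePid` and this representation are hypothetical; this is not what DataPackage.serialize() does today.

```javascript
// Hypothetical helper: rewrite every relationship that mentions oldPid,
// in either the subject or the object position, to mention newPid instead.
// The input array is left unmodified.
function replacePid(triples, oldPid, newPid) {
  const swap = (pid) => (pid === oldPid ? newPid : pid);
  return triples.map((t) => ({
    ...t,
    subject: swap(t.subject),
    object: swap(t.object),
  }));
}

// Package A: 222 wasDerivedFrom 111
const packageA = [{ subject: "222", predicate: "wasDerivedFrom", object: "111" }];

// Package A1: member 222 replaced by 333
const packageA1 = replacePid(packageA, "222", "333");
console.log(packageA1[0]); // subject is now "333", object is still "111"
```

The substitution is purely mechanical. Whether the rewritten relationship is still true depends on how the new object was actually produced.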

laurenwalker commented 6 years ago

Right now we don't have functionality to either edit prov relationships or replace/edit data objects in the metadata editor, so this is definitely something for down the road. But it's still something to think about at that time.

When we do get the ability to replace/edit objects in a prov graph, I think it should be a fairly simple process because of Backbone.

When an object's bytes are updated, the Backbone model will largely stay the same except for the few changed attributes.

The resource map Backbone collection and model (MetacatUI.rootDataPackage and MetacatUI.rootDataPackage.packageModel) will re-serialize at time of save and should automatically save the updated data object and use its new pid during serialization. This is already the workflow for the editor and shouldn't be a problem once we add in prov editing.

On Thu, Oct 19, 2017 at 6:56 PM, Peter Slaughter notifications@github.com wrote:

I don't fully understand these scenarios, so maybe we can discuss these when you have a minute.

However, i'm not talking about updating a package via multiple runs of scripts, but rather updating packages via the metacatui editor. Say for example that i have 'Package A' that has already been saved to the MN:

Package A (resmap pid = 'A')

  • member with pid '111'
  • member with pid '222'
  • a prov relationship: 222 wasDerivedFrom 111

Now I open Package A in the editor and decide to replace the member with pid '222' with a newer version of the file, so the member pid would be updated to '333'. (Maybe the editor doesn't provide this functionality of replacing members now, or will not in the near future, in which case I should close this issue.)

I'll call this updated package 'A1'. The new package, after it has been saved to the MN would be:

Package A1 (resmap pid = 'A1', obsoletes resmap pid 'A')

  • member with pid '111'
  • member with pid '333' (obsoletes pid '222')
  • the updated prov relationship: 333 wasDerivedFrom 111 (the editor would have to update this prov relationship with the updated pid)



amoeba commented 6 years ago

Kind of a side note, couldn't that last bullet point be, instead,

  • 333 wasDerivedFrom 222

Is prov derivation transitive?

gothub commented 6 years ago

I think 'wasRevisionOf' is maybe more appropriate.

In the example, I'm talking about a scenario something like 111 is a csv file and 333 is a png plot created from it.

mbjones commented 6 years ago

Peter, your scenario is the same as my Scenario B as far as I can tell, and given that the editor is allowing arbitrary replacements of one object with another, I don't think we can say the relationship is transitive for sure. We can say that 333 prov:wasDerivedFrom 222 as @amoeba points out, because prov:wasRevisionOf is a sub-property of prov:wasDerivedFrom, but that does not imply that 333 was derived from 111. Because we know nothing about the process that generated 333, it might be completely unrelated to 111 as far as I can see.

Like @laurenwalker said, this is a non-issue in the editor at the moment, so we can probably just drop the issue until we come back to trying to add prov editing to the metacatui general editor.
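The reasoning about sub-properties and transitivity can be sketched as follows. The triple representation, the `SUB_PROPERTY_OF` table, and the helper names are assumptions made for illustration. The sketch applies only the sub-property rule (wasRevisionOf implies wasDerivedFrom) and deliberately does not chain derivations.

```javascript
// Assumed sub-property table: prov:wasRevisionOf is a sub-property of
// prov:wasDerivedFrom, as stated in the comment above.
const SUB_PROPERTY_OF = { wasRevisionOf: "wasDerivedFrom" };

// Add the super-property statement for every triple whose predicate has one.
// No transitive closure is computed: derivation is NOT treated as transitive.
function withSuperProperties(triples) {
  const inferred = triples
    .filter((t) => SUB_PROPERTY_OF[t.predicate] !== undefined)
    .map((t) => ({ ...t, predicate: SUB_PROPERTY_OF[t.predicate] }));
  return triples.concat(inferred);
}

// Hypothetical lookup helper.
function hasTriple(triples, subject, predicate, object) {
  return triples.some(
    (t) => t.subject === subject && t.predicate === predicate && t.object === object
  );
}

const stated = [
  { subject: "222", predicate: "wasDerivedFrom", object: "111" },
  { subject: "333", predicate: "wasRevisionOf", object: "222" },
];
const expanded = withSuperProperties(stated);

console.log(hasTriple(expanded, "333", "wasDerivedFrom", "222")); // true
console.log(hasTriple(expanded, "333", "wasDerivedFrom", "111")); // false
```

The second lookup is false because nothing in the stated relationships says how 333 was produced, so no link from 333 to 111 is inferred.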

gothub commented 6 years ago

@mbjones thanks for the explanation, the point you and @amoeba were making is clear to me now.

@laurenwalker what milestone should this be assigned to?

laurenwalker commented 6 years ago

Just keep it in the Backlog for now. We don't have a milestone assigned for moving the prov editor to the metadata editor yet.

mbjones commented 5 years ago

Today it came up that, when using some R tools, we sometimes delete prov relationships even though there are no changes to the objects in a package. While I still think prov relations should not be carried through a package update in Scenarios A and B, I think that when some objects have not changed in an updated package, we should indeed carry their prov relations forward. For example:

Scenario C: update package that has some objects that are unchanged in the new package

O1 <---derivedFrom--- O3 (using script S1 during execution E2)
O4 added as new object

In this situation, all prov relationships between O1 and O3 should be included in the new version of the package.

amoeba commented 4 years ago

Just a note to drop in here for anyone thinking about this in the future: In #653 (replace feature in editor), we discussed this a bit and decided it was okay to, when updating a package, (1) keep PROV referencing resources in the package and (2) remove PROV not referenced by a resource in the package. MetacatUI currently filters PROV on update in the above manner.
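A rough sketch of that kind of filtering, under stated assumptions: relationships are plain `{ subject, predicate, object }` objects, `filterProv` is a hypothetical name, and a relationship is kept only when both its subject and its object are members of the updated package. The thread does not spell out whether one matching end would be enough, and this is not MetacatUI's actual code.

```javascript
// Hypothetical filter: keep a prov relationship only if both ends are
// members of the updated package (an assumption, see the note above).
function filterProv(triples, memberPids) {
  const members = new Set(memberPids);
  return triples.filter((t) => members.has(t.subject) && members.has(t.object));
}

// Relationships carried over from the previous package version.
const previousProv = [
  { subject: "O3", predicate: "wasDerivedFrom", object: "O1" },
  { subject: "O3", predicate: "wasDerivedFrom", object: "O2" },
];

// Updated package: O1 and O3 unchanged, O4 added, O2 no longer a member.
const keptProv = filterProv(previousProv, ["O1", "O3", "O4"]);
console.log(keptProv.length); // 1
console.log(keptProv[0].object); // O1
```

With this rule the relationship between the unchanged objects O1 and O3 is carried forward, while the one pointing at O2, which is no longer in the package, is dropped.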