Islandora / documentation

Contains islandora's documentation and main issue queue.
MIT License

Meta-Issue: Versioning of resources in CLAW #740

Open bseeger opened 6 years ago

bseeger commented 6 years ago

How will versioning be done in CLAW? This issue is a reminder to consider how versioning will be integrated at the Drupal layer and how that will integrate with Fedora's Memento versioning model.

As discussed in a recent tech meeting, the idea might be to follow what Drupal does. By default it looks like Drupal does history-based revisioning, meaning any change is revisioned. I found this by looking at: 'Structure' -> 'Content Types' -> pick a content type and 'edit' -> look at 'Publishing options' for 'Create new revision'. This was checked by default in my system.

Some notes/considerations from the tech meetings:

Versioning in CLAW: because we have a CMS sitting on top of Fedora, do we want to version on each change?

My understanding in Drupal is you can do auto-versioning and also manually slice a version.

At the end of the day, whatever you do in Drupal we would carry over to Fedora for each version.

We haven't explored that yet with the JCR way of doing it. We are going to try and take whatever Drupal does out of the box.

Fedora will not be doing auto-versioning, but do we want a Fedora version for every Drupal version?

Ideas are welcome.

Out-of-the-box Drupal does manually sliced versions.

DiegoPino commented 6 years ago

@bseeger good questions! some thoughts

  1. Has the way versioning works in Fedora (5) changed in the last year? What about LDP constraints and tree snapshots? Is there a way we can work on a little exercise that compares both ways of versioning and finds the common denominator where both strategies can live side by side?
  2. Is Memento, with its headers and its implementation/way of querying, something that we want to expose on the Drupal side?
  3. What happens with binary files and revisions in Drupal? Files are entities by themselves but attached to other ones via fields. Does a content revision also create a revision (and make a copy) of an attached binary? How does that happen at the media entities abstraction level?
  4. How do we expose revision restoration? Does that action on the Drupal side trigger an actual change in Fedora?

I have so many questions really. We pushed the versioning/revision thing a lot into the future, but I feel the reasons were mostly correct: at the time we felt Fedora 4's versioning system could change and was not completely consistent, because its LDP-confined dependence on tree snapshotting differed from both the previous (Fedora 3) per-resource versioning and Drupal's.

whikloj commented 6 years ago

There will not be tree snapshots in Fedora anymore. Each resource will be versioned independently.

But I have a more interesting question.

Suppose we have versions in Drupal and can restore a Drupal version: saving it would PUT it into Fedora, overwriting the object.

Do we need versioning turned on in Fedora?

In Fedora we are thinking that a restore will probably be a manual process of the GET of a Memento and a PUT to the live resource anyways.

DiegoPino commented 6 years ago

@whikloj, cool, but how is the LDP tree integrity maintained if only per-resource snapshots happen in Fedora? OK, I guess that's out of scope here... but maybe not completely: under that reasoning we could do the same in Drupal and avoid maintaining linked data integrity, if Drupal allows us (because what Drupal actually maintains is linked entity integrity, at the DB level). That interesting question that you propose is indeed interesting! But it scares me (Halloween times 🎃 ). Does it mean I should trust Drupal, instead of Fedora, to keep my institution's DO history? Does that not invalidate Fedora's existence in the stack? How does that play with Memento? Would we need to build that functionality into Drupal itself instead of mangling URLs from Fedora-generated headers and responses?

Indeed an interesting question

whikloj commented 6 years ago

So for the LDP tree we are going to remove referential integrity from Mementos, so if you:

  1. PUT a resource /foo
  2. PUT a resource /bar
  3. PATCH with /foo example:hasRelated /bar
  4. Create a Memento of /foo
  5. Delete /bar

That memento of /foo will still say /foo example:hasRelated /bar even though /bar no longer exists.

The actual LDP tree has integrity, but the past versions are just pictures of a resource in time. It will be up to you to determine how much of the tree you want to preserve, by traversing it and taking snapshots if you so choose.
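
The five steps above can be played out on a toy model. The sketch below is plain Python with no Fedora involved; `TinyRepo` and its methods are hypothetical names invented for illustration, and the only behaviour it assumes is the one described in this comment: a Memento is a frozen copy that is never re-checked against the live tree.

```python
import copy


class TinyRepo:
    """Toy stand-in for a repository: each resource is a dict of predicate -> target."""

    def __init__(self):
        self.live = {}      # path -> current triples
        self.mementos = {}  # path -> list of frozen snapshots

    def put(self, path):
        self.live[path] = {}

    def patch(self, path, predicate, target):
        self.live[path][predicate] = target

    def create_memento(self, path):
        # A memento is a full copy of the resource as it is at this moment.
        snapshot = copy.deepcopy(self.live[path])
        self.mementos.setdefault(path, []).append(snapshot)

    def delete(self, path):
        # Deleting a live resource leaves existing snapshots untouched.
        del self.live[path]


repo = TinyRepo()
repo.put("/foo")                                  # step 1
repo.put("/bar")                                  # step 2
repo.patch("/foo", "example:hasRelated", "/bar")  # step 3
repo.create_memento("/foo")                       # step 4
repo.delete("/bar")                               # step 5

snapshot = repo.mementos["/foo"][0]
print(snapshot)
print("/bar" in repo.live)
```

The snapshot still names /bar even though /bar is gone from the live map, which is the "picture of a resource in time" behaviour.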

DiegoPino commented 6 years ago

@whikloj great explanation! That makes sense. May I ask you another question? Having snapshots as non-validated pictures of moments in time is very, very good. So what happens when you want to restore a certain point, and the integrity of the old one (it refers to currently non-existing resources, web ACLs or even namespaces, who knows?) conflicts with the current LDP state? Is that where a GET and PUT (manual, probably with fixing and editing) is needed? In other words, no automatic restore is possible, right?

whikloj commented 6 years ago

@DiegoPino right

In the case of my example if you wanted to restore the Memento of /foo, currently Fedora would fail due to referential integrity.

[For any listeners that don't know, referential integrity checks that if I try to PUT/POST/PATCH an object /foo with information that creates a relationship to an object /bar like /foo example:hasRelated /bar AND /bar is in the domain of the repository (ie. it is not a URL to another server) then /bar MUST exist or the operation (PUT/POST/PATCH) will fail]

So you would need to edit the resource between GET and PUT.
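
A rough sketch of that GET, edit, PUT sequence, again on a toy in-memory model and not against a real Fedora. The `put` and `drop_dangling` helpers and the `domain_prefix` convention are assumptions made for this example. Dropping dangling triples is only one possible edit; re-creating /bar first would be another.

```python
class IntegrityError(Exception):
    """Raised by the toy PUT when an in-repository target does not exist."""


def put(live, path, triples, domain_prefix="/"):
    """Toy PUT with a referential integrity check on in-repository targets."""
    for predicate, target in triples.items():
        in_domain = target.startswith(domain_prefix)
        if in_domain and target != path and target not in live:
            raise IntegrityError(f"{path} {predicate} {target}: target is missing")
    live[path] = dict(triples)


def drop_dangling(live, triples, domain_prefix="/"):
    """The manual edit step: keep external targets and in-repository targets that exist."""
    return {
        predicate: target
        for predicate, target in triples.items()
        if not target.startswith(domain_prefix) or target in live
    }


live = {"/foo": {}}  # /bar has already been deleted
memento = {
    "example:hasRelated": "/bar",               # dangling, in the repository domain
    "example:seeAlso": "http://example.org/x",  # external, never checked
}

try:
    put(live, "/foo", memento)  # restoring the memento unchanged
    restored_unedited = True
except IntegrityError:
    restored_unedited = False

put(live, "/foo", drop_dangling(live, memento))  # restoring after the edit
print(restored_unedited, live["/foo"])
```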

There is a larger question of whether Fedora should drop referential integrity altogether, but that is for a different time.

DiegoPino commented 6 years ago

@whikloj nice. So that piece of the workflow is clear to me now. So we need to compare it to Drupal's. Need to research more on that piece. Thanks for walking me (us) through it.

dannylamb commented 6 years ago

@whikloj @bseeger @DiegoPino I'm inclined to leverage Drupal's core versioning features as much as possible. I'd prefer saving and restoring versions be done through Drupal's UI and database. That work is already done for us and we'd never have to talk to Fedora directly.

If people want to push versions from Drupal to Fedora for posterity, we can certainly automate that for folks. That feels like a valid use case and in the spirit of digital preservation. But there are times where users wouldn't want or need it, so we'll need to make sure the feature can be turned off. In particular I'm thinking about instances with Fedoras that are configured to auto-version.

ajs6f commented 6 years ago

+1. The less people have to be aware of Fedora the more CLAW shows its advantages.

dannylamb commented 5 years ago

Dusting this off and turning it into a meta-issue for our roadmap.

We will need to develop a gameplan to move forward on this one. If we stick to using Drupal's core (and possibly contributed?) functionality to manage versions, then we need to have Camel listeners responding to CRUD events on Drupal versions and create Mementos in Fedora accordingly.

elizoller commented 5 years ago

I am very interested in this issue and have taken the following approach:

  1. modify chullo FedoraApi and IFedoraApi to have a createVersion method which calls the Fedora API endpoint for creating a Memento version (see https://github.com/asulibraries/chullo/commit/60905be0527737c06326d06c6f78204581ba95ce )
  2. modify the saveNode method in MillinerService so that, when the item already exists, it creates a version before the node is updated in Fedora. (see https://github.com/Islandora-CLAW/Crayfish/compare/master...asulibraries:versioning)

I'm not sure about two things:

  1. whether this is even the right approach to take - maybe it should be an "action" in Drupal that can be triggered via context (or manually via the Drupal UI). if the action is the way to go, i'm not sure where to put the endpoint that would connect to chullo - would that still be part of milliner? like another endpoint in milliner?
  2. i'm seeing repeated calls to milliner after a single node update (always 6), so it's creating 6 versions within a few milliseconds of each other. i tried testing on a clean install with just a collection node (no members, no media) - but i'm wondering if i need to wait for some PRs to get merged and try again with this part.

Although it is convenient that drupal creates "revisions" of nodes, I'm not sure it's actually helpful for passing anything to Fedora. https://www.drupal.org/docs/8/modules/jsonapi/revisions

whikloj commented 5 years ago

@elizoller I think the addition of a createVersion and getVersions to Chullo will be needed and so if you want to submit a PR to chullo to add that it would be great.

I think you have too much of the Fedora logic up at the Milliner level, along with a couple of assumptions we should avoid.

I would pass the resource's URI to chullo (ie. http://localhost:8080/fcrepo/rest/my_first_resource) then use the internal functions to a) do a HEAD request and locate the timemap LINK header and then b) do a POST at that timemap.

This way:

  1. We keep chullo as the library that knows how to talk to Fedora (instead of moving the logic up)
  2. We stick to using the Fedora API. Because in future the /fcr:versions could change, but if we find the timemap header then we are following the Memento specification and that shouldn't change.
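
The HEAD-then-POST flow suggested above can be sketched as follows. This is a minimal Python illustration, not chullo's actual code: `find_timemap` and `create_version` are hypothetical names, authentication is left out, and it assumes the server advertises the TimeMap in a `Link` header with `rel="timemap"` and answers the POST with a `Location` header.

```python
import re
import urllib.request

# One link-value: <uri> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+(?:\s*=\s*(?:"[^"]*"|[^\s",;]+))?)*)'
)
REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s",;]+))', re.IGNORECASE)


def find_timemap(link_header):
    """Return the URI of the first link whose rel includes "timemap", else None."""
    for uri, params in LINK_VALUE.findall(link_header):
        rel = REL_PARAM.search(params)
        if rel is None:
            continue
        relations = (rel.group(1) or rel.group(2) or "").lower().split()
        if "timemap" in relations:
            return uri
    return None


def create_version(resource_uri, opener=urllib.request.urlopen):
    """a) HEAD the resource and locate the timemap link, b) POST to that timemap."""
    head = urllib.request.Request(resource_uri, method="HEAD")
    with opener(head) as response:
        link_values = response.headers.get_all("Link") or []
    timemap = find_timemap(", ".join(link_values))
    if timemap is None:
        raise RuntimeError(f"no timemap advertised for {resource_uri}")
    post = urllib.request.Request(timemap, method="POST")
    with opener(post) as response:
        # Assumption: the new memento's URI comes back in Location.
        return response.headers.get("Location")


header = (
    '<http://www.w3.org/ns/ldp#Resource>; rel="type", '
    '<http://localhost:8080/fcrepo/rest/my_first_resource/fcr:versions>; rel="timemap"'
)
print(find_timemap(header))
```

Only the header parsing runs here; `create_version` needs a reachable repository. The point of the design is that the caller never builds the `/fcr:versions` path itself.
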

dannylamb commented 5 years ago

Sound advice @whikloj. I'll definitely second sticking to advertised headers in a HEAD request. Don't let the fact that you're making an extra request deter you @elizoller. It's a bulletproof way to deal with the issue.

dannylamb commented 5 years ago

@elizoller Also, might be retries if you're seeing six fails in a row? Check the karaf logs too and see what islandora-indexing-fcrepo is up to.

elizoller commented 5 years ago

Thanks for the advice @whikloj and @dannylamb. I made some more changes, see:

Crayfish: https://github.com/Islandora-CLAW/Crayfish/compare/master...asulibraries:versioning
chullo: https://github.com/Islandora-CLAW/chullo/compare/master...asulibraries:versioning

It is working in that it creates one new version in Fedora when a corresponding object in Drupal is updated. I added a getVersions method to chullo and added a little helper function to get the timemap URI. I can push up PRs if you all feel like it's good enough for more formal review.

whikloj commented 5 years ago

So this could create a lot of versions in Fedora and as Mementos are not deltas but an immutable copy of the resource at the time of versioning, this could introduce some bloat.

I'm also thinking specifically of binaries and how the flysystem connector deals with this?

But we will need to work that out anyways as part of this meta-issue.

Functionally I think you need to provide the desired BODY as part of your POST request if you provide a specific timestamp (here)

We will probably need a way to enable/disable versioning so it can be turned off for those that don't want it yet.

But otherwise I think you could open PRs (even draft PRs if you'd like) and we can work out the issues there. If you're comfortable with that.

dannylamb commented 5 years ago

@elizoller If you want, I'd start with the Chullo PR and we'll go from there. There's a lot of angles that need to be explored w/r/t when versions are sliced, but the utility functions to do the actual slicing can come in now np.

elizoller commented 5 years ago

I understand the concern about bloat since memento creates full copies. I am curious to see how Fedora v6 will implement OCFL because from what I see, OCFL stores versions as deltas not full copies and so I wonder how that will change the way Fedora thinks about versioning. From what I have heard, Fedora 6 will have the option to be OCFL compliant but won't necessarily require it.

This only applies to nodes, so it wouldn't affect binaries, and it doesn't involve Flysystem either, right?

Just looking at the docs again for when a timestamp is provided (https://wiki.duraspace.org/display/FEDORA5x/RESTful+HTTP+API+-+Versioning#RESTfulHTTPAPI-Versioning-BluePOSTCreateanewversionedresource(anewLDPRm)) and I see what you're saying. Example one is just create a version right now and requires no parameters. Example two is create a version at a specific time and here is the time and the BODY to do so. I will work to adjust the chullo API to handle both of those examples more completely.
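
The two cases could be told apart roughly like this. The sketch only builds a description of the request and sends nothing; `version_request` is a hypothetical helper, and the `text/turtle` default is an assumption. The `Memento-Datetime` header and its RFC 1123 date format come from the Memento specification.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def version_request(timemap_uri, when=None, body=None, content_type="text/turtle"):
    """Describe the POST that would create a memento; nothing is sent."""
    if when is None:
        # Example one: version the resource as it is right now, no parameters.
        return {"method": "POST", "url": timemap_uri, "headers": {}, "body": None}
    if body is None:
        # Example two needs both the time and the content to store.
        raise ValueError("a body is required when a timestamp is supplied")
    headers = {
        # RFC 1123 date in GMT, e.g. "Fri, 01 Feb 2019 12:00:00 GMT".
        "Memento-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True),
        "Content-Type": content_type,
    }
    return {"method": "POST", "url": timemap_uri, "headers": headers, "body": body}


timemap = "http://localhost:8080/fcrepo/rest/my_first_resource/fcr:versions"
now_request = version_request(timemap)
dated_request = version_request(
    timemap,
    when=datetime(2019, 2, 1, 12, 0, 0, tzinfo=timezone.utc),
    body="<> <http://purl.org/dc/terms/title> 'An earlier title' .",
)
print(now_request["headers"], dated_request["headers"])
```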

I agree, especially due to the bloat mentioned above, that versioning should be a feature that can be enabled/disabled through Drupal. I am thinking that to do so, it would need to be a separate "action" in Drupal from the "index in Fedora" action, so that it could be connected to a setting that would control whether the action is included in the context. Otherwise, MillinerService doesn't have any clue about Drupal settings, right?

@dannylamb I'll put in a PR to chullo when I finish fleshing out the second example in the fedora API docs.

dannylamb commented 5 years ago

@elizoller You got it. If we push all this into context, it becomes infinitely configurable. Everything milliner needs to know should get pushed into the message that goes on to the queue and away it goes. Milliner will need a special route for versions (or respect certain headers maybe), but it shouldn't be too big of a deal.

mjordan commented 5 years ago

Upvote for Context here. That would allow very fine-grained control over when a new version is created.

whikloj commented 5 years ago

@elizoller regarding Fedora 6 and OCFL... if your institution has any desires or requirements I'd recommend you (and maybe @tallgood) should make your voices heard. https://wiki.duraspace.org/display/FF/2019-02+Fedora+Design+Summary

I am also 👍 for context and suggest that passing the createVersion=true as part of the emitted message would be nice and easy to do from Drupal.

elizoller commented 5 years ago

@whikloj and/or @dannylamb

I modified islandora module like so: https://github.com/Islandora-CLAW/islandora/compare/8.x-1.x...asulibraries:versioning which basically adds an action for creating a version, and if the event["type"] is "Version" then it adds "createVersion" as true (otherwise sets as false) on the $event["object"] which gets emitted. Not really sure if this works since I don't know how to develop in/debug Alpaca. Any advice on that?

after https://github.com/Islandora-CLAW/Alpaca/blob/d2a9d71582a0b8745d7a02aae2658d1c5c60bdf0/islandora-indexing-fcrepo/src/main/java/ca/islandora/alpaca/indexing/fcrepo/FcrepoIndexer.java#L131 i would need to add a line like `.setProperty("createVersion").simple("${exchangeProperty.event.object.createVersion}")`
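
To make the shape of that flag concrete, here is a small language-neutral sketch in Python, not the Drupal or Alpaca code itself. `build_event` and `wants_version` are hypothetical names, and the event is trimmed to the fields mentioned in this comment; real messages carry more.

```python
import json


def build_event(event_type, uuid, jsonld_url):
    """Emitter side: set createVersion to true only for "Version" events."""
    return {
        "type": event_type,
        "object": {
            "id": f"urn:uuid:{uuid}",
            "url": jsonld_url,
            "createVersion": event_type == "Version",
        },
    }


def wants_version(message_body):
    """Consumer side: read the flag, treating a missing flag as false."""
    event = json.loads(message_body)
    return bool(event.get("object", {}).get("createVersion", False))


version_msg = json.dumps(
    build_event("Version", "1234", "http://localhost:8000/node/1?_format=jsonld")
)
update_msg = json.dumps(
    build_event("Update", "1234", "http://localhost:8000/node/1?_format=jsonld")
)
print(wants_version(version_msg), wants_version(update_msg))
```

Defaulting to false when the flag is absent keeps messages from emitters that know nothing about versioning working unchanged.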

in Milliner, i could see this going two ways. one would be to modify the saveNode route and corresponding method to look at the createVersion parameter coming from alpaca (assuming that just setting the property above will pass it through to milliner). then saveNode might look something like this:

    public function saveNode(
        $uuid,
        $jsonld_url,
        $create_version,
        $token = null
    ) {
        $urls = $this->gemini->getUrls($uuid, $token);

        if (empty($urls)) {
            // Not in Fedora yet: create it. Strip the query string as a
            // suffix; rtrim() would treat it as a set of characters.
            return $this->createNode(
                $uuid,
                preg_replace('/\?_format=jsonld$/', '', $jsonld_url),
                $jsonld_url,
                $token
            );
        } elseif ($create_version) {
            // Already in Fedora and a version was requested.
            try {
                return $this->createVersion(
                    $urls['fedora'],
                    $token
                );
            } catch (\Exception $e) {
                $this->log->error('Caught exception: ' . $e->getMessage());
            }
        } else {
            // Already in Fedora, no version requested: plain update.
            return $this->updateNode(
                $urls['drupal'],
                $jsonld_url,
                $urls['fedora'],
                $token
            );
        }
    }

otherwise, if you think it should be a separate route in milliner, something like $app->post('/node/{uuid}/version', "milliner.controller:createVersion"); and map to a corresponding method to do the work. (can't really test this because i'm not sure how to modify alpaca and make sure it's working...)

let me know if i'm on the right track (or not) here. also still need to put up a PR for chullo that fully implements options for the fedora versioning API

whikloj commented 5 years ago

@elizoller nice work 👏 . We do need to add some logging to Alpaca to help with people working in the stack.

Two things I can think of are:

  1. You'd need to add the same property for media.
  2. In Fedora you can version a binary separately from its metadata (as the metadata probably will change more often). But we should allow for both of these to occur.

@dannylamb might have more thoughts on this. But at some point I would start opening PRs... once a change gets really big it can be daunting to test and get in. Don't feel it has to be complete, changes can come later.

dannylamb commented 5 years ago

@elizoller Throw up what you've got into some PRs and we'll go from there. You're tackling an issue that cuts all the way through the stack (good on ya!) so it'll take some time to massage all the moving parts until they start working together. It's a bit of a trial by fire, but by the time you're done, you'll be able to do pretty much anything in Islandora 8. So throw up those PRs and we'll jump in to help you out. You're not doing this alone.

And don't be too scared of Camel. If you can grok Drupal's migrate framework, you'll be just fine with Camel. They're actually very very similar in the end, it's just, y'know... Java.

elizoller commented 4 years ago

I checked this set of PRs again and I'm fairly certain that the call to update Fedora is fired off before the call to create the new version (see https://github.com/Islandora/Alpaca/pull/61/files#diff-a0a4e90c5cf5024700844b61bbe9e12eR139)

elizoller commented 4 years ago

is it possible this issue can be closed?

elizoller commented 4 years ago

In regards to versioning media objects, I am working on that (once I can build Alpaca again... :P ). I think files will probably need their own discussion?

elizoller commented 4 years ago

See above PRs for versioning on media. They aren't perfect but I think it's a pretty good start.

I was just messing around with versioning files and I ran into an interesting thing where Drupal says you can't overwrite an existing file. @seth-shaw-unlv I think you mentioned something about what are we going to do about files in a past tech call but I didn't see the issue until I actually went to try to implement it in the FedoraAdapter for Flysystem. So just for fun, I tried installing this module https://www.drupal.org/project/file_replace and it creates a page like /admin/content/files/replace/{{fid}} and allows you to replace the file. Then I did a little work in the adapter here: https://github.com/Islandora/islandora/compare/8.x-1.x...asulibraries:version_files

And it actually worked 😱 (versions available from fedora fcr:versions endpoint) I'm not sure how people would feel about this approach and including an external module for this.

seth-shaw-unlv commented 4 years ago

Contrib modules FTW. Keep the technical debt low!

dannylamb commented 3 years ago

@DonRichards brought up file versioning yesterday in conversation so I hunted down this issue. There are some good PRs that have sat for a bit. We should test these and bring them in as soon as possible.

elizoller commented 3 years ago

Versioning files PR is up at https://github.com/Islandora/islandora/pull/793
Playbook PR is here to add the file_replace module: https://github.com/Islandora-Devops/islandora-playbook/pull/184

I don't think the page is that easy to find, because it seems to require knowing the file id, which I never do... so I suspect we'll need to add a quick link somewhere else (perhaps on the media edit page)...

Also not sure why the Fedora tests are failing in https://travis-ci.org/github/Islandora/islandora/jobs/718940804 It seems to complain about the response to the fedora api being NULL - is chullo up to date in travis?

mjordan commented 3 years ago

@elizoller 's https://github.com/Islandora/islandora/pull/793 has been merged. Have we thought about a UI in Drupal for viewing/restoring/deleting file versions? Open a new issue?

manez commented 3 years ago

I vote new issue. This one is a bit long in the tooth, and it would help to focus discussion forward.