fcrepo-exts / fcrepo-import-export

Apache License 2.0
15 stars 19 forks source link

FCREPO-2459 Import of versions #109

Closed bbpennel closed 5 years ago

bbpennel commented 7 years ago

Implements import of versions into fcrepo4.

Note, this PR is not quite done, but I wanted to get it created to make sure people are okay with the direction. Some of the remaining things to address:

https://jira.duraspace.org/browse/FCREPO-2459

dbernstein commented 7 years ago

I did a test with a very simple repo and it appears to work. One container off root that contains one binary with two versions. I tested importing back into the same root url as well as mapping to a new URI.

awoods commented 7 years ago

@bbpennel : would you mind updating this branch against master?

bbpennel commented 6 years ago

@awoods There is a failing test in this PR that does look like a bug, but I am going to hold off on trying to fix it pending whether we want to support import of versions in fcrepo 4.x or skip directly to fcrepo 5.

The approach in this PR, of importing all resources in the dataset chronologically, is not necessarily bad for the importer (for instance, it should simplify reference creation order), but was selected due to not being able to import historic mementos in 4.x directly. For 5.x, we would likely want to remove mementos from the main chronological import order and ingest them all at the end to make full use of the API. In which case we would only want this PR if we wanted 4.x to be able to import versions and to have a similar structure for 5.x imports (and any other improvements the approach might offer outside of the versioning use case).

This is a rather large refactor, so while I feel a twinge of physical pain to throw away all that work, it may not be the best course.

dbernstein commented 5 years ago

@bbpennel : I ran some manual tests on the PR and all seems to work as expected.

dbernstein commented 5 years ago

@bbpennel : with the exception of the broken unit test of course. But it appears to work from my manual tests.

bbpennel commented 5 years ago

@dbernstein @awoods After reviewing and fixing up this PR some, I still agree with my assessment that the biggest changes here aren't necessary for fcrepo5. Individual resource versioning and import of historic versions in fcrepo5 removes the complexity of having to replay changes chronologically. There are some refactors in here that would probably be fine to have on master, might take a little effort to split them off though.

Not surprisingly there were a number of issues I ran into while attempting to fix the failing tests, particularly related to importing versions containing resources that later were deleted. From the blurb at the top of the PR apparently I didn't think this PR was really complete back then, but hopefully its in better shape now after my last few commits.

I don't recall if there is any kind of example content we have used for testing releases of the import/export tool in the past, but it wouldn't hurt to put a more real world-ish example through this just to make sure everything seems fine after such a major refactor.

This thing is also still using fcrepo 4.6.0 and has security alerts, so there are some other 4.x maintenance issues that should probably be addressed before we move on to 5.x.