Closed bondjimbond closed 5 years ago
@bondjimbond Islandora Bagit only generates bags at the object level, and doesn't know about pages or compound children. I thought there was a JIRA ticket for this but I can't find it at the moment.
Before we jump to a solution, we need to resolve the problem of whether it's better to include all related (pages, children) objects in a single Bag, or to generate a Bag per object and only preserve the relationships in the respective objects (object 1 is a child of object 2, etc.). Both approaches could present their own challenges.
The approach I've taken in Islandora Fetch Bags is to use a plugin to generate a file listing an object's children, which is then included in the Bag. We could probably do the same thing with Islandora Bagit. We'd then need to generate a separate Bag for each of those children.
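A rough sketch of the child-listing idea, written in Python rather than the modules' PHP so it can be run on its own. Nothing here is taken from Islandora Fetch Bags or Islandora Bagit: the function name, the `children.json` file name, and the example PIDs are all hypothetical.

```python
import json
import os
import tempfile


def write_children_listing(bag_data_dir, parent_pid, child_pids):
    """Write a file naming the parent's children so it can be added to the parent's Bag."""
    listing = {"parent": parent_pid, "children": sorted(child_pids)}
    # Hypothetical file name; a real plugin would pick its own.
    path = os.path.join(bag_data_dir, "children.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(listing, handle, indent=2)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        listing_path = write_children_listing(
            data_dir, "example:book1", ["example:page2", "example:page1"]
        )
        with open(listing_path, encoding="utf-8") as handle:
            print(handle.read())
```

Each PID in the listing would then get its own Bag, so the parent's Bag stays small and the listing is what ties the set back together.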
Challenges indeed... I definitely see the appeal in zipping all the children into a single bag, but there are cases where that may be untenable. For example, I think we've got at least a few book objects in the repository where the page-level TIFFs put together were several gigabytes per book. That would be pretty hard to sync to OwnCloud.
But then, in disaster recovery terms, would it be easier to recreate the Book object from a single Bag, or several hundred based on a file outlining the relationships?
And how long would it take to put together a single Bag made out of several hundred children? Would that time out Drupal's cron?
(I know you advise against using Drupal's cron to run the bagging functions for that reason, of course. I'll probably export this to a drush script eventually, once all these other bits are figured out.)
In code terms, it should be fairly easy, I think, to take the preserve-all-children-separately approach... Just add some extra lines to islandora_westvault_form_submit(). We're already checking for collections there and looping through their children; we can do a similar check for Books, Newspapers, and Newspaper Issues and just loop through their children as well.
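To make the looping idea concrete, here is a minimal sketch in Python (not the module's PHP, and not the actual contents of islandora_westvault_form_submit()). It assumes two hypothetical lookup tables, one mapping each PID to a content model label and one mapping each PID to its child PIDs; the model labels are placeholders, not real Islandora content model PIDs.

```python
# Hypothetical labels for object types whose children should also be bagged.
PARENT_MODELS = {"book", "newspaper", "newspaper_issue"}


def objects_to_preserve(pid, models, children):
    """Return pid followed by every descendant reached through parent-type objects.

    models:   dict mapping pid -> model label (assumed lookup, not a real API)
    children: dict mapping pid -> list of child pids (assumed lookup)
    """
    result = []
    seen = set()
    stack = [pid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against an object being listed under two parents
        seen.add(current)
        result.append(current)
        if models.get(current) in PARENT_MODELS:
            # Push in reverse so children are visited in their listed order.
            stack.extend(reversed(children.get(current, [])))
    return result


if __name__ == "__main__":
    models = {
        "np": "newspaper",
        "issue1": "newspaper_issue",
        "issue2": "newspaper_issue",
        "p1": "page",
        "p2": "page",
    }
    children = {"np": ["issue1", "issue2"], "issue1": ["p1", "p2"]}
    for obj in objects_to_preserve("np", models, children):
        print(obj)
```

The sketch includes the starting object in the result; whether a parent such as a Newspaper should get its own Bag at all is a separate decision.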
Or perhaps the better approach would be to address https://jira.duraspace.org/browse/ISLANDORA-1854 instead?
Thanks, that's the one I was looking for - it wasn't tagged with the Islandora Bagit component. Is now. Let's use the plugin since it will be useful to all Bagit users.
Ah, but then the question is who's going to build it? 🤔 How big a project would that plugin be?
As component manager of the Islandora Bagit module, I can take on the development.
@mjordan Any updates on the status of these improvements to Islandora Bagit?
No. I'll spend time this weekend on this rather than the CRUD issue.
Thanks!
Addressed with #36
Question for @mjordan: How do Book bags work? I assume that if a Book is bagged, then all of its child pages are bagged with it, correct?
What about newspapers?
Problem with newspapers: it's entirely likely that a Newspaper object will be updated with new issues at some point in the future.
Should Newspapers themselves be preserved, or should they be handled like Collections, and preserve only the issues? Or is there some in-between approach to take, since we don't want to lose the parent Newspaper object's metadata and relationships?
I don't imagine we would want to force people to re-preserve the entire newspaper each time a new issue is added.