Closed ruebot closed 8 years ago
Comment by mjordan Tuesday Feb 03, 2015 at 16:20 GMT
Of course I'd vote for BagIt, for the reasons @ruebot mentions. But, I'd be cautious about requiring it since not all sites will have or want to convert their stuff to Bags. Then again, if we're going to require a manifest, requiring BagIt is not all that different.
Comment by daniel-dgi Tuesday Feb 03, 2015 at 18:04 GMT
I'm not terribly familiar with BagIt. It's not something that I've dealt with in my work for clients. But at first glance it seems pretty appropriate.
METS is another option, I guess. Or we could just use a simple JSON or YAML manifest, but something tells me an actual metadata standard would make people feel better about things.
Other than BagIt (which I'm assuming contains all the data in one package), we could probably get away with just dropping the manifest in the watch folder, so long as it details the location of files and the user running the camel process has access to those locations.
Comment by awoods Tuesday Feb 03, 2015 at 18:50 GMT
@daniel-dgi, "holey" bags are also an option if not all of the data is available in the package, with the optional fetch.txt file.
See: http://tools.ietf.org/html/draft-kunze-bagit-06#section-2.2.3
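For anyone unfamiliar with holey bags, here is a minimal Python sketch of reading a fetch.txt file. It assumes the layout described in the BagIt draft linked above: one entry per line, made up of a URL, a length in octets (or `-` when unknown) and the target path inside the bag, separated by whitespace. The names `FetchEntry` and `parse_fetch_txt` and the example URLs are made up for illustration and do not come from any existing Islandora or BagIt library.

```python
from typing import List, NamedTuple, Optional


class FetchEntry(NamedTuple):
    """One line of a fetch.txt file (hypothetical helper type)."""
    url: str
    length: Optional[int]  # None when the length is given as "-"
    path: str              # where the file belongs inside the bag


def parse_fetch_txt(text: str) -> List[FetchEntry]:
    """Parse fetch.txt content into a list of entries.

    Assumes each non-empty line is "URL LENGTH PATH"; the path is taken
    as the remainder of the line so it may contain spaces.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        url, length, path = line.split(None, 2)
        entries.append(
            FetchEntry(url, None if length == "-" else int(length), path)
        )
    return entries


if __name__ == "__main__":
    sample = (
        "http://example.org/files/OBJ.tif 1024 data/OBJ.tif\n"
        "http://example.org/files/MODS.xml - data/MODS.xml\n"
    )
    for entry in parse_fetch_txt(sample):
        print(entry)
```

An ingest process could walk these entries and download each file into place before validating the bag, which is what would let the bag itself stay small.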
Comment by ruebot Thursday Feb 19, 2015 at 18:48 GMT
Adding fcrepo and upgration tags since this could also inform the proposed upgration migration tool discussed on today's Fedora Tech call.
Comment by dmoses Saturday Oct 17, 2015 at 23:16 GMT
I think one of the most common patterns in the Drupal community for batch ingesting is using Feeds. It has a number of support modules for importing XML as well. @mjordan wrote a module a while back. BagIt would be a good choice too and may add predictability to the ingest process.
Comment by daniel-dgi Tuesday Oct 20, 2015 at 14:21 GMT
Thanks for being awesome, @dmoses. Feeds seem attractive from a Drupal front end point of view. Could maybe parse rdfxml? Would like to hear what @mjordan has to say about pros/cons of using feeds and nodes. His module means he's probably got the most experience in that realm of Drupal land.
Not the first time bags have come up, either. I'm interested in seeing if we can zip them and use them to replace our hand-rolled format for zip importer. Are bags of bags possible, as well? It would be amazing if we could mimic what we're doing in 1.x batch but with a well defined standard.
Comment by ruebot Tuesday Oct 20, 2015 at 15:52 GMT
Serialized bags are totally a thing. Are you thinking of the book and newspaper batch ingest w/r/t the bags in bags idea?
Comment by mjordan Tuesday Oct 20, 2015 at 16:18 GMT
@daniel-dgi Bags are agnostic to the content in their 'data' directory and that content's organization, so as @ruebot says, it's legal to have a Bag of Bags. The child Bags would just be serialized into .zip or .tgz files.
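To make the Bag of Bags idea concrete, here is a rough Python sketch that looks inside a zip-serialized parent Bag and reports which payload files are themselves zip-serialized Bags. It assumes each serialized Bag unpacks to a single top-level directory holding `bagit.txt`, and it only handles .zip, not .tgz. The function name `list_child_bags` is hypothetical, not part of any existing tool.

```python
import io
import zipfile
from typing import List


def _looks_like_bag(names: List[str]) -> bool:
    """True if bagit.txt sits at the top level or one directory down."""
    return any(
        n.split("/")[-1] == "bagit.txt" and n.count("/") <= 1 for n in names
    )


def list_child_bags(parent_zip: bytes) -> List[str]:
    """Return payload entries of a zipped Bag that are zipped Bags themselves."""
    children = []
    with zipfile.ZipFile(io.BytesIO(parent_zip)) as parent:
        for name in parent.namelist():
            # Only consider .zip files under the parent's data/ directory.
            if "/data/" not in "/" + name or not name.endswith(".zip"):
                continue
            with zipfile.ZipFile(io.BytesIO(parent.read(name))) as child:
                if _looks_like_bag(child.namelist()):
                    children.append(name)
    return children
```

For a newspaper, the parent Bag could hold one child Bag per issue, and an importer could recurse into each child the same way.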
To answer your question about nodes in Islandora Feeds, I took that approach because 1) it was easy/I am lazy and 2) it uncouples the steps of importing data and committing that data to the Fedora repo as objects. For example, you can perform various types of QA on the nodes before using Views Batch Operations to create the Islandora objects, add other datastreams, etc.
I wrote that module about two years ago. In fact, I started it at OR2013, with @dmoses, @ruebot and some of the usual suspects sitting right beside me in the back few rows of seats. Now that we have a clear path for Islandora 7.x-2.x, it makes even more sense to create nodes (for obvious reasons) than it did then.
A back of the envelope diagram for using an existing tool like Feeds to manage the import and Bags to wrap file assets might look something like: Feeds creates Drupal nodes that contain F4 object properties (maybe using a Feeds RDF parser?), with pointers to Bags on the Drupal filesystem. Each Bag contains the file assets for an Islandora object. The organization of the content within each Bag would likely be specific to each content model (basic image, newspaper issue, book, etc.). It is legal to also include a (non-Bag) manifest that represents the content model in some way (e.g., OAI-ORE, METS), so we might want to explore that option as well.
Using both Feeds and Bags like this is probably overkill, and preparing the Bags would put an additional burden on content handlers. But, there are a lot of other benefits to Bags that may justify that burden, like built-in checksum generation and packaging. Using holey Bags as @awoods points out would add even more flexibility.
Comment by daniel-dgi Tuesday Oct 20, 2015 at 16:24 GMT
Maybe we're really talking about two things here? Just using feeds to import nodes, and then zipped bags as a zip importer replacement? Heck, we could even just accept zip files on our services endpoints and use that to consume entire objects as opposed to the multipart/form-data shenanigans I've got going on right now.
Would be nice to use bags in that way since it's a drupal agnostic fashion to move things around. Within Drupal, feeds definitely seems like a great way to go. Maybe we should make a ticket for someone to dabble around?
This is getting interesting :)
Comment by manez Tuesday Oct 20, 2015 at 16:28 GMT
My (probably not typical) use case would be vastly improved by a bulk export/ingest interface - some way to pull down a small bunch of objects and their metadata, then upload them back up to another Islandora site. Sounds like that's something in the Bags wheelhouse?
That said, +1 for Feeds being a nice GUI/Drupal-y way to import
Comment by mjordan Tuesday Oct 20, 2015 at 16:36 GMT
My (recyclable envelope) diagram used both Feeds and Bags because AFAIK Feeds doesn't deal with file assets in any standardized way and I was assuming that the nodes created by Feeds would have some binary files hanging off them. But, the two could be completely separate. Will jump back into the discussion later, must attend all the meetings now :disappointed:
Comment by daniel-dgi Tuesday Oct 20, 2015 at 19:10 GMT
@mjordan ah, i see. wasn't thinking about feeds not being able to handle files.
Comment by dmoses Tuesday Oct 20, 2015 at 19:33 GMT
I've got the 7.x.2 vm downloaded ... you can do files with feeds. I will investigate and try a proof of concept. Potentially?? it could be another migration tool by parsing the FOXML ... which includes paths to the binaries. Not sure. Will report back.
Since we've discussed BagIt bags here a fair bit, it might be worth making sure the planned Fedora Import/Export sprint is on their radar.
Closing old use cases until after MVP doc is released.
Issue by daniel-dgi Tuesday Feb 03, 2015 at 15:31 GMT Originally opened as https://github.com/islandora-interest-groups/Islandora-Fedora4-Interest-Group/issues/13
Reformatting this to use the Use Case template.
Remarks: