OregonDigital / oregondigital_2

The active development on Oregon Digital 2 is in the https://github.com/OregonDigital/OD2 repo.

Proposal: write our own bagit code locally #291

Open jechols opened 8 years ago

jechols commented 8 years ago

We use the hybag gem and the bagit library to handle bags. Neither gem has seen updates in a while, and both (in my opinion) overcomplicate what could be a very simple local process.

The bag spec is absurdly simple. I propose that we do not use any libraries and just handle bags within our app's code. If we find that our code really makes sense to the community, maybe we push it somewhere, but looking at hybag I'm not seeing a lot of benefit to the extra work that's been done to try and make the thing generic.
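To give a sense of how little code this takes, here is a minimal sketch of writing a bag with only the Ruby standard library. The `write_bag` name, the choice of SHA-256, and the `0.97` version string are assumptions for illustration; this is not how hybag or bagit do it, and it skips optional tag files such as `bag-info.txt`.

```ruby
require "digest"
require "fileutils"
require "pathname"

# Hypothetical helper (not part of hybag or bagit): copies everything under
# source_dir into bag_dir/data and writes the bag declaration plus a
# SHA-256 payload manifest. Returns the number of payload files.
def write_bag(source_dir, bag_dir)
  data_dir = File.join(bag_dir, "data")
  FileUtils.mkdir_p(data_dir)
  FileUtils.cp_r(File.join(source_dir, "."), data_dir)

  # Bag declaration. The version string is an assumption; use whichever
  # version of the spec the receiving side expects.
  File.write(File.join(bag_dir, "bagit.txt"),
             "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

  # One manifest line per payload file: "<checksum>  <path relative to bag root>".
  root = Pathname.new(bag_dir)
  payload = Dir.glob(File.join(data_dir, "**", "*")).select { |f| File.file?(f) }.sort
  lines = payload.map do |path|
    relative = Pathname.new(path).relative_path_from(root).to_s
    "#{Digest::SHA256.file(path).hexdigest}  #{relative}"
  end
  File.write(File.join(bag_dir, "manifest-sha256.txt"), lines.join("\n") + "\n")
  lines.size
end
```

Calling `write_bag("/path/to/batch", "/path/to/bag")` on a directory of 5000 assets would produce one bag with one 5000-line manifest.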

One issue I see with the current approach is the "one bag per asset" mindset. By only putting one asset in each bag, we're adding a fair amount of overhead. A bulk ingest of 5000 assets means reading 5000 separate bag manifests, rather than just one.

Additionally, verification of bags becomes meaningless. The point of a bag is to let the receiver detect files that were missed or corrupted in transit. When each asset is its own bag, we can validate that an individual asset got to its destination properly, but there is no automated mechanism to ensure the whole batch of assets made it. If we're expecting 5500 assets and only get 5400 of them, nothing in the application will know.
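With one bag per batch, a single pass over the manifest catches both corrupted and missing assets. A rough sketch, assuming a `manifest-sha256.txt` at the bag root and a hypothetical `verify_bag` helper (not an existing hybag or bagit method):

```ruby
require "digest"

# Hypothetical helper: compares a bag's payload against manifest-sha256.txt.
# Returns a hash listing payload paths that are missing from disk, that fail
# their checksum, or that are on disk but not listed in the manifest.
def verify_bag(bag_dir)
  expected = {}
  File.foreach(File.join(bag_dir, "manifest-sha256.txt")) do |line|
    checksum, path = line.chomp.split(/\s+/, 2)
    expected[path] = checksum if path # skips blank lines
  end

  missing = expected.keys.reject { |path| File.file?(File.join(bag_dir, path)) }
  corrupted = (expected.keys - missing).reject do |path|
    Digest::SHA256.file(File.join(bag_dir, path)).hexdigest == expected[path]
  end

  # Paths are relative to the bag root, matching the manifest entries.
  on_disk = Dir.glob("data/**/*", base: bag_dir)
               .select { |path| File.file?(File.join(bag_dir, path)) }
  { missing: missing, corrupted: corrupted, unlisted: on_disk - expected.keys }
end
```

In the 5500-versus-5400 case, `verify_bag(bag_dir)[:missing]` would list the 100 absent paths, so the ingest could refuse to start.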

We already have so many places with too much room for human error. Bags are trivial to handle in our own codebase. It's not worth the extra error potential to use the current "one bag per asset" process. It's not worth the extra debugging and maintenance to use the bagit and hybag libraries.

As a possibly separate issue, in a perfect world, I'd like to see the bag ingest record workflow metadata so that when an asset comes from a bag, we can identify the bag from which it came and see whether the bag as a whole was ingested properly. Items ingested from bags shouldn't be visible or even reviewable unless the whole bag ingest succeeded. If a bag ingest fails, all items from that bag should be purged from Fedora.
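The all-or-nothing behavior could look roughly like the sketch below. Everything here is hypothetical: `FakeRepository` is an in-memory stand-in for Fedora, and `ingest_bag`, `source_bag`, and `visible` are illustrative names, not existing application code.

```ruby
# Hypothetical in-memory stand-in for the repository. A real version would
# create and purge objects in Fedora instead of a Hash.
class FakeRepository
  attr_reader :objects

  def initialize
    @objects = {}
  end

  def create(id, attrs)
    @objects[id] = attrs
  end

  def publish(id)
    @objects[id][:visible] = true
  end

  def purge(id)
    @objects.delete(id)
  end
end

# Ingests every item from one bag. Items are tagged with the bag they came
# from and stay hidden until the whole bag has been ingested. If any item
# fails, everything created from this bag is purged and false is returned.
def ingest_bag(repo, bag_id, items)
  created = []
  items.each do |id, content|
    raise ArgumentError, "empty payload for #{id}" if content.nil? || content.empty?
    repo.create(id, { content: content, source_bag: bag_id, visible: false })
    created << id
  end
  created.each { |id| repo.publish(id) }
  true
rescue StandardError
  created.each { |id| repo.purge(id) }
  false
end
```

The `source_bag` field is what would let a reviewer trace an asset back to its bag; the rescue branch is the "purge everything" rule.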