MetaArchive / bagit-split

Tools for handling the splitting (and unsplitting) of BagIt archives. Made with Python.

Bag Splitting post Bag-it 5.x #5

Open ntallman opened 7 years ago

ntallman commented 7 years ago

LOC (Library of Congress) is removing the CLI tools from bagit-java in version 5, which essentially breaks this script. Since bagit-python, the Ruby bagit library (https://github.com/tipr/bagit), and Bagger all lack bag-splitting capability, the only way for institutions to split bags is to write Java code, which is not feasible for every dev shop. Any thoughts on updating this script to make direct use of the Java library?

nkrabben commented 7 years ago

I think this script is a pure Python implementation of bag splitting. It uses bagit-python, which is itself pure Python, so it won't be affected by changes to bagit-java.

ntallman commented 7 years ago

But the documentation says the first command you have to run uses the bagit-java CLI to actually split the bag, and the second command, this Python script, only verifies the result and adds tags/documentation?

Splitting a Bag

$ bag splitbagbysize <BAG> --maxbagsize 30
$ python bag-split.py split <BAG>

The first command above uses the official BagIt command-line utility (bag) to split the original bag, in this example using 30 GB as the per-bag limit. You can also pass fractional --maxbagsize values; for example, .001 indicates 1 MB.
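The split step above essentially groups the original bag's payload files into batches whose total size stays under the --maxbagsize limit. Below is a minimal pure-Python sketch of that grouping. It assumes decimal units (1 GB = 10^9 bytes, so .001 is 1 MB) and uses a first-fit-decreasing heuristic; this is not necessarily the algorithm bagit-java's splitbagbysize uses, and the function names are hypothetical, not part of bagit-python or bag-split.py.

```python
from typing import Dict, List


def max_bag_size_to_bytes(max_bag_size_gb: float) -> int:
    """Convert a --maxbagsize value in GB to bytes.

    Assumption: decimal units (1 GB = 10**9 bytes), so 0.001 -> 1 MB.
    """
    return round(max_bag_size_gb * 10**9)


def split_by_size(file_sizes: Dict[str, int], limit: int) -> List[List[str]]:
    """Group payload paths into batches whose total size is <= limit.

    Hypothetical helper using first-fit decreasing: place each file
    (largest first) into the first batch that still has room, opening a
    new batch when none does. A single file larger than the limit cannot
    be split, so it ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    totals: List[int] = []
    # Largest files first; ties broken by path so the output is deterministic.
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, used in enumerate(totals):
            if used + size <= limit:
                batches[i].append(path)
                totals[i] += size
                break
        else:
            # No existing batch has room: start a new one.
            batches.append([path])
            totals.append(size)
    return batches


if __name__ == "__main__":
    limit = max_bag_size_to_bytes(0.001)  # 1,000,000 bytes under the decimal assumption
    sizes = {
        "data/a.tif": 600_000,
        "data/b.tif": 500_000,
        "data/c.txt": 400_000,
        "data/d.txt": 300_000,
    }
    for n, batch in enumerate(split_by_size(sizes, limit), start=1):
        print(n, batch)
```

Running it prints two batches, ['data/a.tif', 'data/c.txt'] and ['data/b.tif', 'data/d.txt'], each at or under 1,000,000 bytes. Sorting largest-first tends to pack batches tightly; a file bigger than the limit still lands alone in an oversized batch, since individual files are not split.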

The second command uses this tool to verify the split bags against the original bag for integrity and completeness, as well as to create an additional "metadata" bag among the split bags; the /data directory of the metadata bag will contain the original bag's manifests and bag-info.txt file.
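The integrity and completeness check described above can be approximated by comparing manifests: every path in the original bag's manifest should appear in exactly one split bag with the same checksum, and no split bag should contain anything else. A minimal sketch, assuming the split bags keep the original data/ relative paths and that manifests use the BagIt "<checksum> <path>" line format; parse_manifest and verify_split are hypothetical helpers, not bag-split.py's actual code.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse manifest lines of the form '<checksum> <path>' into {path: checksum}."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace; the rest of the line is the path.
        checksum, path = line.split(None, 1)
        entries[path] = checksum.lower()
    return entries


def verify_split(original: Dict[str, str], splits: Iterable[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the split bags look complete and intact.

    Assumption: each split bag lists payload paths exactly as the original did.
    """
    problems: List[str] = []
    seen: Counter = Counter()
    for split in splits:
        for path, checksum in split.items():
            seen[path] += 1
            if path not in original:
                problems.append(f"unexpected file: {path}")
            elif original[path] != checksum:
                problems.append(f"checksum mismatch: {path}")
    # Every original file must appear in exactly one split bag.
    for path in original:
        if seen[path] == 0:
            problems.append(f"missing file: {path}")
        elif seen[path] > 1:
            problems.append(f"duplicated file: {path}")
    return problems


if __name__ == "__main__":
    original = parse_manifest("aaa111  data/a.tif\nbbb222  data/b.tif\n")
    splits = [parse_manifest("aaa111  data/a.tif\n"), parse_manifest("bbb222  data/b.tif\n")]
    print(verify_split(original, splits))  # []
```

Since the metadata bag's data directory holds the original bag's manifests, those files are a natural source for the `original` argument.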

nkrabben commented 7 years ago

Oh, looks like I read the code wrong. In that case, it seems like an update is needed if this script is still in use.

Would you be interested in helping to add bag splitting to bagit-python? I'd like to bring the various bag libraries closer to parity on features that they support.

ntallman commented 7 years ago

I would absolutely love it if bag splitting were built into bagit-python! In fact, I've already commented on a GitHub issue asking for just that. I'm using bagit-python for other parts of the bagging, so it would be great not to have to pull in another script.

nkrabben commented 7 years ago

Here's the code that bagit-java used to use: https://github.com/LibraryOfCongress/bagit-java/blob/0a7e63c7e804127ad628246f1e768bde6a692a7f/src/main/java/gov/loc/repository/bagit/transformer/impl/SplitBySize.java

I'll take a crack at reimplementing that this weekend.

ntallman commented 7 years ago

+1! Thank you! Digital preservation practitioners of the world thank you too!

Educopia commented 7 years ago

Thanks for taking a swing at updating this, Nick!

ntallman commented 7 years ago

Any update? bagit-python still doesn't support bag splitting.