Open ntallman opened 7 years ago
I think this script is a pure python implementation of bag splitting. It uses bagit-python which is itself pure python, so it won't be affected by changes to bagit-java.
But the documentation says the first command you have to run is the bagit-java CLI, which actually splits the bag, with the second Python script handling verification and the added tags/documentation?
$ bag splitbagbysize <BAG> --maxbagsize 30
$ python bag-split.py split <BAG>
The first command above uses the official BagIt command-line utility (bag) to split the original
The second command uses this tool to verify the split bags against the original bag for integrity and completeness, as well as to create an additional "metadata" bag among the split bags; the /data directory of the metadata bag will contain the original bag's manifests and bag-info.txt file.
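For readers unfamiliar with what "verify the split bags against the original bag" involves, here is a minimal sketch of the idea, not the script's actual code. It assumes the manifests have already been read into strings in the usual BagIt `checksum  path` line format; the function names `parse_manifest` and `verify_split` are hypothetical and chosen only for illustration.

```python
def parse_manifest(text):
    """Parse BagIt manifest text into {payload_path: checksum}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<checksum><whitespace><path>"; the path may contain spaces.
        checksum, path = line.split(None, 1)
        path = path.strip()
        if path.startswith("*"):
            path = path[1:]  # some tools prefix the path with "*"
        entries[path] = checksum.lower()
    return entries


def verify_split(original, splits):
    """Compare the original manifest against the manifests of the split bags.

    original: {path: checksum} for the original bag
    splits:   list of {path: checksum}, one per split bag
    Returns (missing, mismatched, extra) as sorted lists of payload paths.
    """
    combined = {}
    for manifest in splits:
        combined.update(manifest)
    missing = sorted(p for p in original if p not in combined)
    mismatched = sorted(
        p for p in original if p in combined and combined[p] != original[p]
    )
    extra = sorted(p for p in combined if p not in original)
    return missing, mismatched, extra
```

If all three returned lists are empty, every payload file from the original bag appears in some split bag with the same checksum, and nothing extra was introduced.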
Oh, looks like I read the code wrong. In that case, it seems like an update is needed if this script is still in use.
Would you be interested in helping to add bag splitting to bagit-python? I'd like to bring the various bag libraries closer to parity on features that they support.
I would absolutely love it if bag splitting were built into bagit-python! In fact, I've already commented on a GitHub Issue for just that. I'm using bagit-python for other parts of the bagging, so it would be great to not have to pull in another script.
Here's the code that bagit-java used to use: https://github.com/LibraryOfCongress/bagit-java/blob/0a7e63c7e804127ad628246f1e768bde6a692a7f/src/main/java/gov/loc/repository/bagit/transformer/impl/SplitBySize.java
I'll take a crack at reimplementing that this weekend.
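For anyone following along, here is a rough sketch of what size-based grouping could look like in pure Python. It is not a port of SplitBySize.java and not part of bagit-python; it uses a simple first-fit-decreasing strategy, and the name `group_by_size` and its parameters are hypothetical. Each resulting group would still need to be written out as its own bag.

```python
def group_by_size(file_sizes, max_bytes):
    """Group payload files so each group's total stays at or under max_bytes.

    file_sizes: {payload_path: size_in_bytes}
    A single file larger than max_bytes is placed in a group of its own.
    Returns a list of groups, each a list of payload paths.
    """
    groups = []  # each entry is [total_bytes, [paths]]
    # Largest files first (first-fit decreasing); ties broken by path
    # so the output is deterministic.
    ordered = sorted(file_sizes.items(), key=lambda item: (-item[1], item[0]))
    for path, size in ordered:
        for group in groups:
            if group[0] + size <= max_bytes:
                group[0] += size
                group[1].append(path)
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([size, [path]])
    return [paths for _, paths in groups]
```

The sizes could be collected with `os.path.getsize` while walking the bag's data directory. First-fit decreasing does not guarantee the minimum number of bags, but it is simple and usually close.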
+1! Thank you! Digital preservation practitioners of the world thank you too!
Thanks for taking a swing at updating, Nick!
Any update? Bagit-python still doesn't have bag splitting.
LOC is removing the CLI tools from bagit-java in version 5, essentially breaking this script. Since bagit-python, bagit ruby (https://github.com/tipr/bagit), and bagger all lack bag splitting capability, the only way for institutions to split bags is to write Java code, which is not feasible for all dev shops. Any thought to updating this script to make direct use of the Java library?