MetaArchive / bagit-split

Tools for handling the splitting (and unsplitting) of BagIt archives. Made with Python.

NOTE: As of 2023 this tool is no longer maintained or supported.

BagIt Splitting and Unsplitting Tools

Overview

This tool provides functionality for splitting BagIt bags into a collection of smaller bags, and for "unsplitting" these bags back into a single bag mostly identical to the original.

Usage

To see the full command-line help text, do:

$ python bag-split.py --help

Prerequisites

None, but this script operates on split bags created by the Library of Congress' bagit-java tool.

Splitting a Bag

$ bag splitbagbysize <BAG> --maxbagsize 30
$ python bag-split.py split <BAG>

The first command above uses the official BagIt command-line utility (bag) to split the original bag, in this example using 30 GB as the per-bag limit. Fractional --maxbagsize values are also accepted; for example, .001 indicates 1 MB.

The second command uses this tool to verify the split bags against the original bag for integrity and completeness, as well as to create an additional "metadata" bag among the split bags; the /data directory of the metadata bag will contain the original bag's manifests and bag-info.txt file.
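The completeness check described above can be pictured with a short Python sketch. This is an illustration only, not the code in bag-split.py: the function names `parse_manifest` and `compare_split_to_original` are hypothetical, and the sketch assumes each manifest line has the form "checksum, whitespace, payload path".

```python
def parse_manifest(text):
    """Parse BagIt manifest text into a {payload path: checksum} dict.

    Assumes each non-empty line is "<checksum> <path>".
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, path = line.split(None, 1)
        entries[path.strip()] = checksum.lower()
    return entries


def compare_split_to_original(original, split_manifests):
    """Compare the original manifest with the manifests of the split bags.

    Returns three sorted lists of payload paths:
    missing (in the original but in no split bag), mismatched (checksum
    differs), and extra (in a split bag but not in the original).
    """
    combined = {}
    for manifest in split_manifests:
        combined.update(manifest)
    missing = sorted(set(original) - set(combined))
    extra = sorted(set(combined) - set(original))
    mismatched = sorted(
        path for path in original
        if path in combined and original[path] != combined[path]
    )
    return missing, mismatched, extra
```

If all three returned lists are empty, the split bags together account for every payload file in the original manifest with matching checksums.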

Unsplitting a Bag

$ python bag-split.py unsplit <DIRECTORY CONTAINING BAGS>

This command merges the bags found inside the input directory into a single reconstructed bag, written to a new directory (referred to here as MERGED_BAG). The tool checks that the reconstructed bag is a faithful reconstruction of the original. The new directory is named after the input directory: if the input directory's name ends in "_split" (the suffix the LoC tool adds when it splits a bag), the new directory has the same name with "_split" removed; otherwise, "_merged" is appended to the input directory's name.
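The naming rule for the output directory can be written as a small Python function. This is a sketch of the rule as stated above, not an excerpt from bag-split.py; the name `merged_dir_name` is hypothetical, and stripping a trailing slash from the input is an assumption made for convenience.

```python
def merged_dir_name(input_dir):
    """Return the output directory name for an unsplit operation.

    A trailing "_split" is removed; otherwise "_merged" is appended.
    """
    name = input_dir.rstrip("/")  # assumption: ignore a trailing slash
    suffix = "_split"
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name + "_merged"
```

For example, an input directory named mybag_split would yield mybag, while an input directory named bags would yield bags_merged.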

License

See LICENSE.txt