bentley-historical-library / bhl_born_digital_utils

Scripts used for removable media transfers at the Bentley Historical Library

create split_transfer utility #44

Closed djpillen closed 4 years ago

djpillen commented 4 years ago

This PR creates a split_transfer utility that takes a larger transfer and splits it into one or more smaller transfers. From the README:

This utility splits a transfer into multiple smaller chunks. This is especially useful for transfers with more than 10,000 files, which can cause problems in Archivematica. The utility counts the number of files in each item within a transfer and then moves items into chunk directories, each holding fewer files than the split size, which defaults to 5,000 files and can be changed by passing a --split_size parameter. The utility keeps individual items whole (i.e., it will not move some subdirectories from one item into one chunk and other subdirectories into another chunk). As a result, it is best suited to transfers of many small-to-medium-sized items, rather than a transfer of one large item (e.g., a single hard drive with tens of thousands of files). The chunk directories are created inside the transfer directory and are named with the transfer name followed by a three-digit sequence number. For example, a transfer 172345 with 12,234 files would be split into about 3 chunks: 172345-001, 172345-002, and 172345-003.
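The grouping behavior described above can be sketched roughly as follows. This is an illustrative approximation, not the code from this PR: the function names (`count_files`, `chunk_name`, `plan_chunks`, `split_transfer`) are hypothetical, and the sketch assumes that items are the top-level subdirectories of the transfer, that they are processed in sorted order, and that a chunk is closed as soon as adding the next item would push it past the split size. An item that is itself larger than the split size ends up alone in its own chunk, since items are never divided.

```python
import os
import shutil


def count_files(path):
    """Count the files beneath path, recursively."""
    return sum(len(files) for _, _, files in os.walk(path))


def chunk_name(transfer_name, sequence):
    """Build a chunk directory name such as 172345-001."""
    return f"{transfer_name}-{sequence:03d}"


def plan_chunks(item_counts, split_size=5000):
    """Group item names into chunks without dividing any item.

    item_counts maps an item name to its file count. Assumption: a chunk
    may hold up to split_size files; the real utility may use a slightly
    different boundary.
    """
    chunks = []
    current, current_total = [], 0
    for item in sorted(item_counts):
        count = item_counts[item]
        # Start a new chunk when this item would overflow the current one.
        if current and current_total + count > split_size:
            chunks.append(current)
            current, current_total = [], 0
        current.append(item)
        current_total += count
    if current:
        chunks.append(current)
    return chunks


def split_transfer(transfer_dir, split_size=5000):
    """Move each item of a transfer into a numbered chunk directory."""
    transfer_name = os.path.basename(os.path.normpath(transfer_dir))
    # Assumption: only top-level subdirectories count as items.
    items = [
        entry for entry in os.listdir(transfer_dir)
        if os.path.isdir(os.path.join(transfer_dir, entry))
    ]
    counts = {
        item: count_files(os.path.join(transfer_dir, item)) for item in items
    }
    for sequence, chunk in enumerate(plan_chunks(counts, split_size), start=1):
        chunk_dir = os.path.join(transfer_dir, chunk_name(transfer_name, sequence))
        os.makedirs(chunk_dir)
        for item in chunk:
            shutil.move(
                os.path.join(transfer_dir, item),
                os.path.join(chunk_dir, item),
            )
```

Because items stay whole, the number of chunks depends on how the per-item counts happen to group, which is why the README example says "about 3 chunks" rather than an exact figure.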