bentley-historical-library / bhl_born_digital_utils

Scripts used for removable media transfers at the Bentley Historical Library

Refactor bhl_born_digital_utils #32

Closed djpillen closed 5 years ago

djpillen commented 5 years ago

This pull request refactors bhl_born_digital_utils. The most notable change is that all scripts have been moved from their individual directories into a single bhl_born_digital_utils directory and are now controlled by a bhl_born_digital_utils.py script that takes a positional input directory, an argument for a command (taking the place of the individual scripts) and command-specific optional arguments.
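As a rough illustration of the interface described above, here is a minimal argparse sketch with a positional input directory, a command argument, and one command-specific option. The flag names (`--command`, `--destination`) and the handler bodies are hypothetical placeholders, not the actual options in bhl_born_digital_utils.py.

```python
import argparse


def build_parser():
    # All option names and choices below are hypothetical placeholders.
    parser = argparse.ArgumentParser(prog="bhl_born_digital_utils.py")
    parser.add_argument("input_dir", help="directory containing the transfer to process")
    parser.add_argument(
        "--command",
        required=True,
        choices=["copy_accession", "runbe", "check_output_structure"],
        help="which utility to run (takes the place of the individual scripts)",
    )
    # Example of a command-specific optional argument.
    parser.add_argument("--destination", help="only used by copy_accession in this sketch")
    return parser


def dispatch(args):
    # Map each command name to a handler; real handlers would do the actual work.
    handlers = {
        "copy_accession": lambda a: "copy {} -> {}".format(a.input_dir, a.destination),
        "runbe": lambda a: "scan {}".format(a.input_dir),
        "check_output_structure": lambda a: "check {}".format(a.input_dir),
    }
    return handlers[args.command](args)
```

With this sketch, `dispatch(build_parser().parse_args(["path/to/accession", "--command", "runbe"]))` routes to the `runbe` handler.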

The functionality of most individual scripts remains the same. Notable exceptions include optical_types.py, which was deleted; runbe.py, which now scans barcode directories individually and then does some minimal parsing of the results to delete empty .txt files; and copy_accession.py (formerly robocopy.py), which uses robocopy on Windows systems and rsync on other systems. The scripts have also generally been updated to be a bit more agnostic about whether removable media came from the RipStation, and some preliminary work has been done to break out common functionality within a script into functions. Future work could further separate those out into functions shared between scripts, and could also implement more intelligent ways of identifying whether external tools like ffmpeg or bulk_extractor are available on the system path, in a standardized directory, passed via the command line, or configured in some sort of config file.
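For illustration only, the platform switch and the tool lookup mentioned above could look something like the following. The function names and the exact robocopy/rsync flags are assumptions for this sketch and are not taken from copy_accession.py.

```python
import platform
import shutil


def build_copy_command(source, destination, system=None):
    # Hypothetical helper: pick the copy tool based on the operating system.
    system = system or platform.system()
    if system == "Windows":
        # /E copies subdirectories, including empty ones.
        return ["robocopy", source, destination, "/E"]
    # -a is rsync's archive mode; the trailing slash copies the contents of source.
    return ["rsync", "-a", source.rstrip("/") + "/", destination]


def find_tool(name, explicit_path=None):
    # Prefer a path passed in by the caller (e.g. from the command line),
    # otherwise fall back to searching the system path.
    if explicit_path:
        return explicit_path
    return shutil.which(name)  # returns None when the tool is not found
```

One thing to keep in mind if the resulting command is run with `subprocess`: robocopy uses nonzero exit codes below 8 to report successful copies, so treating any nonzero return code as a failure would be wrong on Windows.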

Since the scripts have been moved, their git history has mostly been lost, and this pull request makes it appear as though a lot of files were deleted and a lot of files were created, rather than edited. While this is technically true, it does make it difficult to get a sense of what changed in each individual script. To see the changes made to most individual scripts, refer to this commit on my branch: https://github.com/djpillen/bhl_born_digital_utils/commit/56cec3e8f58a7839e9e733dfed418a0ace62c3bc

This pull request resolves https://github.com/bentley-historical-library/bhl_born_digital_utils/issues/11. It also resolves https://github.com/bentley-historical-library/bhl_born_digital_utils/issues/20 and https://github.com/bentley-historical-library/bhl_born_digital_utils/issues/19.

A notable exception at the moment to this refactor work is make_dips.py. Additional work is also needed to resolve ongoing issues with that utility.

cc @hyeeyoungkim

djpillen commented 5 years ago

FYI, I've just added a couple of additional commits to this to add two new utilities: one that moves separated barcodes (identified by the separations column in the bhl_inventory.csv) into a given destination directory and another that moves audio and video formatted media (identified by the media_type column in the bhl_inventory.csv) into their own directory. Both are intended to assist with some common post-RMW transfer/pre-Archivematica transfer review steps.
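A minimal sketch of the first of those utilities might look like this. It assumes that bhl_inventory.csv has a `barcode` column naming each item's directory and that any non-empty value in the `separations` column marks an item as separated; both are assumptions made for illustration, and the function name is hypothetical.

```python
import csv
import os
import shutil


def move_separated_barcodes(inventory_csv, accession_dir, destination_dir):
    moved = []
    with open(inventory_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Assumption: a non-empty "separations" value marks a separated item.
            if not (row.get("separations") or "").strip():
                continue
            # Assumption: the "barcode" column matches the directory name.
            source = os.path.join(accession_dir, row["barcode"])
            if os.path.isdir(source):
                os.makedirs(destination_dir, exist_ok=True)
                shutil.move(source, os.path.join(destination_dir, row["barcode"]))
                moved.append(row["barcode"])
    return moved
```

The audio/video utility could follow the same pattern, filtering on the `media_type` column instead.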

MaryseLT commented 5 years ago

https://github.com/djpillen/bhl_born_digital_utils/blob/a4f891447c0b2720a872b0aba97ab15d8f51db87/bhl_born_digital_utils/check_output_structure.py#L102

This line of code needs to be changed to this or else this script will not be able to find the file path:

cmd = "{0} -loglevel error -i \"{1}\" -f null - 2>\"{2}\"".format(ffmpeg_path, media_path, log_path)
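As a possible alternative to quoting paths inside a shell string, the same ffmpeg check can be expressed as an argument list passed to `subprocess.run`, with stderr written to the log file directly. This is only a sketch: the function names are hypothetical, and it is not what check_output_structure.py does.

```python
import subprocess


def build_ffmpeg_check_args(ffmpeg_path, media_path):
    # Each argument is its own list item, so paths with spaces need no quoting.
    return [ffmpeg_path, "-loglevel", "error", "-i", media_path, "-f", "null", "-"]


def check_media(ffmpeg_path, media_path, log_path):
    # Redirect ffmpeg's stderr into the log file, as "2>" does in the shell version.
    with open(log_path, "wb") as log:
        result = subprocess.run(build_ffmpeg_check_args(ffmpeg_path, media_path), stderr=log)
    return result.returncode
```

Because no shell is involved, a media path such as `My Videos/tape 1.mp4` is passed to ffmpeg as a single argument without any escaping.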

djpillen commented 5 years ago

Just submitted a commit to resolve this: https://github.com/bentley-historical-library/bhl_born_digital_utils/pull/32/commits/db10e0c6453237d4f603e7f420736cc51f0b91e1