NYPL / ami-tools

MIT License

AMI Tools

Python 3 scripts and classes for managing bags of NYPL AMI files

Installation and Updates

Production use

Run the following from your terminal:

pip3 install --user 'ami-tools @ git+https://github.com/NYPL/ami-tools'

If you are installing into a virtual environment, omit the --user flag.

Development use

If you want a version that you can edit and run separately from the production install, clone this repo and then install it to a virtual environment.

cd /path/to/repo
pyenv virtualenv amitools-dev
pyenv local amitools-dev
pip install -e .

Whenever you run any portion of the ami-tools package while in /path/to/repo it will use this working version of the package.

Tools

Installing the package makes the following tools available from the command line. All scripts include a help dialog.

script_name.py -h

Data collection

survey_drive.py

Generate the following from a mounted drive (or any folder): report of all files, report of all bags, directory with a copy of all presumed metadata (JSON and Excel)

Usage: Survey a drive mounted on a Mac

survey_drive.py -d /Volumes/drive-name -o path/to/dir/for/reports
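For the bag report, one plausible way to recognize a bag is by the bagit.txt declaration file that the BagIt specification requires at the top of every bag. The following is a minimal sketch under that assumption and is not taken from survey_drive.py; find_bags is a hypothetical helper name.

```python
import os


def find_bags(root):
    """Return sorted paths of directories under root that contain bagit.txt."""
    bags = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "bagit.txt" in filenames:
            bags.append(dirpath)
            # Treat the bag as a unit: do not look for bags nested inside it.
            dirnames[:] = []
    return sorted(bags)
```

Calling find_bags("/Volumes/drive-name") would list every bag directory found on the mounted drive.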

Validation Tools

validate_ami_bags.py

Check bag Oxums, bag completeness, bag hashes, directory structure, filenames, and metadata (only implemented for Excel)

Usage: Check a directory of bags. The default check does not look at metadata or checksums.

validate_ami_bags.py -d path/to/dir/of/bags

Usage: Check a single bag, including metadata and checksums

validate_ami_bags.py -b path/to/bag --metadata --slow

validate_ami_excel.py

Check whether an Excel file adheres to the expectations of media ingest

Usage: Check a single Excel file

validate_ami_excel.py -e path/to/excel/file

validate_bags.py

Check bag Oxums, bag completeness, and bag hashes (if requested). Default is similar to bagit.py --validate --fast except includes completeness check. Less strict than validate_ami_bags.py.

Usage: Check a single bag

validate_bags.py -b path/to/bag --slow
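To illustrate what a hash check involves, the sketch below recomputes each payload checksum and compares it with the value recorded in the bag's manifest. It is an assumption-laden illustration rather than the code in validate_bags.py: failed_checksums is a hypothetical helper, and the default algorithm of sha256 is chosen only for the example (a bag may use md5 or another algorithm).

```python
import hashlib
import os


def failed_checksums(bag_dir, algorithm="sha256"):
    """Return manifest paths whose file is missing or whose checksum differs."""
    manifest = os.path.join(bag_dir, f"manifest-{algorithm}.txt")
    failures = []
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            # Each manifest line is "<checksum> <path relative to the bag>".
            parts = line.rstrip("\r\n").split(None, 1)
            if len(parts) != 2:
                continue
            expected, rel_path = parts
            full_path = os.path.join(bag_dir, rel_path)
            if not os.path.isfile(full_path):
                failures.append(rel_path)
                continue
            digest = hashlib.new(algorithm)
            with open(full_path, "rb") as payload:
                # Read in 1 MiB chunks so large media files fit in memory.
                for chunk in iter(lambda: payload.read(1 << 20), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                failures.append(rel_path)
    return failures
```

Because every byte of every file is read, this kind of check is far slower than comparing Oxums, which is presumably why it sits behind the --slow flag.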

Bag Management Tools

fix_baginfo.py

Update Oxum in bag-info.txt to match actual Oxum

Usage: Check and repair a directory of bag Oxums

fix_baginfo.py -d path/to/dir/of/bags
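For context, the Payload-Oxum defined by the BagIt specification has the form "<total bytes>.<file count>" for everything under data/. The sketch below shows how the actual and declared values could be compared. It is an illustration only, not the implementation in fix_baginfo.py, and both function names are hypothetical.

```python
import os


def payload_oxum(bag_dir):
    """Compute "<total bytes>.<file count>" for the files under bag_dir/data."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(os.path.join(bag_dir, "data")):
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
            file_count += 1
    return f"{total_bytes}.{file_count}"


def declared_oxum(bag_dir):
    """Return the Payload-Oxum recorded in bag-info.txt, or None if absent."""
    with open(os.path.join(bag_dir, "bag-info.txt"), encoding="utf-8") as f:
        for line in f:
            if line.startswith("Payload-Oxum:"):
                return line.split(":", 1)[1].strip()
    return None
```

A bag needs repair when payload_oxum(bag_dir) != declared_oxum(bag_dir).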

repair_bags.py (in development)

Manage files that are in the bag payload but not in the manifest, either by adding them to the manifest or by deleting them.

Usage: Add all untracked files to manifest and Oxum

repair_bags.py -b path/to/bag --addfiles

Usage: Delete all untracked files from the data/ directory. By default, only the following system files will be deleted: Thumbs.db files, DS_Store files, AppleDouble files, and Icon files

repair_bags.py -b path/to/bag --deletefiles
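As a rough illustration of what "untracked" means here, the sketch below lists files under data/ that no manifest mentions and flags the ones that look like system files. It is not the logic in repair_bags.py: both helpers are hypothetical, and the filename patterns (".DS_Store", "Thumbs.db", a "._" prefix for AppleDouble, and "Icon" followed by a carriage return) are assumptions about how those file types are usually named.

```python
import os


def untracked_files(bag_dir):
    """Return payload paths ('/'-separated, relative to bag_dir) absent from all manifests."""
    tracked = set()
    for name in os.listdir(bag_dir):
        if name.startswith("manifest-") and name.endswith(".txt"):
            with open(os.path.join(bag_dir, name), encoding="utf-8") as f:
                for line in f:
                    parts = line.rstrip("\r\n").split(None, 1)
                    if len(parts) == 2:
                        tracked.add(parts[1])
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(os.path.join(bag_dir, "data")):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), bag_dir)
            on_disk.add(rel.replace(os.sep, "/"))
    return sorted(on_disk - tracked)


def is_system_file(rel_path):
    """Guess whether a path names an OS-generated file (assumed patterns)."""
    name = rel_path.rsplit("/", 1)[-1]
    return name in ("Thumbs.db", ".DS_Store", "Icon\r") or name.startswith("._")
```

With these helpers, the default deletion set would be [p for p in untracked_files(bag_dir) if is_system_file(p)], while the remaining untracked files are candidates for --addfiles.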

convert_excelbag_to_jsonbag.py (in development)

Convert a bag that meets the rules for AMI Excel bags to a bag that meets the rules for AMI JSON bags

Usage: Convert a single bag from Excel to JSON

convert_excelbag_to_jsonbag.py -b path/to/bag

Classes

The package also contains classes for implementing further tools

ami_bag.ami_bag

Extension of the bagit-python Bag class with methods for validation and classification of bags according to NYPL AMI rules

ami_md.ami_excel

Classes and methods for Excel workbooks and sheets storing metadata about preservation masters, edit masters, and no transfers

Usage: Validate the contents of the preservation master sheet against the ingest business rules

import ami_md.ami_excel

excel_file = ami_md.ami_excel("path/to/excel.xlsx")
excel_file.pres_sheet.validate_worksheet()

ami_md.ami_json

Methods for loading and manipulating AMI JSON data.

Usage: Convert a valid AMI JSON file to a flat key-value dict

import ami_md.ami_json

json_file = ami_md.ami_json(filepath = "path/to/file.json")
new_dict = json_file.convert_nestedDictToDotKey(json_file)
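To make "flat key-value dict" concrete, the standalone sketch below flattens a nested dict into dot-separated keys. It is an independent illustration of the idea, not the code behind convert_nestedDictToDotKey; the function name and the sample keys are hypothetical.

```python
def nested_to_dot_keys(nested, prefix=""):
    """Flatten nested dicts so each leaf is keyed by its dot-joined path."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the path built so far.
            flat.update(nested_to_dot_keys(value, dotted))
        else:
            flat[dotted] = value
    return flat


sample = {"bibliographic": {"title": "Interview", "date": "1990"},
          "technical": {"audio": {"channels": 2}}}
print(nested_to_dot_keys(sample))
# {'bibliographic.title': 'Interview', 'bibliographic.date': '1990',
#  'technical.audio.channels': 2}
```

A flat layout like this maps directly onto spreadsheet column headers, which is convenient when comparing JSON metadata with Excel metadata.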

ami_md.ami_md_constants

Constants used for validating, normalizing, and enhancing metadata, mostly through methods in ami_excel.

Shell scripts

The package also includes a handful of shell scripts for utility functions. To install them, make each script executable with chmod +x and create an appropriate alias for it.

bin/collect_metadata.sh

Copy xlsx and json files from bags to another directory for manipulation and analysis
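The same idea can be sketched in Python for readers who prefer it to shell. This is a hypothetical equivalent, not a translation of collect_metadata.sh: it assumes metadata files are identified purely by extension, that the destination lies outside the directory of bags, and that files sharing a name may overwrite one another.

```python
import os
import shutil


def collect_metadata(bags_dir, dest_dir, extensions=(".xlsx", ".json")):
    """Copy files with the given extensions from bags_dir into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(bags_dir):
        for filename in filenames:
            if filename.lower().endswith(extensions):
                target = os.path.join(dest_dir, filename)
                # copy2 preserves modification times along with content.
                shutil.copy2(os.path.join(dirpath, filename), target)
                copied.append(target)
    return sorted(copied)
```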

validate_bags.sh

Validate a directory of bags after network transfer (superseded by validate_ami_bags.py)