UAL-RE / ReBACH

Python-based tool to enable data preservation to a cloud-hosted storage solution
MIT License

ReBACH

Purpose:

This Python tool enumerates all the items and collections in ReDATA, downloads the associated files and metadata into a predefined structure on preservation staging storage, gathers curation information from curation staging storage, and adds the gathered information to the predefined structure.

Description:

ReBACH is run from the command line as outlined in the 'How to run' section of this README. During a run, ReBACH uses the Figshare API to enumerate all published items and their versions on UArizona's Figshare and loads their metadata into system memory. For each item that has a matching folder in the curation staging storage, ReBACH downloads the item's files into the preservation staging storage and then validates the files and folder structure of that curation staging folder. If validation passes, ReBACH copies the files from the curation staging storage into the corresponding preservation staging storage folder; otherwise, it deletes the preservation staging storage folder and its contents. Information and errors are written to a log file, and some of them are also displayed in the terminal.
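The per-item decision described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ReBACH's actual code: the folder naming scheme (`<item_id>_v<version>`), the function name `process_item`, and the `download` and `validate` callables are all hypothetical stand-ins for ReBACH's real Figshare download and curation-validation logic.

```python
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical callable types standing in for ReBACH's real download/validation logic.
Downloader = Callable[[int, int, Path], None]
Validator = Callable[[Path], bool]


def process_item(item_id: int, version: int, curation_root: Path,
                 preservation_root: Path, download: Downloader,
                 validate: Validator) -> str:
    """Sketch of the per-item flow; returns 'skipped', 'preserved' or 'deleted'."""
    # Hypothetical naming scheme: one folder per item version.
    folder = f"{item_id}_v{version}"
    curation_dir = curation_root / folder

    # Only items with a matching curation staging folder are downloaded.
    if not curation_dir.is_dir():
        return "skipped"

    preservation_dir = preservation_root / folder
    preservation_dir.mkdir(parents=True, exist_ok=True)
    download(item_id, version, preservation_dir)

    if validate(curation_dir):
        # Validation passed: copy the curation files into the preservation folder.
        shutil.copytree(curation_dir, preservation_dir, dirs_exist_ok=True)
        return "preserved"

    # Validation failed: delete the preservation folder and everything in it.
    shutil.rmtree(preservation_dir)
    return "deleted"
```

Passing `download` and `validate` in as callables keeps the copy-or-delete decision separate from the Figshare and validation details, so the flow can be exercised against temporary directories. `shutil.copytree(..., dirs_exist_ok=True)` requires Python 3.8 or later.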

Dependencies:

Requirements:

How to run:

Command line

These parameters are only available on the command line:

--xfg: The path to the configuration file to use.
--ids: A comma-separated list of article IDs to process, e.g., 12345,12356.
--continue-on-error: If an error occurs while processing a given item, skip that item and continue to the next one.
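The three options above could be declared with the standard-library `argparse` module roughly as follows. This is a hypothetical sketch, not ReBACH's actual parser: the helper names `parse_ids`, `build_parser` and `run_items` are illustrative, and it assumes that omitting `--ids` means "process every item" and that, without `--continue-on-error`, the first item error stops the run.

```python
import argparse
from typing import Callable, Iterable, List


def parse_ids(value: str) -> List[int]:
    """Convert a comma-separated string such as '12345,12356' into [12345, 12356]."""
    try:
        # Skip empty pieces so a trailing comma is tolerated.
        return [int(part) for part in value.split(",") if part.strip()]
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid article ID list: {value!r}") from None


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="ReBACH command-line options (sketch)")
    parser.add_argument("--xfg", help="Path to the configuration file to use.")
    # Assumption: when --ids is omitted, the value is None and all items are processed.
    parser.add_argument("--ids", type=parse_ids, default=None,
                        help="Comma-separated list of article IDs, e.g. 12345,12356.")
    parser.add_argument("--continue-on-error", action="store_true",
                        help="Skip an item that fails during processing and move on.")
    return parser


def run_items(item_ids: Iterable[int], process: Callable[[int], None],
              continue_on_error: bool) -> List[int]:
    """Process each item; return the IDs that failed when errors are tolerated."""
    failed: List[int] = []
    for item_id in item_ids:
        try:
            process(item_id)
        except Exception:
            if not continue_on_error:
                raise  # Assumed default: stop at the first failing item.
            failed.append(item_id)
    return failed
```

With `action="store_true"`, `--continue-on-error` is a plain switch that defaults to `False`, and `argparse` exposes it as `args.continue_on_error`.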

Execution notes