Supports a preservation workflow through the creation of Archival Information Packages (AIPs) from a LEAF environment.
:warning: These command-line scripts are only compatible with CWRC Repository v2.0, which is based on the Linked Academic Editing Framework (LEAF). They replace the cwrc_preservation workflow used with CWRC Repository v1.0, which reached end-of-life on January 5, 2025.
The LEAF Bagger preservation toolkit contains scripts supporting a preservation workflow for a LEAF environment. The primary objective is to manage the flow of content from the CWRC repository into an OpenStack Swift repository for preservation (the destination may be extended for partner projects). Also, the repository provides an application to audit the contents of the source and preserved objects. The scripts are deployable within an OCI container to align with the deployment of CWRC Repository v2.0 and other LEAF installations.
Poll Drupal for items to preserve
leaf-bagger.py
Create AIP (archival information packages) from the Drupal item (metadata and media)
leaf-bagger.py
drupal/getjwtonlogin
used by https://github.com/cwrc/leaf-isle-bagger and https://github.com/cwrc/islandora_bagger (forked from mjordan) to acquire a JWT in the login response; the JWT is then used to pull media/content from the Drupal site for preservation
Audit preserved content
leaf-bagger-audit.py
CWRC (Canadian Writing Research Collaboratory) is an “online infrastructure for literary research in and about Canada designed to meet the challenges and embrace the opportunities of the digital turn.” In other words, CWRC is a living repository (i.e., contains content that may be updated, for example, as facts and assertions about a person, place, or event are discovered or changed). CWRC, as of August 2023, contains ~410,000 objects accounting for 1TB+ in storage. The content comes from multiple research projects created by researchers located in many areas of Canada.
CWRC infrastructure is hosted by the Digital Research Alliance of Canada on the Arbutus Cloud at the University of Victoria, with data backups in London, ON, and preservation with the UofA Library (via OLRC).
The Dockerfile in this repository describes the requirements and setup. An overview of requirements includes:
views/preservation_show_node_timestamps?page={page}&changed={date_filter}
views/preservation_show_media_timestamps?page={page}&changed={date_filter}
The preservation workflow follows a polling model: a script runs at regular intervals and asks the repository for a list of new/changed items within a given window of time. For each new/changed item, an archival information package (AIP) is generated and added to the preservation endpoint.
leaf-bagger.py
Result: a report of items added to the preservation endpoint.
ToDo: what if a small percentage of items in a preservation run fail?
force_single_node option added
cd ${BAGGER_APP_DIR} && ./bin/console app:islandora_bagger:create_bag -vvv --settings=var/sample_per_bag_config.yaml --node=${drupal_node_id}
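The polling model described above can be sketched as follows. This is a minimal illustration, not the code in leaf-bagger.py: the function names, the `nid` key in the usage example, the timestamp format passed as `changed`, and the "stop on the first empty page" rule are all assumptions. Only the view path and the `page`/`changed` query parameters come from the requirements overview. The HTTP call is passed in as a function so the paging logic can be exercised without a Drupal site.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List

# View path taken from the requirements overview above.
NODE_VIEW = "views/preservation_show_node_timestamps"


def window_start(window_seconds: int, now: datetime) -> str:
    """Return the start of the polling window (the timestamp format is an assumption)."""
    return (now - timedelta(seconds=window_seconds)).strftime("%Y-%m-%dT%H:%M:%S")


def page_url(server: str, page: int, date_filter: str) -> str:
    """Build the URL for one page of the node-timestamp view."""
    return f"{server.rstrip('/')}/{NODE_VIEW}?page={page}&changed={date_filter}"


def poll_changed(
    server: str,
    date_filter: str,
    fetch_page: Callable[[str], List[dict]],
) -> Iterator[dict]:
    """Yield items page by page until a page comes back empty (assumed stop condition)."""
    page = 0
    while True:
        items = fetch_page(page_url(server, page, date_filter))
        if not items:
            break
        yield from items
        page += 1
```

In a real run, `fetch_page` would wrap an authenticated HTTP GET that returns the decoded JSON list for the given URL; each yielded item would then be handed to the AIP creation step.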
The preservation workflow includes an audit step checking, in a basic way, that what exists in the repository is preserved in the preservation endpoint. This step assumes the AIP creation script will fail in unexpected ways and tries to act as a second set of eyes to identify and report failures.
leaf-bagger-audit.py
Result: a CSV audit report listing every node in the repository and its preservation status.
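As a rough sketch of what such an audit compares, the example below checks each repository node against the preserved copies and writes the result as CSV. It is not the logic of leaf-bagger-audit.py: the status labels (`preserved`, `stale`, `missing`), the column names, and the use of ISO 8601 timestamp strings in a single consistent format (so that string comparison matches chronological order) are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["node_id", "changed", "preserved", "status"]  # hypothetical column names


def audit(repository: Dict[str, str], preserved: Dict[str, str]) -> List[dict]:
    """Compare repository change timestamps with preservation timestamps.

    Both arguments map a node ID to a timestamp string in the same ISO 8601 format.
    """
    rows = []
    for node_id in sorted(repository):
        changed = repository[node_id]
        archived = preserved.get(node_id)
        if archived is None:
            status = "missing"    # no AIP found for this node
        elif archived < changed:
            status = "stale"      # node changed after its AIP was created
        else:
            status = "preserved"  # AIP is at least as new as the node
        rows.append(
            {"node_id": node_id, "changed": changed,
             "preserved": archived or "", "status": status}
        )
    return rows


def to_csv(rows: List[dict]) -> str:
    """Serialize audit rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the comparison separate from the CSV output makes it straightforward to filter for `missing` or `stale` rows before deciding which nodes to re-run.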
The Nox Python automation tooling helps automate testing and linting. The tool is integrated as part of the CI/CD. The noxfile.py contains the configuration.
Install as per your OS, e.g., apt install nox
To run tests and linting:
nox
To run only tests:
nox -s test
To run only linting:
nox -s lint
To run tests outside nox:
python3 -m venv ./rootfs/leaf-isle-bagger/venv
./rootfs/leaf-isle-bagger/venv/bin/python3 -m pip install -r rootfs/leaf-isle-bagger/requirements.txt
./rootfs/leaf-isle-bagger/venv/bin/python3 -m pip install -r rootfs/leaf-isle-bagger/requirements_test.txt
./rootfs/leaf-isle-bagger/venv/bin/pytest rootfs/leaf-isle-bagger/tests/
The scripts are meant to be executed within a containerized environment. For alternate approaches, review the Dockerfile layers for installation and docker-compose.yml for environment variable settings.
docker compose exec bagger with-contenv bash
su -s /bin/bash nginx -c "./venv/bin/python3 leaf-bagger.py --server ${BAGGER_DRUPAL_URL} --output /tmp/z.csv --force_single_node 1 --container cwrc-test"
See rootfs/etc/s6-overlay/scripts/bagger-setup.sh for an example of both leaf-bagger.py and leaf-bagger-audit.py.
The OCI container image is based on the isle-bagger image and isle-buildkit. Access to a Drupal site is also required, with the container running either within a leaf-base-i8 container deployment or independently (i.e., in a separate deployment).
Local settings: see isle-bagger and parent containers for more settings (e.g., islandora-bagger tool settings). .env.sample contains a sample .env for docker-compose.
Environment Variable | Default | Description |
---|---|---|
LEAF_BAGGER_APP_DIR | /var/www/leaf-isle-bagger/ | The installed directory of islandora-bagger |
LEAF_BAGGER_OUTPUT_DIR | /data/log/ | Report location describing AIP creation & upload |
LEAF_BAGGER_AUDIT_OUTPUT_DIR | /data/log/ | Audit report location |
LEAF_BAGGER_CROND_DATE_WINDOW | 86400 | Time window; return new/changed items in the last "x" seconds |
OS_CONTAINER | | OpenStack container name |
OS_AUTH_URL | | OpenStack auth URL |
OS_PROJECT_ID | | OpenStack project ID |
OS_PROJECT_NAME | | OpenStack project name |
OS_USER_DOMAIN_NAME | | OpenStack user domain name |
OS_PROJECT_DOMAIN_ID | | OpenStack project domain ID |
OS_USERNAME | | OpenStack user name |
OS_REGION_NAME | | OpenStack region name |
OS_INTERFACE | | OpenStack interface |
OS_IDENTITY_API_VERSION | | OpenStack identity API version |
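For illustration, a script could resolve these variables with the documented defaults along the following lines. This is a sketch under assumptions rather than how the shipped scripts read their configuration: `load_settings` and `DEFAULTS` are hypothetical names, and only the variable names and default values are taken from the table.

```python
import os
from typing import Mapping

# Defaults copied from the table above; OS_* variables have no documented default.
DEFAULTS = {
    "LEAF_BAGGER_APP_DIR": "/var/www/leaf-isle-bagger/",
    "LEAF_BAGGER_OUTPUT_DIR": "/data/log/",
    "LEAF_BAGGER_AUDIT_OUTPUT_DIR": "/data/log/",
    "LEAF_BAGGER_CROND_DATE_WINDOW": "86400",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve settings from the environment, falling back to the documented defaults."""
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # The date window is a number of seconds, so convert it for arithmetic use.
    settings["LEAF_BAGGER_CROND_DATE_WINDOW"] = int(
        settings["LEAF_BAGGER_CROND_DATE_WINDOW"]
    )
    # No default is documented for the OpenStack container; None means "not set".
    settings["OS_CONTAINER"] = env.get("OS_CONTAINER")
    return settings
```

Passing the environment in as a mapping keeps the function easy to test with a plain dictionary instead of the real process environment.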
Two docker-compose secrets are also used:
Secret | Description |
---|---|
BAGGER_DRUPAL_DEFAULT_ACCOUNT_PASSWORD | Drupal site password |
OS_PASSWORD | OpenStack user password |
Docker-compose env vars
Environment Variable | Default | Description |
---|---|---|
LOCAL_AIP_DIR | | Set when using a bind mount; otherwise a Docker volume |
LEAF_BAGGER_REPOSITORY | ghcr.io/cwrc | OCI image repository for the LEAF Bagger image; defaults to a local build |
LEAF_BAGGER_TAG | latest | OCI image tag name for the LEAF Bagger image; defaults to latest for a local build |
BAGGER_REPOSITORY | ghcr.io/cwrc | OCI image repository for the Isle Bagger image; defaults to a local build |
BAGGER_TAG | latest | OCI image tag name for the Isle Bagger image; defaults to latest for a local build |
How to update the base?
For an isle-buildkit update (gist, follow dependencies in the Dockerfile layers)
Note: to test leaf-isle-bagger and isle-bagger locally:
docker compose build
isle-bagger first
docker compose build
leaf-isle-bagger using the same TAG for isle-bagger as in the previous step
docker compose up -d
to run the container
docker compose exec bagger with-contenv bash
to shell into the container
See the following as an alternative to specifying an OCI image registry and tag in the Dockerfile: https://docs.docker.com/build/bake/reference/. As an example, see isle-buildkit docker-bake.hcl.