A set of tools to
Python command-line script to export content via the Islandora REST API on the Islandora Legacy site; because it only needs the REST API, it can run from anywhere
XML Metadata file: contains the source metadata in XML form for use by a transformation step to build the CSV format required by Islandora Workbench (i.e., FOXML Fedora metadata, plus MODS XML, plus datastream export locations)
Set of XQuery scripts to convert the metadata into a CSV format (used with BaseX.org) -- we're using BaseX.org to bulk explore/surface how metadata has been recorded in MODS, as CWRC doesn't use the default XML Form Builder forms
Python command-line script to verify, to a specified degree, that the Islandora Legacy MODS metadata exists in the new Islandora site via a comparison with the JSON-LD serialization
Git clone the repository
Install Python 3+ (haven't tried with other versions)
Add Python libraries -- local user (not systemwide)
python3 setup.py install --user
Add Python libraries -- systemwide
sudo python3 setup.py install
Define a list of PIDs to export from the Islandora Legacy site and add them to a file, one per line
Execute the extraction script
--id_list: list of PIDs to export
--server: the Islandora Legacy server (Drupal 7)
--export_dir: directory to store the export package
python3 islandora7_export.py --id_list test_data/z --server ${ISLANDORA_LEGACY:-https://example.com} --export_dir /tmp/z/
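As a rough illustration of the "one PID per line" input described above, the following sketch reads such a file while skipping blank lines. The helper name `read_pid_list` and the sample PIDs are hypothetical and are not taken from `islandora7_export.py`.

```python
from pathlib import Path
import tempfile


def read_pid_list(path):
    """Return the stripped, non-empty lines of a PID list file (one PID per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a throwaway file and made-up PIDs.
with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / "pid_list.txt"
    pid_file.write_text("example:1\n\nexample:2\n", encoding="utf-8")
    pids = read_pid_list(pid_file)

print(pids)
```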
<metadata pid="" label="" owner="" created="" modified="">
<media_exports>
<media filepath="" ds_id=""/>
<!-- a list of Islandora Legacy extracted datastreams with their path and datastream id -->
</media_exports>
<resource_metadata>
<!-- a list of extracted metadata datastreams including MODS, RELS-EXT, etc. -->
</resource_metadata>
</metadata>
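To make the layout above more concrete, here is a minimal sketch that lists the exported datastreams from one such file using Python's standard library. It assumes each `media` entry is an empty element carrying `filepath` and `ds_id` attributes; the sample values and the `list_media` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the structure shown above.
SAMPLE = """<metadata pid="example:1" label="Sample" owner="admin" created="" modified="">
  <media_exports>
    <media filepath="/tmp/z/example_1_OBJ" ds_id="OBJ"/>
    <media filepath="/tmp/z/example_1_MODS" ds_id="MODS"/>
  </media_exports>
  <resource_metadata/>
</metadata>"""


def list_media(xml_text):
    """Return (pid, ds_id, filepath) tuples for every exported datastream."""
    root = ET.fromstring(xml_text)
    pid = root.get("pid")
    return [
        (pid, media.get("ds_id"), media.get("filepath"))
        for media in root.findall("./media_exports/media")
    ]


media = list_media(SAMPLE)
for row in media:
    print(row)
```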
This script compares the Islandora Legacy content with the content imported via Islandora Workbench into the new Islandora site, to verify/audit the export, transformation, and loading phases. The comparison is made between the Islandora Legacy MODS metadata and the Islandora JSON-LD output.
Reference for the metadata conversion: Islandora MIG and Islandora MIG (Metadata Interest Group) MODS-RDF Simplified Mapping
combined_metadata directory: contents produced by the Extraction step
Transformation directory: used to transform XML metadata into a CSV for use with Islandora Workbench
A list of available fields can be discovered via the --get_csv_template option within Islandora Workbench. The fields available depend on the Drupal config, created either via the Islandora defaults profile or added after the initial Drupal setup.
The sample transform attempts to use the parent_id if the collection object is in the exported set from the previous step; otherwise it defaults to the node_id specified in the XQuery transform
Care needs to be taken with collections, otherwise resources can be added without a collection
Collections need to appear before children/members in the workbench CSV (see creating collections and members together)
2021-10-22: added some logic that attempts to order items in the CSV by collection hierarchy; this only works if the items in the collection hierarchy are present and not already in Islandora. Note: the url_alias should trigger a warning if one tries to add a collection that pre-exists.
Each item should have either a parent_id (if the parent collection is referenced in the workbench CSV) or field_member_of (if the parent collection pre-exists in Drupal). Note: if not, then resources will float without a parent (see Creating collections and members together)
Set field_member_of on all collections without a parent_id
The parent_id of the member should reference the id of the parent
If items are added without a collection, the output_csv Islandora Workbench config provides a way to update the existing items afterwards (don't lose the file), assuming they have not been changed via the UI. See Islandora Workbench documentation for details.
todo: flesh out potential problem areas around the collection hierarchy and loading
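The "collections before members" ordering described above can be sketched as a small depth-first sort over the CSV rows. This is only an illustration under stated assumptions, not the logic used by the XQuery transform: rows are plain dictionaries with `id` and `parent_id` keys, and a `parent_id` that is empty or not present in the set is treated as a parent that already exists in Drupal. The function name `order_parents_first` is hypothetical.

```python
def order_parents_first(rows):
    """Order rows so that a row always comes after the row its parent_id points to."""
    by_id = {row["id"]: row for row in rows}
    ordered, seen = [], set()

    def visit(row, trail):
        if row["id"] in seen:
            return
        if row["id"] in trail:
            raise ValueError(f"cycle in parent_id chain at {row['id']}")
        # A missing or unknown parent_id means the parent is outside this CSV.
        parent = by_id.get(row.get("parent_id") or "")
        if parent is not None:
            visit(parent, trail | {row["id"]})
        seen.add(row["id"])
        ordered.append(row)

    for row in rows:
        visit(row, frozenset())
    return ordered


# Made-up rows: "c" is a member of "b", which is a member of "a".
rows = [
    {"id": "c", "parent_id": "b"},
    {"id": "d", "parent_id": ""},
    {"id": "b", "parent_id": "a"},
    {"id": "a", "parent_id": ""},
]
ordered_ids = [row["id"] for row in order_parents_first(rows)]
print(ordered_ids)
```

Rows whose parent is not in the set keep their relative position, so they would still need a field_member_of value to avoid floating without a parent.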
Due to archival records containing the | character, the Islandora Workbench subdelimiter is set to a custom value, as the Workbench default is |. This requires updating (the 2022 version is ^|.|^)
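The reason for the custom subdelimiter can be shown with a small round trip: values that contain a literal | survive joining and splitting when the separator is ^|.|^. The helper names below are hypothetical, and the value ^|.|^ is simply the "2022 version" mentioned above; whatever value is used must match the subdelimiter set in the Workbench config.

```python
SUBDELIMITER = "^|.|^"  # value quoted above; must match the Workbench config


def join_values(values, subdelimiter=SUBDELIMITER):
    """Join the values of a multi-valued field into one CSV cell."""
    for value in values:
        if subdelimiter in value:
            raise ValueError(f"value contains the subdelimiter: {value!r}")
    return subdelimiter.join(values)


def split_values(cell, subdelimiter=SUBDELIMITER):
    """Split a CSV cell back into its individual values."""
    return cell.split(subdelimiter) if cell else []


# Made-up archival values, one of which contains a literal pipe.
values = ["Fonds A | Series 1", "Fonds B"]
cell = join_values(values)
round_trip = split_values(cell)
print(cell)
print(round_trip)
```

With the default | as separator, the first value would have been split into two.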
Load via Islandora Workbench using the CSV created during the transformation section. See the Workbench documentation for details. A sample config is included in the test_data directory.
To check that the CSV to import is valid:
python3 workbench --config ../workbench_config/workbench_config_test_02.yaml --check
To run the load, remove the --check parameter from the above:
python3 workbench --config ../workbench_config/workbench_config_test_02.yaml
More information:
Attempts to compare the Islandora Legacy XML to the JSON-LD output of the Islandora (Drupal 8+) node, using the mappings defined by the Islandora MIG in the document: Islandora MIG (Metadata Interest Group) MODS-RDF Simplified Mapping
python3 islandora_audit.py --id_list test_data/z --islandora_legacy https://example.com/ --islandora https://example_9.com/ --comparison_config test_data/comparison_config.sample.json
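The kind of check the audit performs can be sketched for a single field. This is not the logic of `islandora_audit.py`; it is a minimal illustration that assumes the MODS title maps to `http://purl.org/dc/terms/title` and that the JSON-LD output has a top-level `@graph` whose entries hold lists of `{"@value": ...}` objects. The sample documents and function names are made up.

```python
import json
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
DCTERMS_TITLE = "http://purl.org/dc/terms/title"  # assumed mapping target

# Hypothetical sample data standing in for the two sites' responses.
SAMPLE_MODS = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Sample title</mods:title></mods:titleInfo>
</mods:mods>"""
SAMPLE_JSONLD = json.dumps(
    {"@graph": [{"@id": "https://example_9.com/node/1",
                 DCTERMS_TITLE: [{"@value": "Sample title"}]}]}
)


def mods_titles(mods_xml):
    """Collect the text of every top-level titleInfo/title in a MODS record."""
    root = ET.fromstring(mods_xml)
    found = root.findall("./mods:titleInfo/mods:title", MODS_NS)
    return {(element.text or "").strip() for element in found}


def jsonld_values(jsonld_text, predicate):
    """Collect the string values of one predicate across the JSON-LD graph."""
    graph = json.loads(jsonld_text).get("@graph", [])
    return {
        str(value["@value"]).strip()
        for node in graph
        for value in node.get(predicate, [])
        if isinstance(value, dict) and "@value" in value
    }


def missing_titles(mods_xml, jsonld_text):
    """Return the MODS titles that do not appear in the JSON-LD output."""
    return mods_titles(mods_xml) - jsonld_values(jsonld_text, DCTERMS_TITLE)


missing = missing_titles(SAMPLE_MODS, SAMPLE_JSONLD)
print(missing or "all MODS titles found in JSON-LD")
```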
Purpose: to return a list of all the direct members of a specified collection. As of 2022-04-19, it doesn't traverse the descendant collections of the specified collection.
See the islandora_search.py script
python3 islandora7_search.py --input_file input_file_listing_collection_PIDs --server https://cwrc.ca --output_file output_file_to_store_results
To run tests:
python3 tests/export_unit_tests.py
pycodestyle --show-source --show-pep8 --ignore=E402,W504 --max-line-length=200 .
Media files fail to load via Islandora Workbench (or via the Drupal UI)
fedoraAdmin role
How to gather a set of PIDs from Islandora Legacy (Islandora 7)?
collection_PID=some_islandora_collection_pid
curl "http://localhost:8080/solr/select?rows=999999&start=0&fl=PID&q=RELS_EXT_isMemberOfCollection_uri_ms:%22info:fedora/${collection_PID}%22%20OR%20PID:%22${collection_PID}%22&wt=csv&sort=PID+asc"
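For scripting, the same Solr request can be assembled in Python so that the query is escaped automatically. This is a sketch that only builds the URL and parses a CSV response body; it performs no network call. The function names are hypothetical, and it assumes the CSV response starts with a `PID` header line followed by one PID per line.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_solr_url(collection_pid, solr_base="http://localhost:8080/solr/select"):
    """Build the Solr select URL for a collection and its direct members."""
    query = (
        f'RELS_EXT_isMemberOfCollection_uri_ms:"info:fedora/{collection_pid}"'
        f' OR PID:"{collection_pid}"'
    )
    params = {
        "rows": 999999,
        "start": 0,
        "fl": "PID",
        "q": query,
        "wt": "csv",
        "sort": "PID asc",
    }
    return f"{solr_base}?{urlencode(params)}"


def pids_from_csv(body):
    """Extract PIDs from a CSV response body, skipping an assumed 'PID' header."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    return lines[1:] if lines and lines[0] == "PID" else lines


url = build_solr_url("some_islandora_collection_pid")
parsed = parse_qs(urlsplit(url).query)
print(url)
print(pids_from_csv("PID\nexample:1\nexample:2\n"))
```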
linked agent:
<mods:typeOfResource>sound recording-nonmusical</mods:typeOfResource>: field_resource_type -- is this a special Islandora vocabulary?
field_resource_type and field_model: mapping via the Islandora Legacy cModel type to Islandora taxonomy terms -- is this correct?
<mods:issuance>monographic</mods:issuance>
recordInfo: need mapping
langcode?
List all models
for $i in /metadata/@models
group by $i
return $i
Lookup by PID
let $pid = "digitalpage:881e0ee6-52ed-4f05-9e8d-c5e51c5c1a31"
for $i in /metadata[@pid=$pid]
return $i