emory-libraries / dlp-curate

Digital curation and preservation workbench for the Emory Preservation Repository.

Bulkrax testing v.1: basic filesets and required metadata #1839

Closed: eporter23 closed this issue 1 year ago

eporter23 commented 2 years ago

This ticket relates to the larger goals of #1837. For this first round of Bulkrax testing, we want a minimal, functional installation and configuration of Bulkrax in Curate. See the Bulkrax documentation for CSV ingest and configuration options.

Install Bulkrax and configure a CSV Importer that populates simple filesets (one file per fileset) and Curate's required metadata fields for works and filesets:

- title [used for work titles and fileset labels]
- holding_repository [works only]
- date_created [works only]
- content_type [works only]
- emory_rights_statements [works only]
- rights_statement [works only]
- data_classifications [works only]
- visibility [works only? assume that work visibility dictates fileset visibility]
- deduplication_key [not used by Bulkrax but required for the Zizia importer]
- source_collection_id [works only]
- pcdm_use [if needed, applicable to filesets only]

Determine how to populate Bulkrax's required source_identifier field

Ingest a representative basic image collection, e.g., the Oxford Asian Artifacts Collection

bwatson78 commented 2 years ago

PR made: https://github.com/emory-libraries/dlp-curate/pull/1851

NOTE: Besides the headers listed above, the model, file, parent, and file_types headers are also required for successful imports with Bulkrax.
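The required fields from the ticket, together with the extra Bulkrax headers above, can be checked before an import is attempted. The Python sketch below is a hypothetical pre-flight validator, not part of Curate or Bulkrax. The model values `CurateGenericWork` and `FileSet`, the helper names, and the decision to treat `deduplication_key` and `pcdm_use` as optional are assumptions. Because the thread does not say which row types need `file`, `parent`, and `file_types`, those are checked only as column headers. Bulkrax still runs its own validation at import time, and this sketch does not reproduce it.

```python
import csv
import io

# Required Curate fields from the ticket. "[works only]" fields are checked on
# Work rows; title doubles as the FileSet label. deduplication_key and pcdm_use
# are treated as optional here (an assumption based on this thread).
WORK_FIELDS = [
    "title", "holding_repository", "date_created", "content_type",
    "emory_rights_statements", "rights_statement", "data_classifications",
    "visibility", "source_collection_id",
]
FILESET_FIELDS = ["title"]

# Additional headers reported as necessary for Bulkrax imports.
BULKRAX_HEADERS = ["model", "file", "parent", "file_types"]

# Hypothetical model values; use whatever your Bulkrax mapping expects.
WORK_MODEL = "CurateGenericWork"
FILESET_MODEL = "FileSet"


def missing_headers(header):
    """Return expected column names that are absent from the header row."""
    present = set(header)
    return [h for h in WORK_FIELDS + BULKRAX_HEADERS if h not in present]


def missing_row_fields(row):
    """Return required fields left blank in one row, based on its model."""
    model = (row.get("model") or "").strip()
    if model == WORK_MODEL:
        required = WORK_FIELDS
    elif model == FILESET_MODEL:
        required = FILESET_FIELDS
    else:
        return ["model"]  # unknown or blank model value
    # DictReader yields None for short rows, so treat None like "".
    return [f for f in required if not (row.get(f) or "").strip()]


def validate_csv(text):
    """Map CSV row numbers (header = row 1) to missing headers or blank fields.

    Row numbers assume no quoted field contains an embedded newline.
    """
    reader = csv.DictReader(io.StringIO(text))
    problems = {}
    header_gaps = missing_headers(reader.fieldnames or [])
    if header_gaps:
        problems[1] = header_gaps
    for number, row in enumerate(reader, start=2):
        gaps = missing_row_fields(row)
        if gaps:
            problems[number] = gaps
    return problems
```

Passing the text of an import CSV to `validate_csv` returns an empty dict when every row passes. Otherwise it maps each problem row number to the fields that need attention, which is easier to act on than a failed import.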

A further note about deduplication_key: the practice of providing a value here can continue, but if the field is left blank, Bulkrax will create a dynamic id for it. Every individual Work or FileSet imported with Bulkrax needs its own unique deduplication key. The parent field can reference a Work's deduplication_key to link a FileSet to that Work. Also bear in mind that Bulkrax actively uses source_collection_id to tie Works to Collections.
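The deduplication_key and parent rules described above can be checked on parsed rows before import. This is a minimal Python sketch, assuming each row is a dict keyed by the CSV headers and that Works and FileSets are distinguished by the hypothetical model values `CurateGenericWork` and `FileSet`. The function name `check_links` is also hypothetical. It allows blank keys (Bulkrax generates an id for those), flags non-blank keys that repeat, and flags any FileSet whose parent does not match a Work's deduplication_key in the same file.

```python
from collections import Counter

WORK_MODEL = "CurateGenericWork"  # hypothetical model value
FILESET_MODEL = "FileSet"         # hypothetical model value


def check_links(rows):
    """Return readable problems with deduplication keys and parent links.

    Blank deduplication_key values are allowed, but non-blank ones must be
    unique, and each FileSet's parent must equal some Work's key.
    """
    problems = []
    keys = [(r.get("deduplication_key") or "").strip() for r in rows]

    # Every non-blank key must appear exactly once across Works and FileSets.
    counts = Counter(k for k in keys if k)
    for key, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate deduplication_key {key!r} ({n} rows)")

    # Keys that a FileSet's parent column is allowed to point at.
    work_keys = {k for r, k in zip(rows, keys) if k and r.get("model") == WORK_MODEL}

    for i, r in enumerate(rows, start=1):
        if r.get("model") != FILESET_MODEL:
            continue
        parent = (r.get("parent") or "").strip()
        if parent not in work_keys:
            problems.append(f"data row {i}: FileSet parent {parent!r} matches no Work")
    return problems


rows = [
    {"model": "CurateGenericWork", "deduplication_key": "oaac_001"},
    {"model": "FileSet", "deduplication_key": "oaac_001_f1", "parent": "oaac_001"},
    {"model": "FileSet", "deduplication_key": "oaac_001_f1", "parent": "oaac_999"},
]
for problem in check_links(rows):
    print(problem)
```

With the sample rows, the sketch reports the repeated `oaac_001_f1` key and the third row's parent, `oaac_999`, which matches no Work. It does not check `source_collection_id`, since whether a collection id is valid depends on what already exists in Curate.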