Closed by eporter23 1 year ago
PR made: https://github.com/emory-libraries/dlp-curate/pull/1851
NOTE: Besides the headers listed above, model, file, parent, and file_types are also necessary for successful imports with Bulkrax.
model: Collection, CurateGenericWork, or FileSet. Defaults to FileSet, but I don't recommend leaving this blank.
file: The import method that has been shown to work is a ZIP file. The CSV sits at the root of the archive, while the images and other items are contained inside a files folder. Unlike Zizia, the files to be attached to each FileSet should all be listed in this field on the same line, and multiples (according to the documentation) should be separated by a semicolon. See this for an example: https://app.zenhub.com/files/158455630/d0128792-c3c2-481c-8391-b0d4eb0b29d2/download The files should be listed only by name and extension.
parent: A reference to the container above this item. If importing a FileSet, this should be the containing CurateGenericWork. This should contain one of two strings: the container's deduplication_key or its id.
file_types: This field was needed to accommodate our filetype customization. It should contain one string with the filename and filetype joined by a colon. Multiples should be joined by a pipe. For example: "AmericanTail.jpg:preservation_master_file|BackToTheFuture.png:service_file". This field defaults to :preservation_master_file if a filetype isn't found correctly.
Also, another note about deduplication_key: The practice of providing a value here can continue, but if it is left blank, Bulkrax will create a dynamic id for the field. Every individual Work or FileSet imported needs its own unique deduplication key when importing with Bulkrax. The parent field can use deduplication_key to link a FileSet to its Work.
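As an illustration of the file and file_types conventions described above, here is a minimal Python sketch that builds both cell values and reads a file_types string back with the default applied. The helper names (build_file_field, build_file_types, parse_file_types) are hypothetical and are not part of Bulkrax or Curate; the parsing only mirrors the behaviour described in this note, not the importer's actual code.

```python
# Hypothetical helpers for preparing CSV cells; not Bulkrax or Curate code.
DEFAULT_FILE_TYPE = "preservation_master_file"


def build_file_field(filenames):
    """Join filenames (name plus extension only) with semicolons."""
    return ";".join(filenames)


def build_file_types(pairs):
    """Join (filename, filetype) pairs as 'name:type', separated by pipes."""
    return "|".join(f"{name}:{filetype}" for name, filetype in pairs)


def parse_file_types(value, filenames):
    """Map each filename to its filetype, using the default when no valid
    'name:type' entry exists for it (an assumption based on the note above)."""
    declared = {}
    for chunk in value.split("|"):
        name, sep, filetype = chunk.strip().partition(":")
        if sep and name and filetype:
            declared[name] = filetype
    return {name: declared.get(name, DEFAULT_FILE_TYPE) for name in filenames}


if __name__ == "__main__":
    pairs = [
        ("AmericanTail.jpg", "preservation_master_file"),
        ("BackToTheFuture.png", "service_file"),
    ]
    print(build_file_field([name for name, _ in pairs]))
    print(build_file_types(pairs))
```

Keeping the two cells generated from the same list of pairs avoids a filename appearing in file_types without a matching entry in file.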
As well, bear in mind that Bulkrax actively uses source_collection_id to tie Works to Collections.
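The ZIP layout described under the file field (CSV at the root, attachments inside a files folder) could be assembled with a short script. This is only a sketch using the Python standard library: the function name build_import_zip and the CSV name import.csv are assumptions made for illustration, and nothing here is confirmed to be required by Bulkrax beyond the layout stated above.

```python
import csv
import io
import zipfile
from pathlib import Path


def build_import_zip(zip_path, rows, fieldnames, source_files, csv_name="import.csv"):
    """Write a ZIP with the CSV at the root and every source file under files/.

    csv_name is a hypothetical default; the note above only says the CSV
    sits at the root of the archive.
    """
    # Render the CSV in memory so it can be written straight into the archive.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(csv_name, buffer.getvalue())
        for source in source_files:
            source = Path(source)
            # Only the bare filename is kept, matching the 'name plus
            # extension' convention used in the file column.
            archive.write(source, arcname=f"files/{source.name}")
    return zip_path
```

Because the archive stores each attachment as files/<name>, the file column can keep listing bare filenames regardless of where the originals live on disk.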
This ticket relates to the larger goals of #1837. For a first version of testing Bulkrax, we want to achieve a minimal functional installation and configuration of Bulkrax in Curate. See their documentation for CSV Ingest and Configuration options.
Install Bulkrax and configure a CSV Importer that populates simple filesets (1 file per fileset) and Curate required metadata fields for works/filesets:
title [used for work titles and fileset labels]
holding_repository [works only]
date_created [works only]
content_type [works only]
emory_rights_statements [works only]
rights_statement [works only]
data_classifications [works only]
visibility [works only? assume that work visibility dictates fileset visibility]
deduplication_key [not used by Bulkrax but required for Zizia importer]
source_collection_id [works only]
pcdm_use [if needed, applicable to filesets only]
Determine how to populate Bulkrax's required source_identifier field
Ingest a representative basic image collection e.g. Oxford Asian Artifacts Collection
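Before a test ingest, the required fields listed above could be checked with a small pre-flight script. The following is a rough sketch, not part of Bulkrax or Curate: the function name find_missing_values is hypothetical, it assumes the CSV carries a model column, and it treats visibility as works only, which the list above still marks as an open question.

```python
import csv
import io

# Fields the list above marks as "works only" (visibility is an assumption).
WORK_ONLY_FIELDS = [
    "holding_repository",
    "date_created",
    "content_type",
    "emory_rights_statements",
    "rights_statement",
    "data_classifications",
    "visibility",
    "source_collection_id",
]
WORK_MODEL = "CurateGenericWork"


def find_missing_values(csv_text):
    """Return (line_number, field) pairs for required cells that are blank.

    Line numbers assume one CSV record per physical line, with the header
    on line 1.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):
        required = ["title"]  # used for work titles and fileset labels
        if row.get("model") == WORK_MODEL:
            required += WORK_ONLY_FIELDS
        for field in required:
            if not (row.get(field) or "").strip():
                problems.append((line_number, field))
    return problems
```

An empty result only means the listed cells are filled in; it does not check that the values themselves are valid for Curate.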
source_identifier and parents are currently TBD until we figure out the best strategy to populate a shared Solr field across Works, Collections, and FileSets. id is generated at the time of ingest, so we can't predict it in advance. Works have deduplication_key, but this is not populated for Collections or FileSets currently.
[…] deduplication_key) and all works related to the parent collection (currently noted in source_collection_id). The value in source_collection_id will vary depending on the environment (local, arch, test, prod), so this may need to be revised in the sample CSV template.
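One way to handle the environment-dependent source_collection_id is to keep a single CSV template and fill in the collection id per environment before import. The sketch below assumes exactly that workflow; the function name set_source_collection_id and the ids in COLLECTION_IDS_BY_ENV are placeholders invented for illustration, not real Curate collection ids.

```python
import csv
import io

# Placeholder ids for illustration only; real ids differ per environment.
COLLECTION_IDS_BY_ENV = {
    "local": "local-collection-id",
    "arch": "arch-collection-id",
    "test": "test-collection-id",
    "prod": "prod-collection-id",
}


def set_source_collection_id(csv_text, collection_id):
    """Return the CSV with every non-empty source_collection_id cell replaced.

    Rows that leave the cell blank (for example FileSet rows) are untouched.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if (row.get("source_collection_id") or "").strip():
            row["source_collection_id"] = collection_id
        writer.writerow(row)
    return output.getvalue()
```

For example, `set_source_collection_id(template_text, COLLECTION_IDS_BY_ENV["test"])` would produce the variant of the template intended for the test environment.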