MarcusBarnes / islandora_compound_batch

Provides the basic ability to batch import compound objects into Islandora.
GNU General Public License v3.0

Allow ingesting of child objects that only have metadata #16

Open mjordan opened 7 years ago

mjordan commented 7 years ago

Coming out of #14, we should support the creation of child objects that only have a MODS.xml or MARC.xml file and no payload file (PDF, TIFF, etc.). Currently, islandora_compound_batch determines the child's content model based on the extension of its payload file; additionally, if the payload file is absent, the child is not created. We will need to provide a drush option to let users indicate which content model to assign to metadata-only child objects.
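
To make the proposed behavior concrete, here is a minimal sketch in Python (not the module's PHP) of the selection logic described above. The extension map, the `choose_child_cmodel` name, and the `metadata_only_cmodel` parameter are all hypothetical; the last one stands in for whatever value the new drush option would supply.

```python
from pathlib import Path

# Hypothetical extension-to-content-model map; the module's real mapping may differ.
EXTENSION_TO_CMODEL = {
    "pdf": "islandora:sp_pdf",
    "tif": "islandora:sp_large_image_cmodel",
    "tiff": "islandora:sp_large_image_cmodel",
    "jpg": "islandora:sp_basic_image",
}

# Metadata filenames named in this issue.
METADATA_FILES = {"MODS.xml", "MARC.xml"}


def choose_child_cmodel(filenames, metadata_only_cmodel=None):
    """Return a content model for a child directory, or None to skip the child."""
    payloads = [name for name in filenames if name not in METADATA_FILES]

    # Current behavior: the payload file's extension decides the content model.
    for name in payloads:
        ext = Path(name).suffix.lstrip(".").lower()
        if ext in EXTENSION_TO_CMODEL:
            return EXTENSION_TO_CMODEL[ext]

    # Proposed behavior: a metadata-only child gets the user-supplied model.
    has_metadata = any(name in METADATA_FILES for name in filenames)
    if not payloads and has_metadata:
        return metadata_only_cmodel

    return None
```

With no `metadata_only_cmodel` supplied, a metadata-only child still yields `None`, which mirrors the current behavior of not creating the child.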

Tagging @egesu to make sure this issue represents the intended use case.

egesu commented 7 years ago

Thanks. I can help with developing this feature. It feels nice to be a part of it and this module seems an easy place to start :-)

We can add an option like --set_content_model=islandora:fooCModel, or we can put a simple text file in each folder to specify the content model. It could be an options.json file, so that more configuration can be added later.

mjordan commented 7 years ago

My initial approach to #14 was to add an attribute to the child entries in the structure.xml file that indicated the desired content model, but I backed off from that approach because I thought that modifying the structure.xml files for, say, 5,000 objects would be a PITA. IMO adding another file to each child directory may also be a pain for some users. I'd prefer to see a new drush option, and to assume that all the metadata-only child objects in the batch are of the same content model. --set_content_model seems as good as any. @MarcusBarnes thoughts?

mjordan commented 7 years ago

Or --child_content_model may be more descriptive? Another option would be to piggyback on the new --content_models option and allow a null extension, like --content_models=null::islandora:fooCModel.
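
As a rough illustration of the null-extension idea, the Python sketch below parses a value such as `null::islandora:fooCModel` into a lookup table. It assumes the option takes comma-separated `extension::cmodel` pairs; that separator, and the `parse_content_models` name, are assumptions rather than the module's documented syntax.

```python
def parse_content_models(option_value):
    """Parse 'ext::cmodel' pairs into a dict; the 'null' extension maps to None.

    Assumes pairs are comma-separated, which this thread does not confirm.
    """
    mapping = {}
    for pair in option_value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '::' only, so the single ':' inside the
        # content model PID (e.g. islandora:fooCModel) is left intact.
        ext, sep, cmodel = pair.partition("::")
        if not sep or not ext or not cmodel:
            raise ValueError(f"Expected 'extension::cmodel', got {pair!r}")
        key = None if ext.lower() == "null" else ext.lower()
        mapping[key] = cmodel
    return mapping
```

A caller could then look up `mapping.get(None)` when a child directory contains no payload file.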

MarcusBarnes commented 7 years ago

@mjordan Would we be able to create a mapping based on some expected metadata value? For example, for MODS XML, the values of typeOfResource (http://www.loc.gov/standards/mods/mods-outline-3-5.html#typeOfResource) could be mapped to specific cmodels. In this example, for each MODS.xml file encountered, we would inspect the value of the typeOfResource element, match it against the values provided in a drush option, and assign the corresponding cmodel, falling back to a default value as needed.

This may be an overcomplication at this point: most production compound objects encountered so far have been things like postcards, so --child_content_model would provide the desired functionality.
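
For reference, the typeOfResource mapping could look something like the Python sketch below. The `cmodel_from_mods` function and the contents of `type_map` are hypothetical; in practice the map would come from a drush option, and MARC XML would need its own handling.

```python
import xml.etree.ElementTree as ET

# The MODS version 3 namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def cmodel_from_mods(mods_xml, type_map, default_cmodel):
    """Map the first typeOfResource value in a MODS document to a content model."""
    root = ET.fromstring(mods_xml)
    element = root.find(".//mods:typeOfResource", MODS_NS)
    value = ""
    if element is not None and element.text:
        value = element.text.strip()
    # Fall back to the default when the element is missing or unmapped.
    return type_map.get(value, default_cmodel)


# Hypothetical usage: the mapping values are illustrative only.
sample = (
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<typeOfResource>still image</typeOfResource>"
    "</mods>"
)
type_map = {"still image": "islandora:sp_basic_image"}
cmodel = cmodel_from_mods(sample, type_map, "islandora:fooCModel")
```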

mjordan commented 7 years ago

@MarcusBarnes we could use a mapping from metadata fields to determine the anticipated content model of child objects, but some of the enumerated values for typeOfResource could map to multiple content models, e.g., "still image", "text", "cartographic", and maybe even "notated music" could validly map to islandora:sp_pdf, islandora:sp_basic_image, or islandora:sp_large_image_cmodel. Plus, we'd need to accommodate mappings for both MODS and MARC XML elements.

I think the most flexible solution to this problem is to define a hook that would let other modules use whatever method they wanted to determine the anticipated content model. Doing so would add some complexity, though.
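
The hook idea can be pictured with a small Python analogy (an actual implementation would be a Drupal hook written in PHP). Every name here, including `register_resolver`, `resolve_child_cmodel`, and the `type_of_resource` key, is hypothetical: each registered resolver inspects a child and either returns a content model or defers to the next one.

```python
# Registry of resolver callbacks, standing in for hook implementations.
_resolvers = []


def register_resolver(func):
    """Register a callback that takes a child dict and returns a cmodel or None."""
    _resolvers.append(func)
    return func


def resolve_child_cmodel(child, default=None):
    """Return the first content model any resolver proposes, else the default."""
    for resolver in _resolvers:
        cmodel = resolver(child)
        if cmodel:
            return cmodel
    return default


# Example resolver: a site that knows its "still image" children are basic images.
@register_resolver
def still_images_are_basic(child):
    if child.get("type_of_resource") == "still image":
        return "islandora:sp_basic_image"
    return None
```

This would leave ambiguous cases, such as "still image" mapping to more than one content model, for each site to settle in its own resolver, at the cost of the extra complexity noted above.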