MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

Should CONTENTdm and CSV toolchains generate only MODS.xml? #268

Open mjordan opened 8 years ago

mjordan commented 8 years ago

MIK was designed to write out either MODS or DC datastreams. For OAI-PMH toolchains, this makes sense since they will commonly harvest DC from the remote repository. However, is it worth supporting the ability to write DC in the CONTENTdm and CSV toolchains? Since MIK makes writing out MODS fairly easy, would any Islandora site want a CONTENTdm or CSV toolchain to generate DC instead?

I think that the reason we included the metadata_filename option in CONTENTdm and CSV toolchains was to allow them to write out DC.xml instead of MODS.xml. I'll review the code to confirm this recollection and report back. As @patdunlavey points out, the use of metadata_filename is not well documented. If the CONTENTdm and CSV toolchains only ever write out MODS.xml files, we should not require MIK users to configure it to do so.

mjordan commented 8 years ago

Looking at the various groups of classes, mention of MODS is absent from fetchers and filegetters (which is not surprising), but MODS is referenced frequently in the writer classes, and quite a bit of writer code assumes we are writing MODS. The CdmSingleFile and CsvSingleFile writers can accommodate either MODS or DC, but, ironically, in both cases the .xml file they write is named neither MODS.xml nor DC.xml; it is named using the object identifier or object filename. So supporting MODS.xml or DC.xml filenames in the single-file writers doesn't make any sense.

The more I think about eliminating the option to write DC.xml files and writing only MODS.xml for books, newspapers, and compound objects, the more I think we'd gain by simplifying configuration, code, and documentation. Anyone who wanted to generate DC instead of MODS could transform every MODS XML document down to DC using a post-write hook or an external script, since the MODS-to-DC mapping is well understood. In fact, for this purpose MIK could use the same XSLT stylesheet that Islandora uses: version 1.4 of the MODS 3.4 to DC stylesheet from the Library of Congress.
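To make the "external script" option concrete, here is a minimal sketch in Python that walks an output directory and writes a DC.xml next to every MODS.xml. It is an assumption-laden stand-in, not the Library of Congress stylesheet: Python's standard library has no XSLT processor, so the script hand-maps only a few common MODS elements to `oai_dc`, assumes each file's root element is `mods:mods` (not `modsCollection`), and maps every name to `dc:creator` without looking at roles. The function names `mods_to_dc` and `convert_tree` are hypothetical and not part of MIK.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

MODS_NS = "http://www.loc.gov/mods/v3"
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
NS = {"mods": MODS_NS}

# Serialize with the familiar oai_dc: and dc: prefixes.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# (MODS path relative to the <mods> root, DC element name).
# A small illustrative subset of what the LoC stylesheet covers.
SIMPLE_MAPPINGS = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:subject/mods:topic", "subject"),
    ("mods:abstract", "description"),
    ("mods:originInfo/mods:dateIssued", "date"),
    ("mods:typeOfResource", "type"),
    ("mods:language/mods:languageTerm", "language"),
    ("mods:identifier", "identifier"),
]


def mods_to_dc(mods_xml: str) -> str:
    """Return an oai_dc XML string built from a single <mods> document."""
    mods = ET.fromstring(mods_xml)
    dc_root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(name: str, text) -> None:
        # Skip empty values so we never emit blank DC elements.
        if text and text.strip():
            ET.SubElement(dc_root, f"{{{DC_NS}}}{name}").text = text.strip()

    for path, dc_name in SIMPLE_MAPPINGS:
        for element in mods.findall(path, NS):
            add(dc_name, element.text)

    # Simplification: every <name> becomes dc:creator, with its name parts
    # joined by ", ". The LoC stylesheet distinguishes creators from
    # contributors using role terms.
    for name in mods.findall("mods:name", NS):
        parts = [
            part.text.strip()
            for part in name.findall("mods:namePart", NS)
            if part.text and part.text.strip()
        ]
        add("creator", ", ".join(parts))

    return ET.tostring(dc_root, encoding="unicode")


def convert_tree(output_dir: str) -> int:
    """Write DC.xml beside every MODS.xml under output_dir; return the count."""
    count = 0
    for mods_path in Path(output_dir).rglob("MODS.xml"):
        dc_xml = mods_to_dc(mods_path.read_text(encoding="utf-8"))
        (mods_path.parent / "DC.xml").write_text(dc_xml, encoding="utf-8")
        count += 1
    return count
```

Calling `convert_tree("/path/to/mik/output")` after a MIK run would add DC.xml files to book, newspaper, or compound-object packages. A production version would more likely apply the LoC XSLT itself (for example via an XSLT-capable library) so the output matches what Islandora generates.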

mjordan commented 7 years ago

I think it would be possible for MIK to output DC instead of MODS XML using a combination of the [templated metadata parser](https://github.com/MarcusBarnes/mik/wiki/Cookbook:-Templated-Metadata-Parser) (with a DC template instead of a MODS template) and setting the applicable toolchains' [WRITER] metadata_filename config option to "DC.xml". If that's the case, maybe we can document this capability more clearly?
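As a rough illustration of the "DC template plus metadata_filename" idea, the Python sketch below fills a DC template from each CSV row and writes the result to a file named by a `METADATA_FILENAME` constant, standing in for the `[WRITER] metadata_filename = "DC.xml"` setting. It uses Python's `string.Template`, not MIK's own templating engine, and the CSV column names (`Title`, `Creator`, `Date`, `Identifier`), the per-identifier output directory, and the functions `render_dc` and `write_packages` are all hypothetical.

```python
import csv
import io
from pathlib import Path
from string import Template
from xml.sax.saxutils import escape

# Hypothetical DC template; $Title etc. must match the CSV column names.
DC_TEMPLATE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>$Title</dc:title>
  <dc:creator>$Creator</dc:creator>
  <dc:date>$Date</dc:date>
  <dc:identifier>$Identifier</dc:identifier>
</oai_dc:dc>
""")

# Plays the role of the [WRITER] metadata_filename option.
METADATA_FILENAME = "DC.xml"


def render_dc(row: dict) -> str:
    """Fill the DC template from one CSV row, escaping XML special characters."""
    escaped = {key: escape(value or "") for key, value in row.items()}
    # substitute() raises KeyError if a column the template needs is missing.
    return DC_TEMPLATE.substitute(escaped)


def write_packages(csv_text: str, output_dir: str) -> list:
    """Write one <Identifier>/DC.xml per CSV row; return the paths written."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        package_dir = Path(output_dir) / row["Identifier"]
        package_dir.mkdir(parents=True, exist_ok=True)
        target = package_dir / METADATA_FILENAME
        target.write_text(render_dc(row), encoding="utf-8")
        written.append(target)
    return written
```

The design point is that only the template and the output filename change; the rest of the pipeline is unaware of whether it is producing MODS or DC, which is what would make documenting this route cheaper than keeping separate DC code paths in the writers.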