MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

In field mappings, only add shared parent wrapper elements once #29

Closed: mjordan closed this issue 8 years ago

mjordan commented 9 years ago

If we have two mappings that define MODS elements that share a common parent wrapper element, we should only add the parent element to the MODS XML once. For example, in the following mapping file the 'Medium' and 'Work Measurements' source fields map to MODS <form> and <note> elements, respectively:

Calendar name,<titleInfo><title>%value%</title></titleInfo>,
School name,"<name type=""corporate""><namePart>%value%</namePart></name>",
Medium,<physicalDescription><form>%value%</form></physicalDescription>,
Work Measurements,<physicalDescription><note>%value%</note></physicalDescription>,
Publisher,<originInfo><publisher>%value%</publisher></originInfo>,
Year,<originInfo><dateIssued>%value%</dateIssued></originInfo>,
Format type,<genre>%value%</genre>,
President,"<note type=""president"">%value%</note>",
Board members,"<note type=""board members"">%value%</note>",
Administrators,"<note type=""administrators"">%value%</note>",
Instructors,"<note type=""instructors"">%value%</note>",
"Staff(technicians,support staff)","<note type=""staff"">%value%</note>",
Degree/Diplomas/Programs,"<note type=""degree/diplomas/programs"">%value%</note>",
Majors/Concentration,"<note type=""majors/concentration"">%value%</note>",
Honorary Degree Recipients,"<note type=""honorary degree recipients"">%value%</note>",
Scholarships/Awards Recipients,"<note type=""scholarship/award recipients"">%value%</note>",
Notes,<note>%value%</note>,

These two MODS elements share the parent <physicalDescription>. Currently, the XML produced looks like this:

  <physicalDescription>
    <form>Paper</form>
  </physicalDescription>
  <physicalDescription>
    <note>16 x 24.4</note>
  </physicalDescription>

but we probably want:

  <physicalDescription>
    <form>Paper</form>
    <note>16 x 24.4</note>
  </physicalDescription>

MarcusBarnes commented 9 years ago

It's worth noting that I was able to successfully validate a test MODS XML document with repeated physicalDescription elements against the MODS XML 3.5 schema. However, repeated parent wrapper elements hurt (human) readability and may be undesirable when ingesting packages created by MIK into Islandora or other systems.

mjordan commented 9 years ago

Grepping through the MODS 3.5 XSD file shows that every element is repeatable (at least, every occurrence of 'maxOccurs' has a value of 'unbounded'). However, the LoC MODS User Guidelines say of <physicalDescription> "Repeating this element is not recommended." So, in this case we have schema constraints that do not match "best practice." Regardless, I'd like to have a generic solution to this problem that can be applied to any mapping, if for no other reason than readability of the MODS files. Also, we may use MIK to create XML that does have some repeatability constraints.
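
To make the idea of a generic solution concrete, here is a minimal sketch in Python rather than MIK's PHP. It is not the code MIK uses; the function name `merge_repeated_wrappers` is hypothetical, and it assumes that only attribute-less wrapper elements directly under the root should be merged.

```python
import xml.etree.ElementTree as ET


def merge_repeated_wrappers(root):
    """Merge top-level children that share a tag and carry no attributes."""
    first_seen = {}  # tag -> first wrapper element seen with that tag
    for child in list(root):
        # Leaf elements (no children) and wrappers with attributes are left alone.
        if child.attrib or len(child) == 0:
            continue
        keeper = first_seen.setdefault(child.tag, child)
        if keeper is not child:
            keeper.extend(list(child))  # move the children into the first wrapper
            root.remove(child)          # drop the now-redundant wrapper
    return root


mods = ET.fromstring(
    "<mods>"
    "<physicalDescription><form>Paper</form></physicalDescription>"
    "<physicalDescription><note>16 x 24.4</note></physicalDescription>"
    "</mods>"
)
merge_repeated_wrappers(mods)
print(ET.tostring(mods, encoding="unicode"))
```

Because the merge is keyed on the element name alone, it applies to any mapping and is not specific to <physicalDescription>.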

MarcusBarnes commented 9 years ago

I've made a commit to resolve this issue: fffc7ccd92e72436f76bda6137d5cf709d097396. However, I'm not particularly pleased with the code and would like to improve it. Suggestions are welcome.

MarcusBarnes commented 9 years ago

Thank you to @mjordan for the test case that demonstrates that the code in commit https://github.com/MarcusBarnes/mik/commit/fffc7ccd92e72436f76bda6137d5cf709d097396 doesn't handle repeated extension elements as desired. I will reopen this issue while I finish off a fix.

MarcusBarnes commented 9 years ago

Commit b233bf4fc794183f170e188a413cd377d32b847b adds the ability to set repeated elements in the configuration file under the METADATA_PARSER section. For example, you might want the MODS extension top-level element to repeat, rather than having its child elements consolidated within one parent wrapper element.
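
The opt-out behaviour can be illustrated with a short Python sketch. This is not the code from the commit: the function name `merge_wrappers_except`, the `repeatable` parameter, and the <localId> child elements are hypothetical stand-ins for whatever the configuration setting and real data provide.

```python
import xml.etree.ElementTree as ET


def merge_wrappers_except(root, repeatable=frozenset()):
    """Merge repeated attribute-less wrappers, skipping tags listed in `repeatable`."""
    first_seen = {}
    for child in list(root):
        # Tags the user marked as repeatable are never consolidated.
        if child.tag in repeatable or child.attrib or len(child) == 0:
            continue
        keeper = first_seen.setdefault(child.tag, child)
        if keeper is not child:
            keeper.extend(list(child))
            root.remove(child)
    return root


mods = ET.fromstring(
    "<mods>"
    "<physicalDescription><form>Paper</form></physicalDescription>"
    "<physicalDescription><note>16 x 24.4</note></physicalDescription>"
    "<extension><localId>1</localId></extension>"  # hypothetical child element
    "<extension><localId>2</localId></extension>"
    "</mods>"
)
merge_wrappers_except(mods, repeatable={"extension"})
print(ET.tostring(mods, encoding="unicode"))
```

Here the two <extension> elements stay separate while the <physicalDescription> wrappers are still merged.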

MarcusBarnes commented 9 years ago

Wrapper elements with attributes are skipped by determineRepeatedWrapperChildElements and related methods in the CdmToMods metadata parser. The existing methods are not sufficient to track wrapper elements with differing attributes. There are situations where such wrapper elements should be consolidated, for example multiple subject elements with the same authority attribute.
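
One possible way to track wrappers with differing attributes is to key the merge on the element name together with its attributes, so that only wrappers with identical attributes are combined. The Python sketch below illustrates that idea only; it is not how MIK handles this, and the function name and the <topic> values are made up for the example.

```python
import xml.etree.ElementTree as ET


def merge_by_tag_and_attributes(root):
    """Merge top-level wrappers that share both a tag and an identical attribute set."""
    first_seen = {}
    for child in list(root):
        if len(child) == 0:  # leaf elements are not wrappers
            continue
        # Sorting makes the key independent of attribute order.
        key = (child.tag, tuple(sorted(child.attrib.items())))
        keeper = first_seen.setdefault(key, child)
        if keeper is not child:
            keeper.extend(list(child))
            root.remove(child)
    return root


mods = ET.fromstring(
    "<mods>"
    '<subject authority="lcsh"><topic>Calendars</topic></subject>'
    '<subject authority="lcsh"><topic>Art schools</topic></subject>'
    '<subject authority="local"><topic>Alumni</topic></subject>'
    "</mods>"
)
merge_by_tag_and_attributes(mods)
print(ET.tostring(mods, encoding="unicode"))
```

With this key, the two lcsh subjects collapse into one wrapper holding both topics, while the subject with a different authority stays separate.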

MarcusBarnes commented 8 years ago

There was progress on this, but it is still not working as expected. See https://github.com/MarcusBarnes/mik/issues/100

MarcusBarnes commented 8 years ago

https://github.com/MarcusBarnes/mik/commit/d4fa81c26ad7f215e355c7c5cbf5394844833b93