MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

SplitRepeatedValues metadata manipulator incorrectly reports failed regex #325

Closed mjordan closed 7 years ago

mjordan commented 7 years ago

Doing some pre-ingest tests with a large collection (11k CSV objects) and I'm finding that the SplitRepeatedValues metadata manipulator is reporting a failed regex about 50% of the time. Eyeballing the failed values doesn't reveal any obvious errors to me. Will investigate.

mjordan commented 7 years ago

I tracked the problem down (unnecessary checking of a preg_match()'s $matches within a loop) and fixed it.
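
The issue does not show the offending code, so the following is only a minimal sketch of one way a loop can avoid false "failed regex" reports: rely on the return value of preg_match() rather than inspecting $matches. The function name `findRegexFailures()` and the log message are hypothetical and are not taken from MIK.

```php
<?php
// Hypothetical helper for illustration only; not MIK's actual code.
// Returns a log message for each value where the regex itself failed.
function findRegexFailures(array $values, string $pattern): array
{
    $failures = [];
    foreach ($values as $value) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        $result = preg_match($pattern, $value);

        // Only false means the regex failed; 0 is a normal "no match"
        // and should not be reported as an error.
        if ($result === false) {
            $failures[] = 'Regex failed for value: ' . $value;
        }
    }
    return $failures;
}
```

With this shape, values that simply do not contain the delimiter are not logged, while genuine failures (for example, invalid UTF-8 input with the `u` modifier) still are.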

In the process of debugging this, I discovered another, unrelated problem: some metadata manipulators throw an exception when the XML snippet they are processing is empty (zero length), writing the exception message "DOMDocument::loadXML(): Empty string supplied as input" to mik.log. By checking the length of their input and returning the input unchanged if the length is 0, we avoid those mik.log entries. With this check in place, the output MODS XML is complete and validates, so I would consider those lines more annoyances than anything else.
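
The length check described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the fix: `manipulateSnippet()` is a hypothetical stand-in for a manipulator's method, and the actual manipulation step is left out.

```php
<?php
// Hypothetical stand-in for a metadata manipulator; not MIK's actual code.
function manipulateSnippet(string $input): string
{
    // Guard: DOMDocument::loadXML() complains about an empty string,
    // so return empty input unchanged before trying to parse it.
    if (strlen($input) === 0) {
        return $input;
    }

    $dom = new DOMDocument();
    if ($dom->loadXML($input) === false) {
        // Malformed XML: hand the input back untouched (an assumption
        // made for this sketch, not necessarily what MIK does).
        return $input;
    }

    // ... a real manipulator would modify $dom here ...

    // Serialize only the root element, without the XML declaration.
    $output = $dom->saveXML($dom->documentElement);
    return $output === false ? $input : $output;
}
```

Calling `manipulateSnippet('')` returns the empty string without touching DOMDocument, so nothing is written to the log for empty snippets.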

@MarcusBarnes OK to add the fix for the second problem to the same PR that fixes the first bug?

MarcusBarnes commented 7 years ago

@mjordan Please add the two fixes together. Thank you in advance for this work.

mjordan commented 7 years ago

Great, just running another job on the 11k objects. So nice to have a clean mik.log and valid MODS for each one! If my QA on the MODS finds no issues I'll open a PR.

mjordan commented 7 years ago

Closed.

MarcusBarnes commented 7 years ago

Addressed in pull request https://github.com/MarcusBarnes/mik/pull/328 (committed with https://github.com/MarcusBarnes/mik/commit/7ce0163c80d2d5b7f9143a5cba7ddb2d97254051).