MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

CSV Compound Child metadata ignored #494

Open bondjimbond opened 5 years ago

bondjimbond commented 5 years ago

I'm trying to run a Compound CSV job, and running into problems getting the metadata in child rows to end up in the child MODS.xml files. MIK is treating the objects as if they do not appear in the spreadsheet.

Here are my relevant config settings:

; MIK configuration file for an OAI-PMH toolchain.

[CONFIG]
config_id = DOH compound
last_updated_on = "2018-07-30"
last_update_by = "bw"

[SYSTEM]
date_default_timezone = 'America/Vancouver'
verify_ca = 0

[FETCHER]
class = Csv
input_file = "DOH/metadata/princeton_allenby.csv"
temp_directory = "/Volumes/Arca/doh_temp"
record_key = key
child_key = child_key

[METADATA_PARSER]
class = mods\CsvToMods
repeatable_wrapper_elements[] = name
repeatable_wrapper_elements[] = subject
repeatable_wrapper_elements[] = identifier
mapping_csv_path = "DOH/metadata/doh_mapping.csv"

[FILE_GETTER]
class = CsvCompound
input_directory = "/Volumes/Arca/DOH_FILES/allenby"
temp_directory = "/Volumes/Arca/doh_temp"
compound_directory_field = Directory

[WRITER]
;datastreams[] = MODS
class = CsvCompound
metadata_filename = MODS.xml
preserve_content_filenames = true
;require_source_file = false
output_directory = "/Volumes/Arca/doh/allenby"
child_title = "%parent_title%, side %sequence_number%"
child_sequence_separator = _
min_children = 1
postwritehooks[] = "/usr/bin/php extras/scripts/postwritehooks/generate_compound_structure_file.php"

Am I missing some setting to allow MIK to read the child objects' metadata rows, or is MIK broken?

mjordan commented 5 years ago

@bondjimbond I can take a look at this tonight. Can you share your metadata spreadsheet and mappings file with me via email?

bondjimbond commented 5 years ago

Sorry @mjordan, I just figured out what the problem is.

So my CSV's child_key column includes "1" for the first child and "2" for the second, most likely because spreadsheet editors tend to auto-format numeric cells as numbers instead of text. The filenames, meanwhile, end in _01.tif and _02.tif.

MIK throws the error:

[2019-01-21 21:04:04] ErrorException.ERROR: ErrorException {"message":"file_put_contents(/Volumes/Arca/doh/allenby/PRIN_Wright_34/2/MODS.xml): failed to open stream: No such file or directory"

The error occurs because MIK created directories named "01" and "02" based on the filenames, but based on the metadata it looks for directories named "1" and "2" when writing each child's MODS.xml.

So, a couple of takeaways:

  1. We should document this rather common problem somehow and warn users about it.
  2. Maybe it would be nice for MIK to be able to recognize that "1" and "01" could mean the same thing when it comes to child keys.
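
To illustrate the second takeaway, here is a minimal Python sketch of how "1" and "01" could be treated as the same child key. It is not MIK code (MIK is written in PHP); the function names `normalize_child_key` and `find_child_directory` are hypothetical, and the sketch assumes child keys are plain ASCII digit strings that may differ only in leading zeros.

```python
import re


def normalize_child_key(value):
    """Return a canonical form so that "1" and "01" compare equal.

    Assumption: keys made only of ASCII digits are sequence numbers, so
    leading zeros carry no meaning. Any other value is only trimmed.
    """
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):
        return str(int(value))
    return value


def find_child_directory(child_key, directory_names):
    """Return the existing directory name that matches child_key, or None."""
    wanted = normalize_child_key(child_key)
    for name in directory_names:
        if normalize_child_key(name) == wanted:
            return name
    return None


# The CSV says "2" but the directory on disk is named "02".
print(find_child_directory("2", ["01", "02"]))
```

Looking up the existing directory name, rather than building a path from the CSV value directly, would avoid writing to a directory such as ".../2/" that was never created.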

mjordan commented 5 years ago

Agreed that we should do something here. These two things are a good start, but I wonder if we should have a --checkconfig option for this as well.

bondjimbond commented 5 years ago

Indeed!

mjordan commented 5 years ago

Or possibly building a check into https://github.com/mjordan/iipqa.

bondjimbond commented 5 years ago

I'd rather see it just built into the checks done by --checkconfig: check that the values in the child_key column match the sequence numbers at the end of the filenames.

bondjimbond commented 5 years ago

The directory should be allowed to contain file numbers not mentioned in the CSV, but it should be an error for the CSV to contain child_key values that are not found in the directory.
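
A rough Python sketch of that rule, for discussion only. It is not part of MIK or of --checkconfig; the function names are hypothetical, the example filenames are illustrative, and it assumes the sequence number is whatever follows the last child_sequence_separator ("_") in the filename stem. Empty child_key values are skipped on the assumption that they belong to parent rows.

```python
from pathlib import PurePath


def sequence_suffixes(filenames, separator="_"):
    """Collect the trailing sequence segment of each filename stem.

    For example, "PRIN_Wright_34_01.tif" yields "01".
    """
    suffixes = set()
    for filename in filenames:
        stem = PurePath(filename).stem
        if separator in stem:
            suffixes.add(stem.rsplit(separator, 1)[1])
    return suffixes


def unmatched_child_keys(child_keys, filenames, separator="_"):
    """Return CSV child keys that have no exactly matching file suffix.

    Extra files in the directory are allowed; only keys that appear in
    the CSV but not among the files are reported.
    """
    available = sequence_suffixes(filenames, separator)
    return sorted({key for key in child_keys if key and key not in available})


# Illustrative filenames; the third file is not in the CSV, which is allowed.
files = ["PRIN_Wright_34_01.tif", "PRIN_Wright_34_02.tif", "PRIN_Wright_34_03.tif"]
print(unmatched_child_keys(["1", "2"], files))    # keys without a matching file
print(unmatched_child_keys(["01", "02"], files))  # nothing to report
```

The comparison is deliberately exact, so a CSV with "1" and "2" against files ending in _01 and _02 is reported before any output is written.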