MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

Create a toolchain to illustrate outputting packages other than Islandora ingest packages #260

Closed mjordan closed 8 years ago

mjordan commented 8 years ago

For example, where the metadata is in serialized JSON instead of MODS XML.

mjordan commented 8 years ago

@MarcusBarnes I've got a CSV toolchain that produces JSON metadata files instead of XML. Its output looks like:

csv_to_json_output/
├── IMG_1410.JPG
├── IMG_1410.json
├── IMG_2549.JPG
├── IMG_2549.json
├── IMG_2940.JPG
├── IMG_2940.json
├── IMG_2958.JPG
├── IMG_2958.json
├── IMG_5083.JPG
├── IMG_5083.json
├── manipulator.log
├── mik.log
└── problem_records.log

An example JSON metadata file is:

{
   "Identifier":"image05",
   "File":"IMG_5083.JPG",
   "Title":"Alcatraz Island",
   "Creator":"Jordan, Mark",
   "Date taken":"2014-01-14",
   "Subjects":[
      "Alcatraz Federal Penitentiary",
      "islands"
   ],
   "Note":"Taken from Fisherman's Wharf, San Francisco.",
   "key":"image05"
}
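
The pairing of content files and metadata files in the listing above (IMG_5083.JPG next to IMG_5083.json) can be sketched roughly as follows. This is a Python illustration only, not MIK's PHP writer class; the function names `metadata_path_for` and `write_metadata` are hypothetical, and the naming rule (content file's base name plus `.json`) is inferred from the sample output.

```python
import json
import tempfile
from pathlib import Path


def metadata_path_for(content_filename, output_directory):
    """Return the JSON path that sits next to a content file of the same base name."""
    return Path(output_directory) / (Path(content_filename).stem + ".json")


def write_metadata(record, output_directory, file_field="File"):
    """Serialize one record to JSON, named after the file listed in `file_field`."""
    target = metadata_path_for(record[file_field], output_directory)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(record, indent=3), encoding="utf-8")
    return target


# The record from the sample metadata file above.
record = {
    "Identifier": "image05",
    "File": "IMG_5083.JPG",
    "Title": "Alcatraz Island",
    "Creator": "Jordan, Mark",
    "Date taken": "2014-01-14",
    "Subjects": ["Alcatraz Federal Penitentiary", "islands"],
    "Note": "Taken from Fisherman's Wharf, San Francisco.",
    "key": "image05",
}

# Write into a throwaway directory and read the file back to check the round trip.
with tempfile.TemporaryDirectory() as out_dir:
    written = write_metadata(record, out_dir)
    written_name = written.name
    round_tripped = json.loads(written.read_text(encoding="utf-8"))
```
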

A typical .ini file looks like:

; MIK configuration file for the CSV to JSON toolchain.

; This toolchain is intended to illustrate how to extend MIK to create
; output that differs from Islandora ingest packages. In this case, the
; metadata files are in serialized JSON format, not XML.

; We're able to reuse the CSV fetcher and file getter classes. Yay!

[SYSTEM]

[CONFIG]
config_id = MIK CSV to JSON test
last_updated_on = "2016-10-27"
last_update_by = "Mark Jordan"

[FETCHER]
class = Csv
input_file = "tutorial_metadata.csv"
temp_directory = "/tmp/csv_to_json_temp"
record_key = Identifier

[METADATA_PARSER]
class = json\CsvToJson
; No mappings file; CSV column headings are used as the keys in the JSON.

[FILE_GETTER]
class = CsvSingleFile
input_directory = "/home/mark/Downloads/mik_tutorial_data"
temp_directory = "/tmp/csv_to_json_temp"
file_name_field = File

[WRITER]
class = CsvSingleFileJson
output_directory = "/tmp/csv_to_json_output"
preserve_content_filenames = true

[MANIPULATORS]
metadatamanipulators[] = "SplitRepeatedValuesInJson|Subjects|;"

[LOGGING]
path_to_log = "/tmp/csv_to_json_output/mik.log"
path_to_manipulator_log = "/tmp/csv_to_json_output/manipulator.log"
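
To make the `metadatamanipulators[]` entry concrete, here is a minimal Python sketch of what a manipulator like SplitRepeatedValuesInJson appears to do, judging from the sample JSON where "Subjects" is a list. It is not the PHP class itself. It assumes the entry is read as class name, field name, and delimiter separated by pipes, and that the raw CSV cell held the subjects joined by ";"; the helper names and the raw cell value are hypothetical.

```python
def parse_manipulator_spec(spec):
    """Split a 'Class|param|param' entry into (class_name, params)."""
    class_name, *params = spec.split("|")
    return class_name, params


def split_repeated_values(record, field, delimiter):
    """Return a copy of `record` with `field` split into a list of trimmed values."""
    updated = dict(record)
    value = updated.get(field)
    if isinstance(value, str) and delimiter in value:
        updated[field] = [part.strip() for part in value.split(delimiter) if part.strip()]
    return updated


# Hypothetical CSV row; the raw Subjects cell is an assumption for illustration.
row = {"Identifier": "image05", "Subjects": "Alcatraz Federal Penitentiary;islands"}

class_name, (field, delimiter) = parse_manipulator_spec(
    "SplitRepeatedValuesInJson|Subjects|;"
)
result = split_repeated_values(row, field, delimiter)
```

Fields that do not contain the delimiter are left as plain strings in this sketch, which matches single-valued keys such as "Title" in the sample file.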

Question: should we add the writer, metadata parser, and sample metadata manipulator to their respective class file directories, and just document that they are for illustration purposes only? Maybe put the sample .ini file in the extras/samples directory?

Also, I was thinking about adding an "Extending MIK" section to the cookbook, and writing a walk-through tutorial on how this new toolchain works. What do you think?

mjordan commented 8 years ago

I've pushed up the issue-260 branch just to get it off my laptop. We can work out how we want to organize this sample toolchain later.

MarcusBarnes commented 8 years ago

@mjordan Very cool. Documenting that the parts of the toolchain are for illustration purposes only at this time is sufficient. Also, adding sections on "Extending MIK" in the cookbook would be useful, even if we only provide stubs at this time. It will help show potential users that MIK is flexible both as a migration tool and as a tool that can be used as part of digital preservation workflows. In the project README where it lists the current status, we can consider indicating which toolchains have been used in production scenarios.

mjordan commented 8 years ago

Excellent. There's a bit more cleanup I want to do, and maybe add some more inline comments to explain what's going on, but I'll do those things over the weekend. Will also update the wiki as discussed.

mjordan commented 8 years ago

@MarcusBarnes one more question: when you say:

Documenting that the parts of the toolchain are for illustration purposes only at this time is sufficient.

Do you mean that you'd rather not see these files mingled with the other (production) classes, and would prefer to put them all under a directory in extras/sample? Or should we mix them with the production classes and make sure we document that they are samples only?

MarcusBarnes commented 8 years ago

We can add them in src/ and make sure we document that they are currently only samples.

mjordan commented 8 years ago

OK, will finish off/merge and document this weekend, then close this issue.