mjordan closed this 8 years ago
@MarcusBarnes I've got a CSV toolchain that produces JSON metadata files instead of XML. Its output looks like:
csv_to_json_output/
├── IMG_1410.JPG
├── IMG_1410.json
├── IMG_2549.JPG
├── IMG_2549.json
├── IMG_2940.JPG
├── IMG_2940.json
├── IMG_2958.JPG
├── IMG_2958.json
├── IMG_5083.JPG
├── IMG_5083.json
├── manipulator.log
├── mik.log
└── problem_records.log
An example JSON metadata file is:
{
"Identifier":"image05",
"File":"IMG_5083.JPG",
"Title":"Alcatraz Island",
"Creator":"Jordan, Mark",
"Date taken":"2014-01-14",
"Subjects":[
"Alcatraz Federal Penitentiary",
"islands"
],
"Note":"Taken from Fisherman's Wharf, San Francisco.",
"key":"image05"
}
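As a rough illustration of how a CSV row could end up as a file like the one above, here is a minimal sketch in Python. It is not MIK's own code: the function name `row_to_json` and the one-row sample CSV are hypothetical, and copying the record key column into a `key` property is an assumption based on the `"key"` entry in the example output.

```python
import csv
import io
import json

# Hypothetical one-row CSV; the column headings match the example above.
SAMPLE_CSV = (
    "Identifier,File,Title,Creator,Date taken,Subjects,Note\n"
    'image05,IMG_5083.JPG,Alcatraz Island,"Jordan, Mark",2014-01-14,'
    'Alcatraz Federal Penitentiary;islands,'
    '"Taken from Fisherman\'s Wharf, San Francisco."\n'
)


def row_to_json(row, record_key="Identifier"):
    """Serialize one CSV row, using the column headings as JSON keys."""
    record = dict(row)
    # Assumption: the record key column is copied into a "key" property,
    # as the "key" entry in the example output suggests.
    record["key"] = record[record_key]
    return json.dumps(record, indent=2)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(row_to_json(rows[0]))
```

In this sketch "Subjects" is still a single semicolon-delimited string; turning it into a list is a separate step.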
A typical .ini file looks like:
; MIK configuration file for the CSV to JSON toolchain.
; This toolchain is intended to illustrate how to extend MIK to create
; output that differs from Islandora ingest packages. In this case, the
; metadata files are in serialized JSON format, not XML.
; We're able to reuse the CSV fetcher and file getter classes. Yay!
[SYSTEM]
[CONFIG]
config_id = MIK CSV to JSON test
last_updated_on = "2016-10-27"
last_update_by = "Mark Jordan"
[FETCHER]
class = Csv
input_file = "tutorial_metadata.csv"
temp_directory = "/tmp/csv_to_json_temp"
record_key = Identifier
[METADATA_PARSER]
class = json\CsvToJson
; No mappings file; CSV column headings are used as the keys in the JSON.
[FILE_GETTER]
class = CsvSingleFile
input_directory = "/home/mark/Downloads/mik_tutorial_data"
temp_directory = "/tmp/csv_to_json_temp"
file_name_field = File
[WRITER]
class = CsvSingleFileJson
output_directory = "/tmp/csv_to_json_output"
preserve_content_filenames = true
[MANIPULATORS]
metadatamanipulators[] = "SplitRepeatedValuesInJson|Subjects|;"
[LOGGING]
path_to_log = "/tmp/csv_to_json_output/mik.log"
path_to_manipulator_log = "/tmp/csv_to_json_output/manipulator.log"
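The `metadatamanipulators[]` setting above names a manipulator and passes it a field name and a delimiter, separated by pipes. A minimal Python sketch of what such a manipulator might do follows. It is not the actual SplitRepeatedValuesInJson class; the function name `split_repeated_values` is hypothetical, and leaving values that lack the delimiter as plain strings is an assumption.

```python
import json


def split_repeated_values(json_text, field, delimiter):
    """Split a delimited string in `field` into a JSON list."""
    record = json.loads(json_text)
    value = record.get(field)
    # Assumption: values without the delimiter are left as plain strings.
    if isinstance(value, str) and delimiter in value:
        record[field] = [part.strip() for part in value.split(delimiter)]
    return json.dumps(record)


# Parse the pipe-delimited manipulator setting from the .ini file.
setting = "SplitRepeatedValuesInJson|Subjects|;"
name, field, delimiter = setting.split("|")

before = json.dumps({"Subjects": "Alcatraz Federal Penitentiary;islands"})
after = json.loads(split_repeated_values(before, field, delimiter))
print(after["Subjects"])  # ['Alcatraz Federal Penitentiary', 'islands']
```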
Question: should we add the writer, metadata parser, and sample metadata manipulator to their respective class file directories, and just document that they are for illustration purposes only? Maybe put the sample .ini file in the extras/samples directory?
Also, I was thinking about adding an "Extending MIK" section to the cookbook, and writing a walk-through tutorial on how this new toolchain works. What do you think?
I've pushed up the issue-260 branch just to get it off my laptop. We can work out how we want to organize this sample toolchain later.
@mjordan Very cool. Documenting that the parts of the toolchain are for illustration purposes only at this time is sufficient. Also, adding sections on "Extending MIK" in the cookbook would be useful, even if we only provide stubs at this time. It will help inform potential users of the flexibility of MIK, both as a migration tool and as a tool that can be used as part of digital preservation workflows. In the project README where it lists the current status, we can consider indicating which toolchains have been used in production scenarios.
Excellent. There's a bit more cleanup I want to do, and maybe add some more inline comments to explain what's going on, but I'll do those things over the weekend. Will also update the wiki as discussed.
@MarcusBarnes one more question: when you say:
"Documenting that the parts of the toolchain are for illustration purposes only at this time is sufficient."
Do you mean that you'd rather not see these files mingled with the other (production) classes, and would prefer to put them all under a directory in extras/sample? Or should we mix them with the production classes and make sure we document that they are samples only?
We can add them in src/ and make sure we document that they are currently only samples.
OK, will finish off/merge and document this weekend, then close this issue.
For example, where the metadata is in serialized JSON instead of MODS XML.