MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

Use Csv parsing library in CdmToMods class #28

MarcusBarnes closed 9 years ago

MarcusBarnes commented 9 years ago

Use the League\Csv parsing library in the getMappingsArray method of the metadataparsers/mods/CdmToMods.php class. The League\Csv parsing library is already in use in the CSV fetcher. Initial testing by @mjordan indicates that the library handles CSV files exported from various spreadsheet programs more robustly than the current code in the getMappingsArray method.

MarcusBarnes commented 9 years ago

Basic usage of League\Csv added in commit 051309ceb389f0eb23ac080f3336c73466bded47.

mjordan commented 9 years ago

Cool! Will test this evening.

MarcusBarnes commented 9 years ago

Thank you. It will be interesting to see if even the basic usage of League\Csv has results as robust as your tests. Please note that you should use the updated mappings structure that excludes the "language of field" column. Please see this comment for details: https://github.com/MarcusBarnes/mik/issues/7#issuecomment-110508196.

mjordan commented 9 years ago

Nice work @MarcusBarnes. Using this mappings file (exported from Google Sheets):

Calendar name,<titleInfo><title>%value%</title></titleInfo>,
School name,"<name type=""corporate""><namePart>%value%</namePart></name>",
Medium,<physicalDescription><form>%value%</form></physicalDescription>,
Work Measurements,<physicalDescription><note>%value%</note></physicalDescription>,
Publisher,<originInfo><publisher>%value%</publisher></originInfo>,
Year,<originInfo><dateIssued>%value%</dateIssued></originInfo>,
Format type,<genre>%value%</genre>,
President,"<note type=""president"">%value%</note>",
Board members,"<note type=""board members"">%value%</note>",
Administrators,"<note type=""administrators"">%value%</note>",
Instructors,"<note type=""instructors"">%value%</note>",
"Staff(technicians,support staff)","<note type=""staff"">%value%</note>",
Degree/Diplomas/Programs,"<note type=""degree/diplomas/programs"">%value%</note>",
Majors/Concentration,"<note type=""majors/concentration"">%value%</note>",
Honorary Degree Recipients,"<note type=""honorary degree recipients"">%value%</note>",
Scholarships/Awards Recipients,"<note type=""scholarship/award recipients"">%value%</note>",
Notes,<note>%value%</note>,

and this .ini file:

[FETCHER]
class = Cdm
alias = ecucals
ws_url = "http://content.lib.sfu.ca:81/dmwebservices/index.php?q="
record_key = pointer

[METADATA_PARSER]
class = mods\CdmToMods
alias = ecucals
ws_url = "http://content.lib.sfu.ca:81/dmwebservices/index.php?q="
; Path to the csv file that contains the CONTENTdm to MODS mappings.
mapping_csv_path = 'ecu_calendars_mapping.csv'
; Include the migrated from uri into your generated metadata (e.g., MODS)
include_migrated_from_uri = TRUE

[FILE_GETTER]
class = CdmPhpDocuments
input_directory = "/tmp/mik_input"
alias = ecucals
ws_url = "http://content.lib.sfu.ca:81/dmwebservices/index.php?q="
utils_url = "http://content.lib.sfu.ca/utils/"

[WRITER]
class = CdmPhpDocuments
alias = ecucals
output_directory = "/tmp/mik_output"
metadata_filename = 'MODS.xml'

[MANIPULATORS]

[LOGGING]
; full path to log file for mik log files
path_to_log = "/tmp/mik.log"

I was able to create MODS files that implemented the mappings. There was one error, which I wasn't able to track down:

./mik --config=mjpdf.ini
Getting file information.
Please be patient as this may take some time.
The mik\metadataparsers\mods\CdmToMods class been loaded for CONTENTdm record 3223.
PHP Notice:  Undefined offset: 1 in /home/mark/Documents/hacking/mik/src/metadataparsers/mods/CdmToMods.php on line 130
The metatdata file for record 3223 has been created.
Exporting metadata file.
The mik\metadataparsers\mods\CdmToMods class been loaded for CONTENTdm record 3235.
PHP Notice:  Undefined offset: 1 in /home/mark/Documents/hacking/mik/src/metadataparsers/mods/CdmToMods.php on line 130
The metatdata file for record 3235 has been created.
Exporting metadata file.
The mik\metadataparsers\mods\CdmToMods class been loaded for CONTENTdm record 3252.
PHP Notice:  Undefined offset: 1 in /home/mark/Documents/hacking/mik/src/metadataparsers/mods/CdmToMods.php on line 130
The metatdata file for record 3252 has been created.
Exporting metadata file.
[...]
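
One possible cause of the notice, not confirmed anywhere in this thread: it fires once per record, which would fit a single row of the mappings file having no second column. PHP's built-in CSV functions parse a blank line to an array holding a single null, so a trailing empty line in a spreadsheet export would leave index 1 undefined. The sketch below uses `str_getcsv()` rather than League\Csv so that it runs on its own; `parseMappingLines` is a hypothetical helper, not code from MIK.

```php
<?php
// Hypothetical helper (not part of MIK): build a field => MODS snippet map
// from raw CSV lines, skipping any row that has no usable second column.
function parseMappingLines(array $lines): array
{
    $mappings = [];
    foreach ($lines as $line) {
        // Delimiter, enclosure, and escape are passed explicitly.
        $row = str_getcsv($line, ',', '"', '\\');
        // A blank line parses to [null], so $row[1] would be undefined.
        if (!isset($row[1]) || trim($row[1]) === '') {
            continue;
        }
        $mappings[$row[0]] = $row[1];
    }
    return $mappings;
}

$lines = [
    'Year,<originInfo><dateIssued>%value%</dateIssued></originInfo>,',
    'President,"<note type=""president"">%value%</note>",',
    '', // trailing blank line, as a spreadsheet export might leave behind
];
$mappings = parseMappingLines($lines);
```

If the mappings file has no short or blank rows, the cause lies elsewhere and this guard changes nothing.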

I'll leave another comment on an unrelated issue (nested mappings).