MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

Paged content from CONTENTdm has incorrect DSID for JPEG files #452

Closed mjordan closed 6 years ago

mjordan commented 6 years ago

Islandora objects that have the content models islandora:pageCModel and islandora:newspaperPageCModel, i.e., book and newspaper pages, use a datastream ID of 'JPG'. MIK's CONTENTdm toolchains write out a file with the name 'JPEG.jpg' (see https://github.com/MarcusBarnes/mik/blob/master/src/writers/CdmNewspapers.php#L209 and https://github.com/MarcusBarnes/mik/blob/master/src/writers/CdmBooks.php#L220). This filename determines the DSID used in Islandora if the content was batch loaded using islandora_book_batch or islandora_newspaper_batch. Therefore, paged content generated by MIK's CONTENTdm toolchains may have a 'JPEG' DSID, which is inconsistent with the paged content composite model, depending on how the paged content was loaded into Islandora.
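
The filename-to-DSID relationship described above can be sketched roughly as follows. This is only an illustration, assuming the batch loader takes the page file's name minus its extension as the DSID; the function name `dsid_from_filename` is hypothetical and is not part of MIK or of the Islandora batch modules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a datastream ID from a page file's name,
# ASSUMING the batch loader uses the filename without its extension.
dsid_from_filename() {
  local name
  name=$(basename "$1")        # drop the directory part, e.g. 0001/JPEG.jpg -> JPEG.jpg
  printf '%s\n' "${name%.*}"   # drop the last extension, e.g. JPEG.jpg -> JPEG
}

dsid_from_filename "0001/JPEG.jpg"   # prints JPEG (the inconsistent DSID)
dsid_from_filename "0001/JPG.jpg"    # prints JPG  (the DSID the page content models use)
```

Under that assumption, the only thing separating a consistent page object from an inconsistent one is the name of the file MIK writes.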

This bug does not affect other toolchains.

This bug is easy enough to fix, but in the interests of consistency and portability during future migrations, Islandora instances that have paged content with the improperly named datastreams should probably replace the JPEG datastreams with JPG ones. SFU's site has had these incorrectly named datastreams for over two years with no visible side effects, but we'll be offering a solution for fixing the problem.

mjordan commented 6 years ago

Looks like the bug is very localized and the fix easy:

mik/src$ grep -r 'JPEG\.' *
writers/CdmNewspapers.php:                $jpg_output_file_path = $page_dir . DIRECTORY_SEPARATOR . 'JPEG.jpg';
writers/CdmBooks.php:                $jpg_output_file_path = $page_dir . DIRECTORY_SEPARATOR . 'JPEG.jpg';
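
A minimal sketch of the corresponding change, assuming the fix is simply to write the page derivative as 'JPG.jpg' instead of 'JPEG.jpg'. The merged commit is not shown in this thread, so this illustrates the idea rather than the actual patch. The function name `fix_jpeg_literal` is hypothetical; it goes through a temporary file rather than `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the quoted 'JPEG.jpg' literal with 'JPG.jpg' in one file.
fix_jpeg_literal() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  # Match the quoted literal only, so other occurrences of "JPEG" are left alone.
  # Writing back with cat keeps the original file's permissions.
  sed "s/'JPEG\\.jpg'/'JPG.jpg'/g" "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return $status
}

# Usage, from mik/src (paths taken from the grep output above):
# fix_jpeg_literal writers/CdmNewspapers.php
# fix_jpeg_literal writers/CdmBooks.php
```

Re-running the grep afterwards should return no matches if these two lines were the only occurrences.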

mjordan commented 6 years ago

Fix has been merged.
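
For packages that were generated before the fix and have not yet been loaded, the page files could be renamed on disk before running the batch loader. The following is a sketch under that assumption only: it does not touch datastreams already ingested into Islandora, and it is not the solution mentioned earlier in this issue. The function name `rename_page_jpegs` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename every JPEG.jpg under an MIK output directory to JPG.jpg.
rename_page_jpegs() {
  local root=$1 f target
  find "$root" -type f -name 'JPEG.jpg' | while IFS= read -r f; do
    target="$(dirname "$f")/JPG.jpg"
    # Skip rather than overwrite if a correctly named file already exists.
    if [ -e "$target" ]; then
      echo "skipping $f: $target already exists" >&2
    else
      mv "$f" "$target"
    fi
  done
}

# Usage (the path is a placeholder for your own MIK output directory):
# rename_page_jpegs /path/to/mik/output
```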