emory-libraries / dlp-curate

Digital curation and preservation workbench for the Emory Preservation Repository.

Run preprocessor on Office of Alumni Pubs photos boxes 6 and 7 csv #1612

Closed. kmichaelis closed this issue 3 years ago

kmichaelis commented 3 years ago

Please run the preprocessor on the csv for files in boxes 6 and 7 of the Office of Alumni Publications Photographs.

CSV file

This can be run in the Langmuir mode.

See the Langmuir notes in the Rake Tasks Tutorial.

When finished, please save the CSV in the FY21 processed folder.

bwatson78 commented 3 years ago

@kmichaelis The preprocessor raised an error where it expected metadata:

{:metadata=>nil, :filesets=>{2=>#<CSV::Row "source_row":38 "deduplication_key":"EUA0179_B006_F001_I010" "type":"fileset" "fileset_label":nil "preservation_master_file":"dmfiles/MARBL/Archives/EUA_0179_AlumniPubs/B006/F001/ARCH/EUA0179_B006_F001_I010_P002_ARCH.tif" "intermediate_file":"dmfiles/MARBL/Archives/EUA_0179_AlumniPubs/B006/F001/PROD/EUA0179_B006_F001_I010_P002_PROD.tif" "other_identifiers":nil "abstract":nil "administrative_unit":nil "local_call_number":nil "creator":nil "date_created":nil "Desc - Date Created - Date Precision":nil "date_issued":nil "content_genres":nil "holding_repository":nil "institution":nil "publisher":nil "emory_rights_statements":nil "rights_statement":nil "subject_names":nil "subject_geo":nil "subject_topics":nil "title":nil "content_type":nil "data_classifications":nil "Ingest.workflow_notes":nil "Digital Object - Parent Identifier":nil "visibility":nil "Directory Path":nil "File Size":nil "Filename":nil "Path":nil "Ingest.workflow_rights_basis":nil "Ingest.workflow_rights_basis_date":nil "Ingest.workflow_rights_basis_note":nil "Accession.workflow_rights_basis":nil "Accession.workflow_rights_basis_date":nil "Accession.workflow_rights_basis_reviewer":nil "Accession.workflow_rights_basis_note":nil "sensitive_material":nil "extent":nil "sublocation":nil "source_collection_id":nil>}}

kmichaelis commented 3 years ago

@bwatson78 Looks like the same issue as the other CSV: a part 2 with no part 1. I removed that work and replaced the CSV in GitHub.

bwatson78 commented 3 years ago

@kmichaelis Another error, same issue:

{:metadata=>nil, :filesets=>{2=>#<CSV::Row "source_row":3346 "deduplication_key":"EUA0179_B007_F004_I0082" "type":"fileset" "fileset_label":nil "preservation_master_file":"dmfiles/MARBL/Archives/EUA_0179_AlumniPubs/B007/F004/ARCH/EUA0179_B007_F004_I0082_P002_ARCH.tif" "intermediate_file":"dmfiles/MARBL/Archives/EUA_0179_AlumniPubs/B007/F004/PROD/EUA0179_B007_F004_I0082_P002_PROD.tif" "other_identifiers":nil "abstract":nil "administrative_unit":nil "local_call_number":nil "creator":nil "date_created":nil "Desc - Date Created - Date Precision":nil "date_issued":nil "content_genres":nil "holding_repository":nil "institution":nil "publisher":nil "emory_rights_statements":nil "rights_statement":nil "subject_names":nil "subject_geo":nil "subject_topics":nil "title":nil "content_type":nil "data_classifications":nil "Ingest.workflow_notes":nil "Digital Object - Parent Identifier":nil "visibility":nil "Directory Path":nil "File Size":nil "Filename":nil "Path":nil "Ingest.workflow_rights_basis":nil "Ingest.workflow_rights_basis_date":nil "Ingest.workflow_rights_basis_note":nil "Accession.workflow_rights_basis":nil "Accession.workflow_rights_basis_date":nil "Accession.workflow_rights_basis_reviewer":nil "Accession.workflow_rights_basis_note":nil "sensitive_material":nil "extent":nil "sublocation":nil "source_collection_id":nil>}}

kmichaelis commented 3 years ago

@bwatson78 work removed, new file uploaded to GitHub.

bwatson78 commented 3 years ago

@kmichaelis Same thing: {:metadata=>nil, :filesets=>{2=>#<CSV::Row "source_row":3650 "deduplication_key":"EUA0179_B007_F006_I0020" "type":"fileset" "fileset_label":nil "preservation_master_file":"dmfiles/MARBL/Archives/EUA_0179_AlumniPubs/B007/F006/ARCH/EUA0179_B007_F006_I0020_P002_ARCH.tif" "intermediate_file":"dmfiles/MARBL/Archives/EUA_0179_AlumniPubs/B007/F006/PROD/EUA0179_B007_F006_I0020_P002_PROD.tif" "other_identifiers":nil "abstract":nil "administrative_unit":nil "local_call_number":nil "creator":nil "date_created":nil "Desc - Date Created - Date Precision":nil "date_issued":nil "content_genres":nil "holding_repository":nil "institution":nil "publisher":nil "emory_rights_statements":nil "rights_statement":nil "subject_names":nil "subject_geo":nil "subject_topics":nil "title":nil "content_type":nil "data_classifications":nil "Ingest.workflow_notes":nil "Digital Object - Parent Identifier":nil "visibility":nil "Directory Path":nil "File Size":nil "Filename":nil "Path":nil "Ingest.workflow_rights_basis":nil "Ingest.workflow_rights_basis_date":nil "Ingest.workflow_rights_basis_note":nil "Accession.workflow_rights_basis":nil "Accession.workflow_rights_basis_date":nil "Accession.workflow_rights_basis_reviewer":nil "Accession.workflow_rights_basis_note":nil "sensitive_material":nil "extent":nil "sublocation":nil "source_collection_id":nil>}}

kmichaelis commented 3 years ago

@bwatson78 Removed the work and replaced the file. I checked the remainder of the CSV and didn't see any other occurrences of this, so hopefully this will be the last edit.
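
The manual check described here (looking for works that have a part 2 but no part 1) could also be scripted before the CSV is handed to the preprocessor. The following is only a minimal sketch, not part of dlp-curate: the helper name `keys_missing_part_one` is hypothetical, and it assumes the column names visible in the error output (`deduplication_key`, `type`, `preservation_master_file`) and that the part number is encoded in the file name as `_P###_`.

```ruby
require "csv"

# Hypothetical helper, not part of dlp-curate.
# Returns the deduplication keys whose fileset rows never include a part 1
# (for example, a key that only has a *_P002_* file).
# Assumption: the part number appears in the file name as _P###_.
def keys_missing_part_one(csv_text)
  parts_by_key = Hash.new { |hash, key| hash[key] = [] }

  CSV.parse(csv_text, headers: true).each do |row|
    next unless row["type"] == "fileset"

    # Look only at the file name so directory names cannot match by accident.
    file_name = File.basename(row["preservation_master_file"].to_s)
    match = file_name.match(/_P(\d+)_/)
    next unless match

    parts_by_key[row["deduplication_key"]] << match[1].to_i
  end

  parts_by_key.reject { |_key, parts| parts.include?(1) }.keys
end

# Tiny made-up sample, not the real Alumni Publications CSV.
sample = <<~SAMPLE
  deduplication_key,type,preservation_master_file
  KEY_A,fileset,dir/KEY_A_P001_ARCH.tif
  KEY_A,fileset,dir/KEY_A_P002_ARCH.tif
  KEY_B,fileset,dir/KEY_B_P002_ARCH.tif
SAMPLE

puts keys_missing_part_one(sample).inspect
```

To try it on a real file, the CSV text can be read with `File.read` and passed to the helper; any key it returns is a candidate for the `:metadata=>nil` error reported above.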

bwatson78 commented 3 years ago

@kmichaelis Processed file is in that GitHub folder.

kmichaelis commented 3 years ago

Thanks @bwatson78 !