Closed mjordan closed 7 years ago
PHPUnit tests passed as described.
The smoke test worked as expected. Beautiful work. Thank you @mjordan.
Awesome, thanks for testing @MarcusBarnes. I'll update the CSV Newspaper toolchain wiki page.
Can I close #316?
GitHub issue: (#383, #316)
What does this Pull Request do?
Optionally, makes MIK copy OCR (.txt) files into page-level directories in the CSV Newspapers toolchain.
This PR also includes some cleanup of the CSV newspaper writer class as described in #316.
What's new?
If the input directory for jobs using the MIK CSV Newspapers toolchain contains .txt files corresponding to the page master images, like this:
the .txt files will be copied into the newspaper page-level Islandora ingest packages, like this:
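As a rough illustration of the copy step described above, here is a minimal sketch in Python. It is not MIK's implementation (MIK is written in PHP). The function name `copy_page_ocr`, the flat layout of page images inside an issue directory, the numbered page subdirectories, and the `OCR.txt` output filename are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def copy_page_ocr(issue_input_dir, issue_output_dir, image_ext=".tif"):
    """Copy each page's sibling .txt file into that page's output directory.

    Assumptions (not taken from MIK's code): page images sit directly in
    issue_input_dir, each page gets an output subdirectory numbered from 1
    in sorted filename order, and the copied file is named OCR.txt.
    Returns the names of page images that had no matching .txt file.
    """
    missing = []
    images = sorted(
        p for p in Path(issue_input_dir).iterdir()
        if p.suffix.lower() == image_ext
    )
    for page_number, image in enumerate(images, start=1):
        page_dir = Path(issue_output_dir) / str(page_number)
        page_dir.mkdir(parents=True, exist_ok=True)
        ocr_source = image.with_suffix(".txt")
        if ocr_source.is_file():
            shutil.copy(ocr_source, page_dir / "OCR.txt")
        else:
            missing.append(image.name)
    return missing


def demo():
    """Build a two-page issue in a temp directory; page 2 has no OCR file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input" / "issue"
        dst = Path(tmp) / "output" / "issue"
        src.mkdir(parents=True)
        (src / "page-01.tif").write_bytes(b"")
        (src / "page-01.txt").write_text("page one text")
        (src / "page-02.tif").write_bytes(b"")
        missing = copy_page_ocr(src, dst)
        copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*.txt"))
        return missing, copied
```

Calling `demo()` exercises the function on throwaway files. Unlike this sketch, which still creates a directory for the page without OCR, the smoke test in this PR shows MIK skipping a package entirely when a .txt file is missing and logging is enabled.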
Because this feature introduces a new optional entry in the [WRITER]datastreams[] list, we need to provide a new configuration option, [WRITER]log_missing_ocr_files, to indicate that MIK should log missing OCR files when the datastreams list is empty.

In addition, when [WRITER]log_missing_ocr_files is TRUE, the CSV Newspapers input validator checks for the existence of the .txt files and, if any are not found, logs an input validation error.

This PR includes PHPUnit tests for both the CSV Newspapers toolchain and for the CSV Newspapers input validator.
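The input validator check described above can be pictured with a short Python sketch. This is an illustration only, not the validator's actual PHP code. The function name `find_missing_ocr`, the boolean parameter standing in for the config option, and the directory names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def find_missing_ocr(input_dir, image_ext=".tif", log_missing_ocr_files=True):
    """List page images that have no same-named .txt file beside them.

    Returns sorted paths relative to input_dir. When log_missing_ocr_files
    is False the check is skipped and an empty list is returned.
    """
    if not log_missing_ocr_files:
        return []
    root = Path(input_dir)
    return sorted(
        image.relative_to(root).as_posix()
        for image in root.rglob("*" + image_ext)
        if not image.with_suffix(".txt").is_file()
    )


def demo():
    """Two hypothetical issues; the second one lacks OCR for its only page."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for issue in ("issue-a", "issue-b"):
            (root / issue).mkdir()
            (root / issue / "01.tif").write_bytes(b"")
        (root / "issue-a" / "01.txt").write_text("ocr text")
        enabled = find_missing_ocr(root)
        disabled = find_missing_ocr(root, log_missing_ocr_files=False)
        return enabled, disabled
```

A caller would write one validation log entry for each path returned. With the option off, nothing is reported, which matches the no-side-effects case for jobs without page-level .txt files.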
How should this be tested?
Run PHPUnit tests:
should result in "(46 tests, 66 assertions)"
should result in "(4 tests, 17 assertions)"
Smoketest:
Using configuration and data files in the attached .zip, do the following:
./mik -c issue-383/issue-383.ini -cc all
and then, if there are no problems,
./mik -c issue-383/issue-383.ini
Your output directory should look like this:
MIK only generated two packages because one (TT001) has a missing .txt file. You can verify this by looking at the input validator and problem records log file.
To test that this new feature has no side effects in jobs that do not include page-level .txt files, change the input directory to "issue-383/files_no_text" and comment out the [WRITER]log_missing_ocr_files config option.

issue-383.zip