MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

Csv Book Toolchain #77

Closed MarcusBarnes closed 7 years ago

MarcusBarnes commented 8 years ago

An MIK toolchain for migrating books (monographs): create a CSV book filegetter/writer.

The input that Islandora Book Batch expects is documented at https://github.com/Islandora/islandora_book_batch.

N.B.: We may need to flatten hierarchical source books before importing them into Islandora, since Islandora's Book Solution Pack currently supports only flat books.

mjordan commented 7 years ago

Initial work on a CSV Books toolchain. Input looks like:

csvbookstestinput/
├── book1
│   ├── page-01.tif
│   ├── page-02.tif
│   ├── page-03.tif
│   ├── page-04.tif
│   ├── page-05.tif
│   ├── page-06.tif
│   ├── page-07.tif
│   └── page-08.tif
└── book2
    ├── 1884-01-24-01.tif
    ├── 1884-01-24-02.tif
    ├── 1884-01-24-03.tif
    ├── 1884-01-24-04.tif
    ├── 1884-01-24-05.tif
    ├── 1884-01-24-06.tif
    ├── 1884-01-24-07.tif
    └── 1884-01-24-08.tif

.ini file looks like:

; MIK configuration file for generating Islandora book ingest packages
; from a CSV metadata file and locally stored TIFFs.

[SYSTEM]

[CONFIG]
config_id = CSVBooksTest
last_updated_on = "2016-11-01"
last_update_by = "mjordan@example.com"

[FETCHER]
class = Csv
input_file = '/home/mark/Downloads/csvbookstest.csv'
temp_directory = "/tmp/csv_books_temp"
record_key = Identifier

[METADATA_PARSER]
class = mods\CsvToMods
mapping_csv_path = 'csv_books_test_mappings.csv'
temp_directory = "/tmp/csv_books_temp"

[FILE_GETTER]
class = CsvBooks
temp_directory = "/tmp/csv_books_temp"
input_directory = "/home/mark/Downloads/csvbookstestinput"
file_name_field = Directory

[WRITER]
class = CsvBooks
output_directory = "/tmp/csv_books_output"
metadata_filename = MODS.xml
datastreams[] = OBJ
datastreams[] = MODS

[MANIPULATORS]
metadatamanipulators[] = "FilterModsTopic|subject"

[LOGGING]
path_to_log = "/tmp/csv_books_output/mik.log"
path_to_manipulator_log = "/tmp/csv_books_output/mik_manipulator.log"
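
To make the link between the `[FETCHER]` and `[FILE_GETTER]` settings concrete, here is a minimal Python sketch (not MIK's own PHP code) of how `record_key = Identifier` and `file_name_field = Directory` could tie each CSV row to a book directory under `input_directory`. Only the `Identifier` and `Directory` column names come from the configuration above. The sample rows, the `Title` column, the pairing of `B0001` with `book1`, and the function name `map_records_to_directories` are all assumptions made for illustration.

```python
import csv
import io
import posixpath

# Hypothetical metadata CSV. Only the column names Identifier and Directory
# come from the .ini file; the row values are invented for illustration.
SAMPLE_CSV = """Identifier,Title,Directory
B0001,First test book,book1
B0002,Second test book,book2
"""


def map_records_to_directories(csv_text, input_directory,
                               record_key="Identifier",
                               file_name_field="Directory"):
    """Return {record key: path of the directory holding that book's pages}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = {}
    for row in reader:
        # The value in the file_name_field column names a subdirectory
        # of input_directory, one subdirectory per book.
        mapping[row[record_key]] = posixpath.join(input_directory,
                                                  row[file_name_field])
    return mapping


book_dirs = map_records_to_directories(
    SAMPLE_CSV, "/home/mark/Downloads/csvbookstestinput")
for key, path in book_dirs.items():
    print(key, "->", path)
```

Under these assumptions, each record key becomes the name of one book-level output directory, and the mapped input directory supplies that book's page images.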

Output looks like:

csv_books_output/
├── B0001
│   ├── 1
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 2
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 3
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 4
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 5
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 6
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 7
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 8
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   └── MODS.xml
├── B0002
│   ├── 1
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 2
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 3
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 4
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 5
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 6
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 7
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   ├── 8
│   │   ├── MODS.xml
│   │   └── OBJ.tif
│   └── MODS.xml
├── mik.log
└── problem_records.log
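
The layout above can be approximated with a short Python sketch. It is an illustration, not the `CsvBooks` writer itself: it assumes pages are numbered from 1 in sorted filename order (which works for zero-padded names like those in the sample input), it uses placeholder files rather than real TIFFs, and it omits the book-level and page-level `MODS.xml` files. The function name `write_book_package` is hypothetical.

```python
import os
import shutil
import tempfile


def write_book_package(page_dir, output_dir, record_key):
    """Copy each page image to <output_dir>/<record_key>/<n>/OBJ.tif.

    Pages are numbered from 1 in sorted filename order (an assumption).
    Returns the list of paths written, in page order.
    """
    pages = sorted(f for f in os.listdir(page_dir)
                   if f.lower().endswith(".tif"))
    written = []
    for number, filename in enumerate(pages, start=1):
        target_dir = os.path.join(output_dir, record_key, str(number))
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, "OBJ.tif")
        shutil.copyfile(os.path.join(page_dir, filename), target)
        written.append(target)
    return written


# Demonstration with three placeholder page files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "book1")
    os.makedirs(src)
    for i in range(1, 4):
        with open(os.path.join(src, f"page-{i:02d}.tif"), "wb") as fh:
            fh.write(b"placeholder")
    out = os.path.join(tmp, "csv_books_output")
    paths = write_book_package(src, out, "B0001")
    relative = [os.path.relpath(p, out) for p in paths]
    print(relative)
```

Sorting by filename is the simplest ordering rule that reproduces the sample output; source files without zero-padded sequence numbers would need a different sort key.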

mjordan commented 7 years ago

@MarcusBarnes mind if we merge this into master fairly soon? I'd like to continue with work on #238, which depends on this branch. If you want to test I can send you some input data.

MarcusBarnes commented 7 years ago

@mjordan Yes - please send me some input data so I can test. Thank you.

mjordan commented 7 years ago

Zip file containing input data plus .ini is at https://vault.sfu.ca/index.php/s/ZXUoZN17cUENqiN. You'll need to run composer dump-autoload. Thanks!

MarcusBarnes commented 7 years ago

@mjordan Worked as expected. Please create a pull-request and then I'll merge. Thanks.

mjordan commented 7 years ago

Great, thanks for testing - here's the PR: https://github.com/MarcusBarnes/mik/pull/278

I'll document the toolchain (our last remaining major one to complete!) and update README.md over the weekend.

MarcusBarnes commented 7 years ago

Closed with pull-request https://github.com/MarcusBarnes/mik/pull/278 (commit https://github.com/MarcusBarnes/mik/commit/503033cff290e707921bf7e8c27f59ac12578ede). Thank you @mjordan for your contribution.