MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

Add a "filesystem" fetcher #408

Closed mjordan closed 7 years ago

mjordan commented 7 years ago

Related to #403, and intended for automating the ingestion of content into Islandora. In some cases there will be no input file (CSV or Excel), just files in a directory. The "filesystem" fetcher will look for each file in the input directory and use its filename as both the record key and the title. Like the Excel fetcher in #403, it would work with the Csv toolchain components. In effect, it would produce a run-time "CSV" data structure with two fields: a record key and a title.
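The idea above can be sketched in a few lines. This is a minimal illustration in Python, not MIK's PHP implementation: the function name `fetch_records`, the field names `record_key` and `title`, and the choice to drop the file extension from the key are all assumptions made for the example. Note that the files are only listed, never opened.

```python
import os
import tempfile


def fetch_records(input_directory):
    """Return one record per regular file, sorted by filename.

    Each record has two fields, mirroring the two-column run-time "CSV"
    described above. Field names are hypothetical, and using the filename
    without its extension is an assumption of this sketch.
    """
    records = []
    for filename in sorted(os.listdir(input_directory)):
        path = os.path.join(input_directory, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-file entries
        stem, _extension = os.path.splitext(filename)
        records.append({"record_key": stem, "title": stem})
    return records


if __name__ == "__main__":
    # Build a throwaway input directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as input_directory:
        for name in ("report.pdf", "map01.tif"):
            open(os.path.join(input_directory, name), "w").close()
        for record in fetch_records(input_directory):
            print(record)
```

Because each record carries only a key and a title, any richer metadata would have to come from a later step in the toolchain.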

mjordan commented 7 years ago

Draft cookbook entry is at https://github.com/MarcusBarnes/mik/wiki/Cookbook:-Using-the-Filesystem-fetcher.

whikloj commented 7 years ago

@mjordan so I was just making a FileFetcher, because our TEI files are just XML in a directory structure. I was looking at configuration options like

[FETCHER]
class = FileFetcher
; Source directory
source_directory = "/Users/whikloj/Desktop/manitobia/TEI_example/maps/ALR"
; Match file names, must be valid regular expression.
source_file_regex = "\.xml$"
; Recurse to subdirectories, defaults to false
recurse_directories = false

I can try to keep it fairly agnostic, as long as the Fetcher doesn't need to open the files and can just return a list of them. Would that work?

EDIT: Adding some information to config.
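As a rough illustration of what the three proposed options would do, here is a minimal Python sketch (not the FileFetcher itself, which would be PHP). The function `list_matching_files` is hypothetical; its parameters simply reuse the option names from the config above. It only walks the directory and returns paths, and never opens a file.

```python
import os
import re


def list_matching_files(source_directory, source_file_regex,
                        recurse_directories=False):
    """Return sorted paths of files whose names match the regex.

    Parameter names mirror the proposed config options; the function
    itself is an assumption made for illustration.
    """
    pattern = re.compile(source_file_regex)
    matches = []
    for root, dirs, files in os.walk(source_directory):
        if not recurse_directories:
            # Emptying dirs in place stops os.walk from descending further.
            dirs[:] = []
        for filename in files:
            if pattern.search(filename):
                matches.append(os.path.join(root, filename))
    return sorted(matches)
```

With `source_file_regex = r"\.xml$"` and `recurse_directories=False`, only the `.xml` files directly inside `source_directory` are returned.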

mjordan commented 7 years ago

@whikloj if all you want to do is jam all your TEI files in a single directory and not provide an accompanying CSV input file listing them, the "Filesystem" fetcher in the PR might do the trick. However, for each of your TEI files, MIK's output will be your .xml file plus a stub MODS XML file, both using the same filename and extension, which means that the MODS file will overwrite the TEI file. Also, the standard Islandora Batch module can't handle OBJ datastreams that end in .xml, for the same reason.
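The overwrite problem is easy to check for ahead of time. The sketch below is a hypothetical pre-flight check in Python, assuming the stub metadata file is named after the input file's base name plus `.xml`; the function name and that naming rule are assumptions for illustration, not MIK's documented behaviour.

```python
import os


def find_output_collisions(input_filenames, metadata_extension=".xml"):
    """Return input filenames that a same-named metadata file would overwrite.

    Assumes (for illustration only) that the metadata file is written as
    <input base name> + metadata_extension in the same output directory.
    The comparison is exact and case-sensitive.
    """
    collisions = []
    for filename in input_filenames:
        stem, _extension = os.path.splitext(filename)
        metadata_filename = stem + metadata_extension
        if metadata_filename == filename:
            collisions.append(filename)
    return collisions
```

For example, `find_output_collisions(["letter01.xml", "letter01.tif"])` flags only `letter01.xml`, since the TIFF and its metadata file get different names.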

Before you write a custom MIK fetcher, have you seen https://github.com/mjordan/islandora_solution_pack_xml/tree/7.x/modules/islandora_simple_xml_batch ? It will let you batch ingest OBJs ending in .xml and accompanying MODS/DC files ending in .xml. The objects don't need to be managed by the Simple XML SP to use this batch ingest module, contrary to what the README says.

mjordan commented 7 years ago

@whikloj but to answer your question, the fetcher doesn't need to open the files; it just creates a list of "records" for them. MIK copies the file content from the input directory to the output directory or directories.

whikloj commented 7 years ago

Our files are XML with metadata and an associated JPG/TIFF. However, the XML and images are in the same directory, so I'll continue with my filtering fetcher. Thanks.
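One way a filtering step might pair each XML file with its image is by shared base name. The thread does not say how the files are associated, so the matching rule, the extension list, and the function name `pair_metadata_with_images` in this Python sketch are all assumptions.

```python
import os

# Image extensions considered by this sketch (an assumption).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}


def pair_metadata_with_images(filenames):
    """Group filenames by base name and keep complete XML + image pairs.

    Assumes an XML file and its image share a base name, e.g.
    page01.xml and page01.tif. Unpaired and unrelated files are dropped.
    """
    by_stem = {}
    for filename in filenames:
        stem, extension = os.path.splitext(filename)
        extension = extension.lower()
        entry = by_stem.setdefault(stem, {"xml": None, "image": None})
        if extension == ".xml":
            entry["xml"] = filename
        elif extension in IMAGE_EXTENSIONS:
            entry["image"] = filename
    return {
        stem: entry
        for stem, entry in by_stem.items()
        if entry["xml"] and entry["image"]
    }
```

Files without a partner are silently dropped here; a real fetcher would probably want to log them instead.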

mjordan commented 7 years ago

Some examples that you might find useful are https://github.com/MarcusBarnes/mik/blob/issue-408/src/fetchers/Filesystem.php (not merged into master yet) and https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php.

MarcusBarnes commented 7 years ago

Initially addressed in pull-request https://github.com/MarcusBarnes/mik/pull/409 (merged with commit https://github.com/MarcusBarnes/mik/commit/919629498e928c9718bd3d0b73ab8bfeabd05552).