MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

OAI toolchain should provide option to write out ingest packages organized by set #248

Open mjordan opened 8 years ago

mjordan commented 8 years ago

Many OAI-PMH repositories such as Digital Commons (and Islandora itself) express collection membership using OAI sets. The OAI-PMH protocol, however, doesn't provide a way to determine an object's set membership; in other words, once you have exported or migrated an object out of the repository, you can't determine which OAI set (or collection) it was a member of.

The OAI toolchain should provide an option to respect the set structure of the source repository. This would allow for migrations that retain collection membership.

Perhaps we can implement this using a fetcher manipulator?

mjordan commented 8 years ago

Looking at this a bit more, we might be able to implement "writer manipulators" following the pattern we have already established. In this use case, a writer manipulator would modify the output path that a set of ingest files is written to, basing the path on the set the record is a member of. We'd need some way to relate each object to a set and its corresponding directory. A helper script run prior to the MIK job could issue a ListSets request to the OAI provider, loop through each set, and write each object's identifier alongside its set to a set registry (basically a text file). The writer manipulator could then look up the current object in the set registry and modify its output path accordingly.

The writer manipulator's entry in the .ini file would look like

writermanipulators[] = "OaiSetMembership|/tmp/set_registry.txt"

where the parameter is the path to the set registry generated by the helper script.
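The set registry idea can be sketched roughly as follows. This is only an illustration in Python, not MIK code: the function names, the tab-separated "identifier, setSpec" registry format, and the choice to replace ":" in a setSpec with "_" for directory names are all assumptions. Fetching the ListSets and ListIdentifiers responses (including resumption tokens) is left out, so the sketch starts from already-harvested ListIdentifiers XML, one response per set.

```python
import os
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identifiers_in_response(list_identifiers_xml):
    """Return the record identifiers found in one ListIdentifiers response."""
    root = ET.fromstring(list_identifiers_xml)
    return [
        el.text.strip()
        for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)
        if el.text
    ]


def write_set_registry(responses_by_set, registry_path):
    """Write one 'identifier<TAB>setSpec' line per record (assumed format).

    responses_by_set maps a setSpec to the ListIdentifiers XML harvested
    for that set.
    """
    with open(registry_path, "w", encoding="utf-8") as registry:
        for set_spec, xml_text in responses_by_set.items():
            for identifier in identifiers_in_response(xml_text):
                registry.write(f"{identifier}\t{set_spec}\n")


def load_set_registry(registry_path):
    """Read the registry back into a dict of identifier -> setSpec."""
    registry = {}
    with open(registry_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            identifier, set_spec = line.split("\t", 1)
            # A record can belong to several sets; this sketch keeps the first.
            registry.setdefault(identifier, set_spec)
    return registry


def output_path_for(identifier, base_output_dir, registry):
    """Return the set-based output directory for a record.

    Records missing from the registry fall back to the base directory.
    """
    set_spec = registry.get(identifier)
    if set_spec is None:
        return base_output_dir
    # setSpecs may contain ':' for hierarchy; make them directory-friendly.
    return os.path.join(base_output_dir, set_spec.replace(":", "_"))
```

A writer manipulator along these lines would call something like load_set_registry() once with the path given in the .ini entry, then output_path_for() for each object before its files are written.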

mjordan commented 8 years ago

Then again, a much simpler approach would be to not introduce writer manipulators at all, and instead have the helper script offer an option that organizes the harvested content into set-based subdirectories after the fact. This is probably the preferable approach until there are additional use cases for writer manipulators.
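A minimal sketch of the after-the-fact approach, again in Python and again hypothetical: it assumes all of an object's files sit directly in the output directory and share a filename stem (for example rec1.xml and rec1.pdf), and that the helper script has a mapping from that stem to a setSpec. Neither assumption comes from MIK itself.

```python
import os
import shutil


def organize_by_set(output_dir, stem_to_set):
    """Move harvested files into one subdirectory per set.

    stem_to_set maps a filename stem (name without extension) to a setSpec.
    Files with no entry are left where they are. Returns the new paths.
    """
    moved = []
    # Take the listing up front so directories created below are not revisited.
    for name in sorted(os.listdir(output_dir)):
        source = os.path.join(output_dir, name)
        if not os.path.isfile(source):
            continue
        stem = os.path.splitext(name)[0]
        set_spec = stem_to_set.get(stem)
        if set_spec is None:
            continue
        # setSpecs may contain ':' for hierarchy; make them directory-friendly.
        set_dir = os.path.join(output_dir, set_spec.replace(":", "_"))
        os.makedirs(set_dir, exist_ok=True)
        destination = os.path.join(set_dir, name)
        shutil.move(source, destination)
        moved.append(destination)
    return moved
```

Because this runs after the MIK job has finished, it needs no changes to the toolchain, which is the appeal of this option over writer manipulators.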

mjordan commented 8 years ago

I was wrong about determining a record's set membership. Its setSpecs are included in the record header returned by both GetRecord and ListRecords requests.
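For illustration, the setSpec values can be read straight from each record's header element in an OAI-PMH 2.0 response. The Python sketch below uses only the standard library; the function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def set_specs_by_identifier(oai_response_xml):
    """Map each record identifier in a GetRecord or ListRecords response
    to the list of setSpec values found in that record's header."""
    root = ET.fromstring(oai_response_xml)
    result = {}
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext(
            "oai:identifier", default="", namespaces=OAI_NS
        ).strip()
        specs = [
            el.text.strip()
            for el in header.findall("oai:setSpec", OAI_NS)
            if el.text
        ]
        result[identifier] = specs
    return result
```

A record that belongs to more than one set carries several setSpec elements, so the caller would still have to decide which one determines the output directory.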

mjordan commented 8 years ago

Working through a use case with someone performing a migration from an OAI repository. Please stay tuned.

mjordan commented 7 years ago

Potentially related issue: #338.