MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

OAI: If item identifier has special characters, temp metadata filename doesn't match filegetter #508

Open bondjimbond opened 4 years ago

bondjimbond commented 4 years ago

I'm doing an OAI migration, and running into problems in src/fetchers/Oaipmh.php.

The Fetcher assumes that the $identifier and $record_key are the same value, but they aren't necessarily.

If the item identifier contains special characters (e.g. oai:thisvancouver.vpl.ca:islandora_1910), MIK treats it differently in different contexts.

When writing the temporary metadata files: https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L80-L82

Resulting filename: oai%3Athisvancouver.vpl.ca%3Aislandora_1910.metadata.

But the $record_key that is used everywhere else in the code looks like this: oai_thisvancouver.vpl.ca_islandora_1910.

So you end up with problems like this:

ErrorException.ERROR: ErrorException {"message":"file_get_contents(/Volumes/Arca/tmp/oaitest_temp/oai_thisvancouver.vpl.ca_islandora_410.metadata): failed to open stream: No such file or directory","code":{"record_key":"oai_thisvancouver.vpl.ca_islandora_1910","raw_metadata_path":"/Volumes/Arca/tmp/oaitest_temp/oai_thisvancouver.vpl.ca_islandora_1910.metadata","dom":"[object] (DOMDocument: {})"},"severity":2,"file":"/Users/brandon/sfuvault/mik/src/filegetters/OaipmhModsXpath.php","line":56} []

This happens because the filegetter looks for $record_key.metadata, while the file that was actually written is named $identifier.metadata, so the filegetter can't find it.
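To make the mismatch concrete, here is a minimal sketch using the identifier from this issue. It assumes the fetcher builds the temp filename with urlencode() (which would explain the %3A in the filename above) and that the record key is produced by replacing each ":" with "_"; neither has been confirmed against the MIK source here.

```php
<?php
// Identifier as reported in this issue.
$identifier = 'oai:thisvancouver.vpl.ca:islandora_1910';

// Assumption: the fetcher URL-encodes the identifier when writing the temp file.
$written = urlencode($identifier) . '.metadata';

// Assumption: elsewhere the record key is the identifier with ':' swapped for '_'.
$record_key = str_replace(':', '_', $identifier);
$expected = $record_key . '.metadata';

// The filegetter looks for $expected, but only $written exists on disk.
var_dump($written === $expected);
```

The two names differ only in how the colon is handled, which is enough for file_get_contents() to fail.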

So... how the heck do we fix this?

bondjimbond commented 4 years ago

Trying to find where $record_key is first defined.

mjordan commented 4 years ago

I've never liked the fact that the OAI identifiers are so ugly and complex. There is a spec for OAI-PMH identifiers that defines them using the pattern oai-identifier = scheme ":" namespace-identifier ":" local-identifier. (Note that "namespace" here is not related to Fedora namespaces; it identifies the source OAI repository.) We could, in all places in the MIK OAI code, strip out everything but the "local identifier" part and use that as both the filename and the record key. That would at least give us less rope to hang ourselves with, since the filename/record key would be a lot shorter than it is now.

But there is a problem with this: the OAI identifier spec uses : to separate the OAI-specific bits out from the local identifier... which in the case of Islandora source repos is the PID, which itself contains a :.

Maybe a general way to approach this is to modify MIK to strip out everything before and after the local identifier part and then to replace any : with an underscore. If this is done with a central function, we'd just call that function wherever MIK creates or needs to predict an identifier for an object.
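A rough sketch of what such a central function might look like. The name oai_identifier_to_record_key is hypothetical and not part of MIK; it assumes identifiers follow the scheme:namespace-identifier:local-identifier pattern and falls back to the whole string when they don't.

```php
<?php
/**
 * Hypothetical helper (not an existing MIK function): reduce an OAI
 * identifier to a filesystem-safe record key.
 */
function oai_identifier_to_record_key(string $oai_identifier): string
{
    // A limit of 3 keeps any colons inside the local identifier
    // (e.g. an Islandora PID) together in the last element.
    $parts = explode(':', $oai_identifier, 3);

    // If the identifier does not have all three parts, use it unchanged.
    $local = count($parts) === 3 ? $parts[2] : $oai_identifier;

    // Replace any remaining colons so the key is safe to use as a filename.
    return str_replace(':', '_', $local);
}
```

Splitting with a limit sidesteps the problem of the PID containing its own colon. One trade-off to keep in mind: dropping the namespace part means keys are only unique within a single source repository.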

bondjimbond commented 4 years ago

That sounds reasonable to me. Where are you thinking of doing this, and what would the function be?

For a quick and dirty patch, I'm thinking the convert-to-underscore would have to happen here: https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L80-L82

That might just do the job... What do you think?

bondjimbond commented 4 years ago

OK, I've made a change. In that section:

                $identifier = ($rec->header->identifier);
                $identifier = json_decode(json_encode($identifier), 1)[0];
                $identifier = urlencode(str_replace(':', '_', $identifier));

This seems to work; I'm getting files! Unfortunately, the files are not being written to the directories that are created... Weird.