MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservations systems).
GNU General Public License v3.0
34 stars 11 forks source link

Better oai #509

Closed bondjimbond closed 4 years ago

bondjimbond commented 4 years ago

Github issue: (#508 )

What does this Pull Request do?

In OAI-PMH fetcher, replaces : with _ in the $identifier so that, when identifiers have colons in their names, they are converted to underscores when writing the temporary .metadata files instead of encoded nonsense. This makes the .metadata filenames match with the $record_key filenames used later on, and prevent errors (trying to look up my_record_identifier.metadata when the actual file is my%3Arecord%3Aidentifier.metadata).

Interested parties

@mjordan

mjordan commented 4 years ago

@bondjimbond this works. Files in my temp directory are no longer encoded: oai_digital.lib.sfu.ca_hiv_28.metadata.

I'm getting an error which I think is unrelated:

Commencing MIK.
PHP Fatal error:  Uncaught mik\exceptions\MikErrorException in /home/mark/Documents/hacking/mik/mik:105
Stack trace:
#0 [internal function]: {closure}(2, 'count(): Parame...', '/home/mark/Docu...', 98, Array)
#1 /home/mark/Documents/hacking/mik/src/fetchers/Oaipmh.php(98): count(NULL)
#2 /home/mark/Documents/hacking/mik/mik(182): mik\fetchers\Oaipmh->getRecords(NULL)
#3 {main}
  thrown in /home/mark/Documents/hacking/mik/mik on line 105

I'm willing to merge this as is.

mjordan commented 4 years ago

Found problem I saw when I tested this PR. Do you have any fetcher manipulators in your .ini file? I didn't, so https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L95 failed because https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L53 sets the default to null. By setting the default to array() line 95 doesn't fail.

If you didn't have any fetchermanipulators registered in your .ini file and it worked, there's something strange going on.

bondjimbond commented 4 years ago

No manipulators.

; MIK configuration file for an OAI-PMH toolchain.

[CONFIG]
config_id = KORA migration
last_updated_on = "2016-11-23"
last_update_by = "bw"

[SYSTEM]
date_default_timezone = 'America/Vancouver'
verify_ca = 0

[FETCHER]
class = Oaipmh
oai_endpoint = "http://thisvancouver.vpl.ca/oai2"
metadata_prefix = mods
;set_spec = islandora_artray_collection
;set_spec = islandora_capturing_vancouver_collection
;set_spec = islandora_carnegie_stories
;set_spec = islandora_chinatown_stories
;set_spec = islandora_cpr_collection
;set_spec = islandora_expo_retro_collection
;set_spec = islandora_hellenic_stories
;set_spec = islandora_hollywood_north_collection
;set_spec = islandora_leonardfrank_collection
;set_spec = islandora_madelinegunterman_collection
;set_spec = islandora_morph_collection
;set_spec = islandora_philip_timms
;set_spec = islandora_remembering_our_dtes_women_collection
;set_spec = islandora_sp_basic_image_collection
;set_spec = islandora_sp_large_image_collection
;set_spec = islandora_story_city_collection
;set_spec = islandora_vancouver_special_collection
;set_spec = islandora_west_end_stories
;set_spec = islandora_what_changes_you_stories

temp_directory = "/Volumes/Arca/tmp/oaitest_temp"

[METADATA_PARSER]
class = mods\OaiToMods

[FILE_GETTER]
class = OaipmhModsXpath
xpath_expression = "//mods:location/mods:url"
temp_directory = "/Volumes/Arca/tmp/oaitest_temp"

[WRITER]
class = Oaipmh
output_directory = "/Volumes/Arca/vpl/root"

[LOGGING]
path_to_log = "/Volumes/Arca/tmp/oaitest_output/mik.log"
path_to_manipulator_log = "/Volumes/Arca/tmp/oaitest_output/manipulator.log"
bondjimbond commented 4 years ago

Oh, I did comment out the subdirectory lines in src/writers/Oaipmh.php... It was creating empty subdirectories, which is clearly undesirable. (I'm thinking that the subdirectory functionality needs refining or rethinking...) But it wasn't causing errors, just an annoyance.

bondjimbond commented 4 years ago

@mjordan @MarcusBarnes Any further tests of this?

bondjimbond commented 4 years ago

@mjordan @MarcusBarnes Just a reminder about this PR. I have been using it successfully to process records from VPL. Any issues with it?

mjordan commented 4 years ago

I haven't looked at it since, but have we figured out what I was experiencing in https://github.com/MarcusBarnes/mik/pull/509#issuecomment-598531169 ?

bondjimbond commented 4 years ago

@mjordan Sorry, I missed the meat of your comment:

If you didn't have any fetchermanipulators registered in your .ini file and it worked, there's something strange going on.

That's the [MANIPULATORS] section?

If so, I do have them defined; will check to see if changing default to array() still works. I imagine it should.

Will report back in a bit.

bondjimbond commented 4 years ago

Any reason still not to merge this?

MarcusBarnes commented 4 years ago

As @bondjimbond addressed the concern in https://github.com/MarcusBarnes/mik/pull/509#issuecomment-616621844 in the last commit, I'll merge now.