MarcusBarnes / mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
GNU General Public License v3.0

CDM toolchains extracting wrong files due to repeating ID numbers #490

Closed bondjimbond closed 5 years ago

bondjimbond commented 5 years ago

Looks like the CONTENTdm toolchains expect the CONTENTdm numbers to be unique, but they aren't in all cases. So running MIK against a given collection can potentially extract objects from a different collection instead of the ones in the target collection.

Examples: http://digicon.athabascau.ca/cdm/singleitem/collection/MK/id/36/rec/1 http://digicon.athabascau.ca/cdm/singleitem/collection/seniors/id/36/rec/1

The two above have different collection nicknames, but the same ID number. So running the CONTENTdm toolchain against the "MK" collection ends up extracting object number 36 from the "seniors" collection.

Is there a way around this? Or is it a bug? Perhaps the toolchain needs to use collection+pointer as the record key instead of just the pointer?
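
To make the collection+pointer idea concrete, here is a minimal sketch of the difference between the two keying schemes. It is not MIK code: the function names and the `alias:pointer` key format are hypothetical, and it only illustrates why a key built from the pointer alone cannot tell the two example items apart.

```python
# Hypothetical illustration only; none of these names come from MIK.

def key_pointer_only(alias, pointer):
    """Record key that ignores the collection alias."""
    return str(pointer)


def key_collection_and_pointer(alias, pointer):
    """Record key that combines the collection alias with the pointer."""
    return f"{alias}:{pointer}"


# The two example items: same pointer (36), different collection aliases.
records = [("MK", 36), ("seniors", 36)]

pointer_only_keys = {key_pointer_only(a, p) for a, p in records}
composite_keys = {key_collection_and_pointer(a, p) for a, p in records}

# With pointer-only keys both items collapse to one key;
# with composite keys they stay distinct.
print(len(pointer_only_keys), len(composite_keys))
```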

bondjimbond commented 5 years ago

Config file attached: config.ini.txt

mjordan commented 5 years ago

In any given job, MIK does not know about items that do not have the configured collection alias. Are you actually seeing the metadata and binary files from the seniors item being retrieved when the data from the MK item should be?

I'm not sure, but if you are seeing the same metadata for two objects that have the same number, make sure that your "temp" directory is deleted before you run MIK. MIK caches metadata in that directory. If you are already deleting it between MIK jobs, there should not be any crossover between collections like the one you are describing.
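
As a rough sketch of what "delete the temp directory before each run" could look like if scripted, assuming a standalone Python helper run before MIK (the function name `clear_temp_dir` and the demo file name are made up for illustration, and this is not part of MIK itself):

```python
import shutil
import tempfile
from pathlib import Path


def clear_temp_dir(temp_dir):
    """Delete temp_dir and recreate it empty, so no cached files survive from an earlier job."""
    path = Path(temp_dir)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


# Demo in a throwaway location: simulate a leftover cached file, then clear it.
demo_root = Path(tempfile.mkdtemp())
demo_temp = demo_root / "temp"
demo_temp.mkdir()
(demo_temp / "36.metadata").write_text("cached record from an earlier job")

clear_temp_dir(demo_temp)
remaining = list(demo_temp.iterdir())  # files left after clearing

shutil.rmtree(demo_root)  # clean up the demo directory
```

In real use you would point the helper at whatever temp directory your own configuration file names, and run it before starting each MIK job.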

MarcusBarnes commented 5 years ago

If this ends up being related to temp files, I would suggest adding the shutdown hook script that deletes temp files to your configuration files.

mjordan commented 5 years ago

Right, nice catch @MarcusBarnes.

bondjimbond commented 5 years ago

@mjordan Yes, the person who reported this (not me) says the files were coming from two different collections. But of course I neglected to check whether he was deleting his temp files. Thanks for that - I'll make sure he's doing that and see whether the issue remains.

bondjimbond commented 5 years ago

Turns out it was indeed the temp files. Sorry about that.