FOLIO-FSE / folio_migration_tools

A Python module and CLI tool that transforms legacy ILS data into the native FOLIO formats and loads it into FOLIO

Separate holdings records generate the same UUID #397

Closed jermnelson closed 10 months ago

jermnelson commented 2 years ago

In a recent migration, one of our analysts discovered an error in how Holdings records are matched to their items. We are using the HoldingsCsvTransformer (with HRID handling set to default and the default holdings_merge_criteria of "instanceId", "permanentLocationId", and "callNumber").

From our incoming tsv_file (the CATKEY is the legacy_code we match from the MARC 21 bib file):

| | CATKEY | FORMAT | CALL_NUMBER_TYPE | BASE_CALL_NUMBER | VOLUME_INFO | BARCODE | LIBRARY | HOMELOCATION | CURRENTLOCATION | ITEM_TYPE |
|---|---|---|---|---|---|---|---|---|---|---|
| 10435 | a660592 | SERIAL | LC | B3212 .Z7 A12 | V.12 2006/2009 | 36105132152419 | GREEN | STACKS | STACKS | STKS-MONO |
| 10436 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.10-11 2002/03-2004/05 | 36105123807179 | GREEN | STACKS | STACKS | STKS-MONO |
| 10437 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.1-2 1988-1989 | 36105008356383 | GREEN | STACKS | STACKS | STKS-MONO |
| 10438 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.13 2015 | 36105222580610 | GREEN | STACKS | STACKS | STKS-MONO |
| 10439 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.14 2016 | 36105226756356 | GREEN | STACKS | STACKS | STKS-MONO |
| 10440 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.15:PT.1 2017 | 36105227374167 | GREEN | STACKS | STACKS | STKS-MONO |
| 10441 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.16 2018 | 36105228622564 | GREEN | STACKS | STACKS | STKS-MONO |
| 10442 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.17 2019/2020 | 36105234057490 | GREEN | STACKS | STACKS | STKS-MONO |
| 10443 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.3-4 1990-1993 | 36105016681236 | GREEN | STACKS | STACKS | STKS-MONO |
| 10444 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.5-6 1994-1996 | 36105020941410 | GREEN | STACKS | STACKS | STKS-MONO |
| 10445 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.7 1997 | 36105021595322 | GREEN | STACKS | STACKS | STKS-MONO |
| 10446 | a660592 | SERIAL | LCPER | B3212 .Z7 A12 | V.8-9 1998/1999-2000/2001 | 36105111533670 | GREEN | STACKS | STACKS | STKS-MONO |
| 10447 | a660592 | SERIAL | LC | XX(660592.13) | V.17 | 36105230133121 | GREEN | STACKS | ON-ORDER | STKS-MONO |

The HoldingsCsvTransformer produces the following two Holdings records that we would expect with the default merge criteria:

{'id': '81d1c667-06c0-528c-ab7b-36685d09f7ab',
 'metadata': {'createdDate': '2022-10-17T21:55:36.683',
  'createdByUserId': '8cc3ab86-c943-4d53-8df7-b3dc64fb44ee',
  'updatedDate': '2022-10-17T21:55:36.683',
  'updatedByUserId': '8cc3ab86-c943-4d53-8df7-b3dc64fb44ee'},
 'formerIds': ['a660592'],
 'instanceId': '3191fabe-e494-5757-b9d1-3b98efa4cfab',
 'permanentLocationId': '4573e824-9273-4f13-972f-cff7bf504217',
 'callNumberTypeId': '95467209-6d7b-468b-94df-0f5d7ad2747d',
 'callNumber': 'B3212 .Z7 A12',
 'holdingsTypeId': '03c9c400-b9e3-4a07-ac0e-05ab470233ed',
 'sourceId': 'f32d531e-df79-46b3-8932-cdd35f7a2264',
 'notes': []}
{'id': '81d1c667-06c0-528c-ab7b-36685d09f7ab',
 'metadata': {'createdDate': '2022-10-17T21:55:36.685',
  'createdByUserId': '8cc3ab86-c943-4d53-8df7-b3dc64fb44ee',
  'updatedDate': '2022-10-17T21:55:36.685',
  'updatedByUserId': '8cc3ab86-c943-4d53-8df7-b3dc64fb44ee'},
 'formerIds': ['a660592'],
 'instanceId': '3191fabe-e494-5757-b9d1-3b98efa4cfab',
 'permanentLocationId': '4573e824-9273-4f13-972f-cff7bf504217',
 'callNumberTypeId': '95467209-6d7b-468b-94df-0f5d7ad2747d',
 'callNumber': 'XX(660592.13)',
 'holdingsTypeId': '03c9c400-b9e3-4a07-ac0e-05ab470233ed',
 'sourceId': 'f32d531e-df79-46b3-8932-cdd35f7a2264'}
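For readers unfamiliar with the merge criteria, the grouping that yields two Holdings records can be sketched as follows. This is a minimal illustration, not the tool's code: the function name `group_by_merge_criteria` and the simplified row dictionaries are assumptions, with the instanceId, permanentLocationId, call numbers, and barcodes copied from the data above.

```python
from collections import defaultdict

# Hypothetical, simplified rows: only the fields used by the merge criteria
# plus the barcode. Values are copied from the sample data in this issue.
INSTANCE_ID = "3191fabe-e494-5757-b9d1-3b98efa4cfab"
LOCATION_ID = "4573e824-9273-4f13-972f-cff7bf504217"

rows = [
    {"instanceId": INSTANCE_ID, "permanentLocationId": LOCATION_ID,
     "callNumber": "B3212 .Z7 A12", "barcode": "36105132152419"},
    {"instanceId": INSTANCE_ID, "permanentLocationId": LOCATION_ID,
     "callNumber": "B3212 .Z7 A12", "barcode": "36105021595322"},
    {"instanceId": INSTANCE_ID, "permanentLocationId": LOCATION_ID,
     "callNumber": "XX(660592.13)", "barcode": "36105230133121"},
]

MERGE_CRITERIA = ("instanceId", "permanentLocationId", "callNumber")


def group_by_merge_criteria(rows, criteria=MERGE_CRITERIA):
    """Group item barcodes by the tuple of merge-criteria values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in criteria)
        groups[key].append(row["barcode"])
    return dict(groups)


groups = group_by_merge_criteria(rows)
for key, barcodes in groups.items():
    print(key[-1], barcodes)
```

Rows that share all three values fall into one group, so the two distinct call numbers produce two groups, matching the two records shown above.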

However, the UUID is the same for both records!
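One way such a collision can arise, offered as an assumption rather than a confirmed reading of the tool's code: both ids are version-5 UUIDs (the third group starts with 5), and version-5 UUIDs are computed deterministically from a namespace and a name. If the name is built only from a legacy identifier that both records share (here the CATKEY a660592), the ids will be identical. The namespace and the `deterministic_id` helper below are hypothetical.

```python
import uuid

# Hypothetical namespace; the namespace and key format the tool actually
# uses are not shown in this thread.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/migration")


def deterministic_id(legacy_key: str) -> str:
    """Derive a version-5 UUID from a legacy key (same key, same UUID)."""
    return str(uuid.uuid5(NAMESPACE, legacy_key))


# Two holdings keyed only by the shared CATKEY collide.
first = deterministic_id("a660592")
second = deterministic_id("a660592")

# Adding a distinguishing value to the key yields a different UUID.
distinct = deterministic_id("a660592 XX(660592.13)")

print(first == second)
print(first == distinct)
```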

This is causing an issue later with items like the following:

{'id': 'c74453f8-bd80-58a8-a992-779411ebee1a',
 'metadata': {'createdDate': '2022-10-17T21:55:51.582',
  'createdByUserId': '8cc3ab86-c943-4d53-8df7-b3dc64fb44ee',
  'updatedDate': '2022-10-17T21:55:51.582',
  'updatedByUserId': '8cc3ab86-c943-4d53-8df7-b3dc64fb44ee'},
 'holdingsRecordId': '81d1c667-06c0-528c-ab7b-36685d09f7ab',
 'barcode': '36105021595322',
 'enumeration': 'V.7 1997',
 'status': {'name': 'Available', 'date': '2022-10-17T21:55:51.582422'},
 'materialTypeId': '1a54b431-2e4f-452d-9cae-9cee66c9a892',
 'permanentLoanTypeId': '2b94c631-fca9-4892-a730-03ee529ffe27',
 'permanentLocationId': '4573e824-9273-4f13-972f-cff7bf504217',
 '_version': 1}

This item should be matched to the first holdings record, not the second.

bltravis commented 1 year ago

@jermnelson Have you tried using the field concatenation approach for this?

e.g., in the holdings field map: { "folio_field": "legacyIdentifier", "legacy_field": "CATKEY", … }, { "folio_field": "legacyIdentifier", "legacy_field": "BASE_CALL_NUMBER", … }

And in the item field map: { "folio_field": "holdingsId", "legacy_field": "CATKEY", … }, { "folio_field": "holdingsId", "legacy_field": "BASE_CALL_NUMBER", … }

This should concatenate the contents of those fields (joined with a " ") into the legacyIdentifier and holdingsId values, respectively.
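The effect of the concatenation can be sketched roughly as below. This is an illustration only: `concat_legacy_fields` is a hypothetical helper, and skipping empty values is an assumption about behaviour, not something confirmed in this thread. The row values come from the sample data in the issue.

```python
def concat_legacy_fields(row: dict, legacy_fields: list) -> str:
    """Join the values of several legacy fields with a single space.

    Hypothetical helper; empty values are skipped (an assumption).
    """
    values = [row.get(field, "").strip() for field in legacy_fields]
    return " ".join(value for value in values if value)


fields = ["CATKEY", "BASE_CALL_NUMBER"]

row_a = {"CATKEY": "a660592", "BASE_CALL_NUMBER": "B3212 .Z7 A12"}
row_b = {"CATKEY": "a660592", "BASE_CALL_NUMBER": "XX(660592.13)"}

key_a = concat_legacy_fields(row_a, fields)
key_b = concat_legacy_fields(row_b, fields)

print(key_a)
print(key_b)
```

Because the same fields are mapped in both the holdings and the item field maps, an item row and its holdings row produce the same key, while the two holdings no longer share one.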

jermnelson commented 1 year ago

@bltravis, thanks for the suggestion! We eventually ended up going with a similar approach in our holdings mapping file by concatenating the CATKEY, CALL_SEQ, and COPY columns.

We did find that, for the item field map to work properly, we also had to map these fields to formerIds[0] in the holdings field map. Otherwise the holding_id_map.json file does not map these fields correctly for the Item transformer, because of this code: https://github.com/FOLIO-FSE/folio_migration_tools/blob/6ae0a3d1518d6f810c74bfa101939d289d5efea0/src/folio_migration_tools/migration_tasks/holdings_csv_transformer.py#L234-L241

Maybe this should be "legacyIdentifier" instead of "formerIds" on line 235?
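To illustrate why the key used for the id map matters, here is a simplified sketch of a map from a legacy key to a holdings UUID, keyed on formerIds[0]. It is not the linked code: the record ids, the helper names, and the shape of the map are all hypothetical, and the keys reuse the CATKEY and BASE_CALL_NUMBER values from the sample data purely for illustration.

```python
# Hypothetical holdings records; the ids are placeholders, not real UUIDs.
holdings = [
    {"id": "hypothetical-uuid-1", "formerIds": ["a660592 B3212 .Z7 A12"]},
    {"id": "hypothetical-uuid-2", "formerIds": ["a660592 XX(660592.13)"]},
]


def build_holdings_id_map(holdings_records):
    """Map each record's first former id to its holdings id."""
    id_map = {}
    for record in holdings_records:
        legacy_id = record["formerIds"][0]
        if legacy_id in id_map:
            # A duplicate key would silently attach items to the wrong holdings.
            raise ValueError(f"Duplicate legacy id: {legacy_id}")
        id_map[legacy_id] = record["id"]
    return id_map


def resolve_holdings_id(item_holdings_key, id_map):
    """Return the holdings id for an item's concatenated key, or None."""
    return id_map.get(item_holdings_key)


id_map = build_holdings_id_map(holdings)
print(resolve_holdings_id("a660592 B3212 .Z7 A12", id_map))
```

If formerIds[0] held only the CATKEY, both records would compete for the same key, which is the symptom described at the top of this issue.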

fontanka16 commented 1 year ago

I remember that I moved to the formerIds field for some reason, but I cannot recall why. @bltravis we discussed this in another context as well recently, did we not?

bltravis commented 10 months ago

Since this is the tool behaving as expected, I'm going to close this as wont-fix.