TAMULib / fw-registry

MIT License

Create a FOLIO Workflow that identifies duplicate instances #350

Open rmathew1011 opened 6 months ago

rmathew1011 commented 6 months ago

This workflow should identify duplicate instances based on the following comparisons:

The output of this workflow should be a report in the following format:

[Image: Screenshot 2024-09-11 at 10 26 02 AM]

Each match should apply the following criteria:

OCLC Match

ISBN Match

ISSN Match

LCCN Match

Call Number Match

This needs a scheduled workflow that runs at an annual cadence.

You can create working tables in the mis schema of the LDP.

The results should be emailed to a variable email address.

Rows should only be included if at least one of the matches is true.
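
As a rough illustration of the matching rule above, here is a minimal Python sketch that compares instances pairwise and keeps a row only when at least one criterion matches. The field names (`oclc`, `isbn`, `issn`, `lccn`, `call_number`, `hrid`) and the in-memory list of dicts are assumptions for illustration; the actual workflow would presumably run the comparison against LDP tables instead.

```python
from itertools import combinations

# Hypothetical field names; the issue only names the match criteria.
MATCH_FIELDS = ("oclc", "isbn", "issn", "lccn", "call_number")


def compare(a: dict, b: dict) -> dict:
    """Return one boolean per criterion; a match needs a non-empty, equal value."""
    return {f: bool(a.get(f)) and a.get(f) == b.get(f) for f in MATCH_FIELDS}


def duplicate_rows(instances: list[dict]) -> list[dict]:
    """Compare every pair and keep rows where at least one criterion matches."""
    rows = []
    for a, b in combinations(instances, 2):
        matches = compare(a, b)
        if any(matches.values()):
            rows.append({"hrid": a["hrid"], "hrid2": b["hrid"], **matches})
    return rows
```

Comparing every pair is quadratic in the number of instances, so this only shows the rule; at catalog scale a join on each identifier would likely be the better fit.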

Original Text

This script should identify duplicate instances. The specific criteria for determining that two instances are duplicates will be provided, and will most likely be a comparison of multiple data points on the two instances.

The script should combine the two instances by keeping the older of the two and removing the newer. All holdings and items from the newer instance should be moved to the older instance.
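
The merge rule in the original text could be sketched as below. This is only an illustration: the `id`, `created`, and `holdings` keys are hypothetical, and the sketch assumes items travel with their holdings record, so it only lists the holdings to move. It builds a plan and does not call any FOLIO API.

```python
from datetime import datetime


def merge_plan(a: dict, b: dict) -> dict:
    """Decide which instance to keep (the older one) and what to move."""
    # Sort by creation timestamp; "created" is an assumed ISO 8601 string.
    older, newer = sorted(
        (a, b), key=lambda inst: datetime.fromisoformat(inst["created"])
    )
    return {
        "keep": older["id"],
        "remove": newer["id"],
        # Holdings of the newer instance are reassigned to the older one.
        "move_holdings": list(newer.get("holdings", [])),
    }
```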

Update: Create a workflow that produces the report described above and sends it by email, with the CSV as an attachment.
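
For the email step, a minimal sketch using Python's standard `email` package might look like the following. The sender address, subject line, and attachment filename are placeholders, and this is independent of however the workflow engine itself sends mail; it only shows the shape of a message with a CSV attachment and a recipient passed in as a variable.

```python
from email.message import EmailMessage


def build_report_email(
    recipient: str, csv_text: str, sender: str = "noreply@example.org"
) -> EmailMessage:
    """Build a message with the report attached as a CSV file."""
    msg = EmailMessage()
    msg["From"] = sender  # placeholder sender address
    msg["To"] = recipient  # the variable email address
    msg["Subject"] = "Duplicate instances report"
    msg.set_content("The duplicate instances report is attached as a CSV file.")
    msg.add_attachment(
        csv_text.encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename="duplicate_instances.csv",
    )
    return msg
```

The resulting message could then be handed to something like `smtplib.SMTP.send_message`, given a reachable mail server.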

Additional requirements:

Add title and author field for both matching instances.

Report columns as:

hrid, hrid2, oclc, isbn, lccn, issn, call_number, title, title2, author, author2
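
A small sketch of writing rows in that column order, using Python's `csv` module. The column list is taken from the comment above; the function name and the choice to leave missing values blank are assumptions.

```python
import csv
import io

# Column order as requested in the issue.
REPORT_COLUMNS = [
    "hrid", "hrid2", "oclc", "isbn", "lccn", "issn",
    "call_number", "title", "title2", "author", "author2",
]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize report rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REPORT_COLUMNS,
        restval="",             # missing values become empty cells
        extrasaction="ignore",  # drop keys that are not report columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```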

Dbreck-TAMU commented 6 months ago

Working on the identification of duplicate records first, to better determine what the definition of a 'duplicate' record is. We may want to venture into a plan involving heuristics to determine what a duplicate is.

This workflow is meant to be the first step in a larger, overarching workflow that will help the librarians keep their data clean at scale.

Utilizing the LDP data for now.

rmathew1011 commented 6 months ago

Implementation Strategy:

rmathew1011 commented 6 months ago

IdentifyDuplicateInstances