dag-hammarskjold-library / dlx-rest

UNDHL Files and Metadata Manager

Reinstate 191 $q and 791 $q and add 993 $q as a save action #1139

Open viola-v opened 1 year ago

viola-v commented 1 year ago

In Horizon, a script added 191 $q and 791 $q to all records: it took 191 $a and 791 $a and removed all punctuation, which made it easier to search for symbols. We want to reinstate these subfields as a save action, so that linking between records and between records and files can be less error-prone. It should also ease integration with other UN systems (ODS and gDoc) and contribute to resolving issue #515.

We also want to create a similar function for 993, where we record symbols that provide links to related documents. So 993 $a should get a $q as well.
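To make the request concrete, here is a minimal sketch of what the save action could look like. It is not the dlx or dlx-rest API: the record layout (a dict of tag to fields, each field a list of `(code, value)` tuples) and the names `normalize_symbol` and `apply_symbol_save_action` are made up for illustration. It also assumes that "removing punctuation" means keeping only letters and digits, so spaces are dropped as well.

```python
# Hypothetical record layout: {tag: [field, ...]}, field = [(code, value), ...].
# This is NOT the dlx data model, only a stand-in for illustration.

SYMBOL_TAGS = ("191", "791", "993")


def normalize_symbol(symbol: str) -> str:
    """Keep only letters and digits (assumed meaning of 'remove punctuation')."""
    return "".join(ch for ch in symbol if ch.isalnum())


def apply_symbol_save_action(record: dict) -> dict:
    """Recompute $q from the first $a of every 191, 791 and 993 field, in place."""
    for tag in SYMBOL_TAGS:
        for field in record.get(tag, []):
            a_values = [value for code, value in field if code == "a"]
            if not a_values:
                continue  # nothing to derive $q from
            # Drop any existing $q so a stale or copied value cannot survive.
            field[:] = [(code, value) for code, value in field if code != "q"]
            field.append(("q", normalize_symbol(a_values[0])))
    return record


record = {
    "191": [[("a", "A/RES/70/1"), ("q", "stale value")]],
    "993": [[("a", "S/PV.9000")]],
}
apply_symbol_save_action(record)
```

Recomputing $q on every save, rather than only when it is absent, is a design choice in this sketch: it keeps $q from drifting when $a is edited later.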

We would then need a batch process to update old records that don't have a 191 $q. It might be good to check records created since the last time the script was run in Horizon to make sure that 191 $q is correct - it might have been copied over by mistake and not updated, or entered wrong manually.
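A rough sketch of the check such a batch process might run is below. It uses the same hypothetical record layout as a plain dict of tag to fields, with each field a list of `(code, value)` tuples; `audit_191_q` and `records_needing_update` are illustrative names, not existing functions, and the normalization rule (keep only letters and digits) is an assumption.

```python
def normalize_symbol(symbol: str) -> str:
    """Keep only letters and digits (assumed normalization rule)."""
    return "".join(ch for ch in symbol if ch.isalnum())


def audit_191_q(record: dict) -> list:
    """Return 'missing', 'mismatch' or 'ok' for each 191 field that has a $a."""
    statuses = []
    for field in record.get("191", []):
        a_values = [value for code, value in field if code == "a"]
        q_values = [value for code, value in field if code == "q"]
        if not a_values:
            continue  # no symbol to compare against
        expected = normalize_symbol(a_values[0])
        if not q_values:
            statuses.append("missing")
        elif q_values[0] != expected:
            statuses.append("mismatch")  # copied over or mistyped $q
        else:
            statuses.append("ok")
    return statuses


def records_needing_update(records: dict) -> list:
    """Given {record_id: record}, list the ids with any missing or wrong 191 $q."""
    return [
        record_id
        for record_id, record in records.items()
        if any(status != "ok" for status in audit_191_q(record))
    ]


# Made-up sample data for illustration only.
sample = {
    1: {"191": [[("a", "A/RES/70/1"), ("q", "ARES701")]]},
    2: {"191": [[("a", "A/RES/70/2")]]},
    3: {"191": [[("a", "A/RES/70/3"), ("q", "ARES701")]]},
}
to_fix = records_needing_update(sample)
```

Reporting "missing" and "mismatch" separately would let the old records be updated automatically while the mismatches are reviewed by hand first.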

jbukhari commented 1 year ago

Is the goal of this to make searching symbols easier through the search UI? And/or what are the other use cases of this in practice?

JoelleSciboz commented 1 year ago

There are many goals:

viola-v commented 1 year ago

Meeting scheduled to analyze and discuss...

viola-v commented 1 year ago

Recap from the meeting on 6 July 2023: The goal is to facilitate search and file matching and matching between records. The previous method for handling this was to populate 191 $q and 191 $r with symbols that were stripped of punctuation and space characters. 191 $q contained the document symbol itself, and 191 $r contained a combination of the values in 191 $b and $c (session information), similarly stripped of punctuation and spaces. The potential issues with this approach include the possible collision between symbol values (probably mitigated by the inclusion of 191 $r to ensure uniqueness in combination with $q). A related discussion around the discrepancy between files in our system vs ODS suggested the need for a separate meeting.
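The collision concern can be illustrated with a small sketch. It assumes $r is simply $b followed by $c with punctuation and spaces removed (the exact combination used in Horizon is not stated here), and the symbols in the sample are made up to force a clash; `strip_symbol`, `match_key` and `collisions` are hypothetical names.

```python
from collections import defaultdict


def strip_symbol(value: str) -> str:
    """Keep only letters and digits, dropping punctuation and spaces."""
    return "".join(ch for ch in value if ch.isalnum())


def match_key(a: str, b: str = "", c: str = "") -> tuple:
    """Build ($q, $r): $q from $a, $r from $b + $c (assumed order)."""
    return strip_symbol(a), strip_symbol(b + c)


def collisions(fields: list, use_r: bool = True) -> dict:
    """Group 191 fields by key and return keys shared by different symbols."""
    groups = defaultdict(set)
    for field in fields:
        q, r = match_key(field["a"], field.get("b", ""), field.get("c", ""))
        key = (q, r) if use_r else (q,)
        groups[key].add(field["a"])
    return {key: sorted(symbols) for key, symbols in groups.items() if len(symbols) > 1}


# Made-up symbols chosen so that they collapse to the same $q.
sample = [
    {"a": "A/1/11", "b": "A/", "c": "1"},
    {"a": "A/11/1", "b": "A/", "c": "11"},
]

by_q_only = collisions(sample, use_r=False)   # the two symbols share $q "A111"
by_q_and_r = collisions(sample, use_r=True)   # $r ("A1" vs "A11") separates them
```

In this sketch a field with no $b or $c contributes an empty $r, so the key falls back to $q alone and the mitigation only works where session information is actually recorded.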

Next steps:

Catalog current issues with symbols and examine whether $q/$r subfield approach will suffice, and the extent to which linked documents without punctuation in their symbols will need to be recorded (automatically?) in 993

Just parking this here since we think that $r might be important in reducing the risk of duplicate symbols - there are over 2,000 GA records that are missing subfield $c. I'm trying to do a bit of clean-up...