cul-it / qa_server

A rails app with questioning authority gem installed to serve as a QA server.
Apache License 2.0

Add Direct Lookups for LC Names #364

Open sfolsom opened 11 months ago

sfolsom commented 11 months ago

Create direct lookups for the following LC Names.

The Samvera QA Code seems to offer search for LC names: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.

Related links: See https://id.loc.gov/techcenter/searching.html for API documentation.

Example of a direct search config to base new config off: https://github.com/cul-it/qa_server/blob/f3083a8f1392e72890f8d6e6a763c9e0ace1ea0c/config/authorities/linked_data/oclcfast_direct.json#L32

Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=745128104

Authorities/sub-authorities:

chrisrlc commented 9 months ago

An update: I struggled for a while yesterday trying to track down why I couldn't get a new search endpoint working in the linked_data/loc_direct config. I've confirmed that it's not just complete incompetence on my side, because I was able to get https://github.com/cul-it/qa_server/issues/374 (homosaurus_direct) configured locally fairly quickly. After further digging, I found this note in the ld4p/linked_data_authorities documentation (https://github.com/ld4p/linked_data_authorities/tree/main/qa_loc): "NOTE: At this writing, Library of Congress does not support a query API of id.loc.gov which returns a serialization of linked data. As such, the direct configuration only supports fetching an individual term. The cache configurations support both term fetch and search query."

I also see that there are still no linked data search serialization formats available listed in LOC's docs: https://id.loc.gov/techcenter/serializations.html (Or at least, the internet says that none of these are linked data formats?)

If it's not a requirement to work with linked data to return results, I believe we can theoretically extend the qa gem's non-linked data endpoint to return more of what we need (uri, id, and label). Current example with no uri: https://qa-server-service.library.cornell.edu/authorities/search/loc/names?q=Cornell,%20Ezra. I think we'll have to tinker with our deploy setup though if we go this route, since it won't just be a config file change - will find out more from Greg on Monday hopefully.

I'll also add though, that search endpoints from LOC seem to only return a very limited subset of metadata about each result. Example using search endpoint: https://id.loc.gov/search/?q=Cornell,%20Ezra&format=atom and using suggest endpoint: https://id.loc.gov/authorities/names/suggest?q=Cornell,%20Ezra. So not sure (yet?) how we'd get the extended set of data as listed in the google spreadsheet linked above, unless we used multiple API calls.
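To make the two LOC endpoints above easier to compare side by side, here is a minimal Ruby sketch that builds both request URLs for a query. The helper names (`loc_search_url`, `loc_suggest_url`) are hypothetical and not part of the qa gem; only the URL patterns are taken from the examples above.

```ruby
require "erb"

LOC_BASE = "https://id.loc.gov"

# Hypothetical helper: general search endpoint (format is e.g. "atom" or "json").
def loc_search_url(query, format: "atom")
  "#{LOC_BASE}/search/?q=#{ERB::Util.url_encode(query)}&format=#{format}"
end

# Hypothetical helper: suggest endpoint scoped to a single vocabulary.
def loc_suggest_url(query, vocabulary: "authorities/names")
  "#{LOC_BASE}/#{vocabulary}/suggest?q=#{ERB::Util.url_encode(query)}"
end

puts loc_search_url("Cornell, Ezra")
puts loc_suggest_url("Cornell, Ezra")
```

Note that `ERB::Util.url_encode` percent-encodes the comma as well as the space, so the generated URLs are equivalent to, but not character-for-character identical with, the hand-written ones above.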

sfolsom commented 9 months ago

Christina,

Sorry for the struggle you went through. Yes, the environment Lynette set up did prefer linked data returned in the search results. For certain vocabularies we should only expect labels and URIs... at least with the URI, the cataloger can go directly to the entity description to look at it.

It's not a requirement to work with linked data from an API; it was just easier when we were using caching because we had tooling in place that didn't require new code, just config files. I'll leave it to you to figure out the next steps. I really don't understand the gem, and I thought the direct lookups could be either linked data or not. It sounds like they NEED (not just prefer) linked data returned.

Just an FYI, I'm dealing with side effects of a COVID vaccine. I'll likely not work much today with the way things are going.

Talk to everyone on Monday, Steven


chrisrlc commented 9 months ago

Switching gears to direct term retrieval:

There are currently 2 qa endpoints for fetching individual terms directly from loc:

  1. Using the non-linked data code from the qa gem: https://lookup.ld4l.org/authorities/show/loc/names/n97066641
  2. Using the linked data configs: https://lookup.ld4l.org/authorities/show/linked_data/loc_direct/names/n97066641
    • parses the same loc json as above and only returns the qa-supported subset of ld values (id, label, altlabel, sameas, narrower, broader)

Is either of these endpoints useful as is? Benefit of the former: it already has all the possible data we can get back from loc. But if we need these results in a different format (e.g. to match the format returned by the latter ld endpoint), we'll need to override more qa gem code. Benefit of the latter: the results format matches all the other qa linked data results. But if we want to return more fields, we'll need to figure out how to add support for predicates beyond the current 6.

sfolsom commented 9 months ago

The QA gem for search seems to be doing a decent/minimal job, e.g. https://lookup-int.ld4l.org/authorities/search/loc/subjects?q=History--

I wonder if we can get the other 4 predicates to show. Even if we can't, I think this is enough for now.

chrisrlc commented 9 months ago

Follow-up from conversation this morning: do we want separate authorities for each of these loc tickets, or can these all just be subauthorities of the loc authority lookup?

For example, if these are all just subauthorities of the loc authority lookup, the lookup urls would all come in a format that looks something like:

For this ticket:

For Ticket #369: /authorities/search/loc/subjects?q=
For Ticket #363: /authorities/search/loc/bookformat?q=
etc.

Alternatively, if we needed to create separate authorities for each ticket, we might have urls that look like:

sfolsom commented 9 months ago

Subauthorities are enough, no need to create separate lookups for different types of names. We'll need a separate lookup for each LOC related issue in the project (e.g. https://github.com/cul-it/qa_server/issues/369), but the type filters listed in the issue can be subauthorities.
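As a rough illustration of the subauthority approach, the sketch below builds lookup paths under the single `loc` authority and rejects unknown subauthorities. The helper name `loc_lookup_path` and the allow-list are hypothetical; the three entries are just the subauthority names that appear in URLs earlier in this thread, not a confirmed final set.

```ruby
require "erb"

# Illustrative allow-list only: the real set of subauthorities is decided
# per ticket and is not specified in this thread.
LOC_SUBAUTHORITIES = %w[names subjects bookformat].freeze

# Hypothetical helper: every lookup shares the `loc` authority and differs
# only in the subauthority path segment.
def loc_lookup_path(subauthority, query)
  unless LOC_SUBAUTHORITIES.include?(subauthority)
    raise ArgumentError, "unknown loc subauthority: #{subauthority}"
  end
  "/authorities/search/loc/#{subauthority}?q=#{ERB::Util.url_encode(query)}"
end

LOC_SUBAUTHORITIES.each { |sub| puts loc_lookup_path(sub, "History") }
```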

sfolsom commented 9 months ago

That said, I should probably test one in Sinopia before you attempt the others.

chrisrlc commented 8 months ago

@sfolsom This is ready for you to test on lookup-int! Direct lookup results use the id.loc.gov api (same results as what you'd find here: https://id.loc.gov/search/?q=cs:http://id.loc.gov/authorities/names). Each result has an id, label, and uri parsed from: https://id.loc.gov/search/?q=cs:http://id.loc.gov/authorities/names&format=json
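For anyone curious how an id, label, and uri might be pulled out of a search response, here is a minimal sketch. It assumes the JSON is an Atom feed serialized as nested arrays of the form `[element_name, attributes, *children]`, and that the link without a `type` attribute is the term URI; both are assumptions rather than documented behaviour. The sample payload, the placeholder identifier `n00000000`, and the function names are all hypothetical, and this is not the code deployed on lookup-int.

```ruby
require "json"

# Hypothetical sample payload; the real id.loc.gov response may differ.
SAMPLE = <<~JSON
  ["atom:feed", {"xmlns:atom": "http://www.w3.org/2005/Atom"},
    ["atom:title", {}, "Search results"],
    ["atom:entry", {},
      ["atom:title", {}, "Example, Name"],
      ["atom:link", {"rel": "alternate", "href": "http://id.loc.gov/authorities/names/n00000000"}],
      ["atom:link", {"rel": "alternate", "type": "application/json", "href": "http://id.loc.gov/authorities/names/n00000000.json"}],
      ["atom:id", {}, "info:lc/authorities/names/n00000000"]
    ]
  ]
JSON

# Turn one atom:entry node into an id/label/uri hash.
def entry_to_result(entry)
  children = entry.drop(2).select { |child| child.is_a?(Array) }
  title = children.find { |name, _| name == "atom:title" }&.last
  # Assumption: the link with no "type" attribute points at the term itself.
  link = children.find { |name, attrs| name == "atom:link" && !attrs.key?("type") }
  uri = link && link[1]["href"]
  { "id" => uri&.split("/")&.last, "label" => title, "uri" => uri }
end

# Keep only the atom:entry children of the feed and convert each one.
def parse_search_results(json)
  feed = JSON.parse(json)
  feed.select { |node| node.is_a?(Array) && node[0] == "atom:entry" }
      .map { |entry| entry_to_result(entry) }
end

p parse_search_results(SAMPLE)
```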

Please let me know if you'd like any changes!

New/extended loc names lookup endpoint examples:

sfolsom commented 8 months ago

These look good! Yes, organization is the same as corporate.