LD4P / sinopia_editor

Sinopia Linked Data Editor
https://sinopia.io/
Apache License 2.0

Make OCLC FAST direct access and cache access clearer to understand #2125

Closed elrayle closed 2 years ago

elrayle commented 4 years ago

Description

Currently, direct access and cache access to OCLC FAST are labeled the same, so they appear to be the same authority. At a minimum, the labels need to be updated to make it clear that these are different. Ideally, the labels will be consistent regardless of whether the search goes directly against OCLC's API or against the cache.

Simple rename

Expected label

There are many labels impacted by this. As an example of a simple rename of the label, the expected labels for person would be...

Actual label

For the expected label case above, the actual labels are...

Potential mismatch in names that is not easily reconciled

In cache, but not in direct access

It is possible that these match up to one of the direct access subauths.

In direct access, but not in cache

There are several direct entities that are not in the cache QA config:

It is possible that these match up to one of the cache subauths. But there are clearly more on the direct access side, so there won't be a one-for-one match.

For a complete list of possible subauths for direct access, see Search Indices -> SRU Indices on https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html

Outstanding Questions

What happens to resource templates that reference the existing naming scheme?

From discussion in meeting with QA/Sinopia team, @jermnelson and @justinlittman gave the following response to the resource template question.

Should users have access to both direct access and cache access?

There were some concerns that having access to both would be confusing. We cannot actually remove either at this point, as it would break any resource templates that reference them. If we decide one should not be maintained, it can be marked DEPRECATED as suggested in the previous question.

@sfolsom Does this need further exploration to determine if one should be marked DEPRECATED?

Should the list of subauthorities match?

Short answer: They should match as much as possible.

How to reconcile the differences?

Action below will determine how they should be matched up.

Should names match the names used by OCLC?

Action below will recommend what label to use.

Known actions that need to happen to explore best resolution

ACTION - get full list of entities for cache access

@eichmann in preparation for the second action, list the following for each entity supported by the cache for searching OCLCFAST

ACTION - determine which subauths should be supported and what name should be used

@sfolsom will look at the list of potential SRU indices for direct search and compare them to the entities for the cache (from the first action above) and make recommendations for...

References

sfolsom commented 4 years ago

Here's a mapping between the SRU and classes found in the cache: https://docs.google.com/spreadsheets/d/1Tkv6e6yb4tIUe0a4dZip9P8ht-_lvJAzBiab255J5N8/edit#gid=0.

Still not exactly sure how to clearly describe the difference between direct and cache for template creators. Maybe it's something like (extended context) and (likely more current/limited context)... pretty verbose, I know. Or we just leave it as cache and direct and hope we have enough documentation readily available to allow folks to make sense of the distinctions.

elrayle commented 4 years ago

@sfolsom @eichmann In the spreadsheet, there is entity Concept (in cache) and oclc.topic (in OCLC API). I'm wondering if these are the same?

@sfolsom OCLC also includes oclc.alt_lc. This is currently surfaced through QA DIRECT for OCLC. Is this something we should surface through the cache and Sinopia?

@eichmann Additional entities to support from the cache: oclc.period and oclc.form. Coming soon will also be oclc.meeting, but that won't be in the data until the next update of the data dump.

sfolsom commented 4 years ago

The only Concepts I see in the cache are id.loc.gov entities that are related matches with the FAST, and I don't think we should be surfacing these Concepts in the FAST lookup. (We have separate LC lookups.) Topics are schema:Intangibles with skos:inScheme http://id.worldcat.org/fast/ontology/1.0/#facet-Topical.

oclc.alt_lc (if I understand it correctly) is a lookup that searches LC headings that then give us the analogous FAST URI. This seems different from the others, and probably is/could be addressed by how Dave indexes the FAST headings and corresponding LCSH.

Hope this makes sense. :)

michelleif commented 4 years ago

@sfolsom @elrayle @jermnelson what is status of this? do we need both direct access and cache?

michelleif commented 3 years ago

@elrayle can we close this?

michelleif commented 3 years ago

discussion at QA-Sinopia Developers meeting on 7/28/21:

FAST endpoint that QA is proxying.

curl --output /dev/null --silent --show-error --write-out '%{time_total}\n' 'http://experimental.worldcat.org/fast/search?query=cql.any+all+%22{twain}%22&sortKeys=usage&maximumRecords=100' -H 'Accept: application/xml'
0.834906

Another much faster FAST endpoint.

curl --output /dev/null --silent --show-error --write-out '%{time_total}\n' 'http://fast.oclc.org/fastIndex/select?q=keywords%3A(twain)&rows=100&start=0&version=2.2&indent=on&fl=id,fullphrase,type,usage,status&sort=usage%20desc' -H 'Accept: application/xml'
0.276711
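
The two timings above come from single curl runs, so they are sensitive to network noise. A minimal sketch of a repeatable comparison in Python follows. The helper names (`make_fetcher`, `time_call`, `compare`) are hypothetical and not part of QA or Sinopia; the only assumption about the endpoints is that they answer a plain GET with an `Accept: application/xml` header, as in the curl commands.

```python
import time
import urllib.request
from typing import Callable, Dict


def make_fetcher(url: str) -> Callable[[], bytes]:
    """Build a zero-argument callable that GETs `url` and returns the body."""
    def fetch() -> bytes:
        request = urllib.request.Request(url, headers={"Accept": "application/xml"})
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
    return fetch


def time_call(fetch: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        best = min(best, time.perf_counter() - start)
    return best


def compare(fetchers: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Time each named fetcher and return a name -> best-seconds mapping."""
    return {name: time_call(fetch, repeats) for name, fetch in fetchers.items()}
```

To reproduce the comparison, pass `compare` a dict with one `make_fetcher(...)` entry per endpoint URL from the curl commands. Taking the best of several runs is one reasonable choice here; a median would also work.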



we expect more templates to be using OCLC FAST (vs LCSH) in future

on another note, cache seems to be case-sensitive for exact match logic, do we want that?
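
For reference when writing the case-sensitivity ticket, here is a minimal sketch of what a case-insensitive exact match could look like. It is an illustration only and does not reflect how the cache actually implements matching; the function name `is_exact_match` is hypothetical.

```python
def is_exact_match(query: str, label: str, case_sensitive: bool = False) -> bool:
    """Compare a query to a label after trimming surrounding whitespace.

    With case_sensitive=False, both strings are case-folded first, so
    "twain, mark" matches "Twain, Mark".
    """
    left, right = query.strip(), label.strip()
    if case_sensitive:
        return left == right
    return left.casefold() == right.casefold()
```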

TO DO: 
1. align sub-authorities in https://github.com/LD4P/sinopia_editor/blob/main/static/authorityConfig.json: change the labels to be consistent between cache and direct, example: `"label": "OCLCFAST personal_name (QA) - direct"` vs `"label": "OCLCFAST person (QA) - cache"`, use "person" or "personal_name" for both
2. make new ticket for case-sensitivity changes desired
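
The label alignment in item 1 could be sketched as a small rewrite function. This assumes labels follow the pattern shown in the example (`OCLCFAST <subauth> (QA) - direct|cache`); the shared names in `SHARED_NAMES` are illustrative placeholders for a few subauths, and `align_label` is a hypothetical helper, not code from sinopia_editor.

```python
import re

# Illustrative shared display names keyed by the subauth token used in
# either the direct or the cache label (placeholder choices, not a decision).
SHARED_NAMES = {
    "personal_name": "Personal Name",
    "person": "Personal Name",
    "corporate_name": "Corporate Name",
    "organization": "Corporate Name",
    "uniform_title": "Uniform Title",
    "work": "Uniform Title",
}

# Assumed label shape, based on the example labels in the TO DO item.
LABEL_RE = re.compile(r"^OCLCFAST (?P<subauth>\S+) \(QA\) - (?P<mode>direct|cache)$")


def align_label(label: str) -> str:
    """Rewrite a label to use the shared name; leave unknown labels unchanged."""
    match = LABEL_RE.match(label)
    if match is None:
        return label
    shared = SHARED_NAMES.get(match["subauth"])
    if shared is None:
        return label
    return f"OCLCFAST {shared} (QA) - {match['mode']}"
```

Only the label text changes in this sketch; the lookup URI that resource templates reference is left alone, which is what keeps existing templates working.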

michelleif commented 3 years ago

and add another ticket about improving the cache performance and sorting

elrayle commented 3 years ago

Labels in Sinopia for subauths...

| Direct subauths | Cache subauths | Name to use for both | Comments |
| --- | --- | --- | --- |
| personal_name | person | Personal Name | rename in Sinopia |
| corporate_name | organization | Corporate Name | rename in Sinopia |
| uniform_title | work | Uniform Title | rename in Sinopia |
| geographic | place | Geographic | rename in Sinopia |
| event_name | event | Event Name | rename in Sinopia |
| meeting | meeting | Meeting | add config in Sinopia |
| period | period | Period | add config in Sinopia |
| form | genreform | Form | add config in Sinopia |
| topic | topic | Topic | add config in Sinopia |
| | concept | | REMOVE from Sinopia. These are id.loc.gov entities and should be looked up through LC. |
| | intangible | | REMOVE from Sinopia |
| alt_lc | | | REMOVE from Sinopia |

NOTE:

Reference:

elrayle commented 3 years ago

Subauths as mapped in configuration - QA:Cache

"subauthorities": {
      "person":         "Person",
      "organization":   "Organization",
      "work":           "Work",
      "place":          "Place",
      "event":          "Event",
      "meeting":        "Meeting",
      "period":         "Periodization",
      "genreform":      "Genre",
      "concept":        "Concept",
      "intangible":     "Intangible"
}

Subauths as mapped in configuration - QA:Direct

"subauthorities": {
      "topic":          "oclc.topic",
      "concept":        "oclc.topic",
      "geocoordinates": "oclc.geographic",
      "geographic":     "oclc.geographic",
      "place":          "oclc.geographic",
      "event":          "oclc.eventName",
      "event_name":     "oclc.eventName",
      "meeting":        "oclc.meeting",
      "person":         "oclc.personalName",
      "personal_name":  "oclc.personalName",
      "organization":   "oclc.corporateName",
      "corporate_name": "oclc.corporateName",
      "uniform_title":  "oclc.uniformTitle",
      "work":           "oclc.uniformTitle",
      "period":         "oclc.period",
      "form":           "oclc.form",
      "alt_lc":         "oclc.altlc"
}

NOTE: There are repeats in the names used for subauths in QA. This was an attempt to align the subauths in QA/cache and QA/direct.
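
The repeats and the gaps between the two configurations can be checked mechanically. The sketch below copies both `subauthorities` maps from the comment above into Python dicts and reports which direct names are aliases for the same SRU index and which names exist on only one side. The helper name `aliases_by_target` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Copied from the QA:Cache configuration above.
CACHE = {
    "person": "Person", "organization": "Organization", "work": "Work",
    "place": "Place", "event": "Event", "meeting": "Meeting",
    "period": "Periodization", "genreform": "Genre",
    "concept": "Concept", "intangible": "Intangible",
}

# Copied from the QA:Direct configuration above.
DIRECT = {
    "topic": "oclc.topic", "concept": "oclc.topic",
    "geocoordinates": "oclc.geographic", "geographic": "oclc.geographic",
    "place": "oclc.geographic",
    "event": "oclc.eventName", "event_name": "oclc.eventName",
    "meeting": "oclc.meeting",
    "person": "oclc.personalName", "personal_name": "oclc.personalName",
    "organization": "oclc.corporateName", "corporate_name": "oclc.corporateName",
    "uniform_title": "oclc.uniformTitle", "work": "oclc.uniformTitle",
    "period": "oclc.period", "form": "oclc.form", "alt_lc": "oclc.altlc",
}


def aliases_by_target(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Group subauth names by target, keeping only targets with 2+ names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, target in mapping.items():
        groups[target].append(name)
    return {target: names for target, names in groups.items() if len(names) > 1}


direct_aliases = aliases_by_target(DIRECT)
only_in_cache = sorted(set(CACHE) - set(DIRECT))
only_in_direct = sorted(set(DIRECT) - set(CACHE))
```

With the values above, `only_in_cache` is `['genreform', 'intangible']`, and six of the direct SRU indices are reachable under more than one name.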

sfolsom commented 2 years ago

Alignment analysis here: https://docs.google.com/spreadsheets/d/1baAgyP3CtmJ31fTGYzRcykUE1RSS126L8U8f0UI7zIo/edit?usp=sharing

Takeaways: We can sever some Sinopia direct configs because they are either redundant to others or not applicable to cataloging workflows. We'll eventually need a new cache with Meetings once there is a critical mass of them in the data. We still need QA/Sinopia config to separate schema:Intangibles into two subauths (topic and form).

sfolsom commented 2 years ago

Labels in Sinopia for subauths...

| Direct subauths | Cache subauths | Name to use for both | Comments |
| --- | --- | --- | --- |
| personal_name | person | Personal Name | rename in Sinopia |
| corporate_name | organization | Corporate Name | rename in Sinopia |
| uniform_title | work | Uniform Title | rename in Sinopia |
| geographic | place | Geographic | rename in Sinopia |
| event_name | event | Event Name | rename in Sinopia |
| meeting | meeting | Meeting | add config in Sinopia |
| period | period | Period | add config in Sinopia |
| form | genreform | Form | add config in Sinopia |
| topic | | | |
| | concept | | REMOVE from Sinopia. These are id.loc.gov entities and should be looked up through LC. |
| | intangible | | REMOVE from Sinopia |
| alt_lc | | | REMOVE from Sinopia |

NOTE:

  • The cache and qa support event and meeting subauths. Sinopia does not currently have meeting defined.
  • In the cache, intangible is topic and form combined. Intangible and topic will not be supported through Sinopia. Form will be supported using the genreform subauth.

I think we still need Topic for the Intangibles that are not in the topical facet as opposed to the form facet.

elrayle commented 2 years ago

@sfolsom Is this done? Can it be closed?

sfolsom commented 2 years ago

Yep, pull request here: https://github.com/LD4P/sinopia_editor/commit/d1664cc7c52b845f0307cde859fdbb19cffa3034