elrayle closed this issue 2 years ago.
Here's a mapping between the SRU and classes found in the cache: https://docs.google.com/spreadsheets/d/1Tkv6e6yb4tIUe0a4dZip9P8ht-_lvJAzBiab255J5N8/edit#gid=0.
Still not exactly sure how to clearly describe the difference between direct and cache for template creators. Maybe it's something like (extended context) and (likely more current/limited context)... pretty verbose, I know. Or we just leave it as cache and direct and hope we have enough documentation readily available to allow folks to make sense of the distinctions.
@sfolsom @eichmann In the spreadsheet, there is entity Concept (in cache) and oclc.topic (in OCLC API). I'm wondering if these are the same?
@sfolsom OCLC also includes oclc.alt_lc. This is currently surfaced through QA DIRECT for OCLC. Is this something we should surface through the cache and Sinopia?
@eichmann Additional entities to support from the cache: oclc.period and oclc.form. Coming soon will also be oclc.meeting, but that won't be in the data until the next update of the data dump.
The only Concepts I see in the cache are id.loc.gov entities that are related matches with the FAST, and I don't think we should be surfacing these Concepts in the FAST lookup. (We have separate LC lookups.) Topics are schema:Intangibles with skos:inScheme http://id.worldcat.org/fast/ontology/1.0/#facet-Topical.
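A minimal sketch of the filtering described above, assuming each cache hit is available as a dict with hypothetical `uri`, `type` and `in_scheme` keys (this is not the cache's actual response format). It drops the related id.loc.gov Concepts and keeps only FAST Intangibles in the topical facet.

```python
from typing import Iterable

# Topical facet URI, as quoted in the comment above
TOPICAL_FACET = "http://id.worldcat.org/fast/ontology/1.0/#facet-Topical"
# Related LC matches should go through the separate LC lookups instead
LC_PREFIX = "http://id.loc.gov/"


def fast_topics(hits: Iterable[dict]) -> list:
    """Keep FAST topical headings and drop id.loc.gov Concepts.

    Each hit is assumed (hypothetically) to be a dict like
    {"uri": ..., "type": ..., "in_scheme": ...}.
    """
    topics = []
    for hit in hits:
        if hit["uri"].startswith(LC_PREFIX):
            continue  # LC entity that is a related match, not a FAST heading
        if hit.get("type") == "schema:Intangible" and hit.get("in_scheme") == TOPICAL_FACET:
            topics.append(hit)
    return topics
```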
oclc.alt_lc (if I understand it correctly) is a lookup that searches LC headings that then give us the analogous FAST URI. This seems different from the others, and probably is/could be addressed by how Dave indexes the FAST headings and corresponding LCSH.
Hope this makes sense. :)
@sfolsom @elrayle @jermnelson what is the status of this? Do we need both direct access and cache?
@elrayle can we close this?
discussion at QA-Sinopia Developers meeting on 7/28/21:
top results from direct are more relevant than top results from cache (based on recent work on another project at Stanford using QA); for example, try "tea"
@justinlittman will provide more examples
but direct search doesn't offer pagination or rank order: OCLC direct sends an XML stream of RDF that is in order, but once it is converted to a graph there is no order/rank. Can the cache preserve the order it receives?
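One way to keep OCLC's order is to record each result's position while reading the stream, before the RDF is merged into an unordered graph. A minimal sketch, assuming the result URIs can be read in document order; `rank_index` and `sort_by_rank` are hypothetical helpers, not part of QA or the cache.

```python
def rank_index(uris_in_stream_order):
    """Map each result URI to its position in the response (0 = first).

    Only the first occurrence counts, so a URI mentioned again later in
    the stream keeps its original rank.
    """
    ranks = {}
    for position, uri in enumerate(uris_in_stream_order):
        ranks.setdefault(uri, position)
    return ranks


def sort_by_rank(uris, ranks):
    """Sort URIs (e.g., pulled back out of an unordered graph) by rank.

    URIs that never appeared in the stream sort after all ranked ones.
    """
    return sorted(uris, key=lambda uri: ranks.get(uri, float("inf")))
```

The rank would need to be stored alongside each entity (for example as an extra property) so that it survives the conversion to a graph.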
keep both and make less confusing? keep one or the other?
cache slower than direct (data from test on 7/8/21)
```
curl --output /dev/null --silent --show-error --write-out '%{time_total}\n' 'https://lookup.ld4l.org/authorities/search/linked_data/oclc_fast?q=twain&maximumRecords=100' -H 'Accept: application/json'
4.894599
curl --output /dev/null --silent --show-error --write-out '%{time_total}\n' 'http://experimental.worldcat.org/fast/search?query=cql.any+all+%22{twain}%22&sortKeys=usage&maximumRecords=100' -H 'Accept: application/xml'
0.834906
curl --output /dev/null --silent --show-error --write-out '%{time_total}\n' 'http://fast.oclc.org/fastIndex/select?q=keywords%3A(twain)&rows=100&start=0&version=2.2&indent=on&fl=id,fullphrase,type,usage,status&sort=usage%20desc' -H 'Accept: application/xml'
0.276711
```
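For a rough sense of scale, the `time_total` values above can be compared directly. A minimal sketch; these are single measurements from one day (7/8/21), so the ratios are indicative only.

```python
# time_total values (seconds) copied from the curl runs above
timings = {
    "lookup.ld4l.org (QA)": 4.894599,
    "experimental.worldcat.org SRU": 0.834906,
    "fast.oclc.org fastIndex": 0.276711,
}

baseline = timings["lookup.ld4l.org (QA)"]
for name, seconds in timings.items():
    # how many times faster each endpoint was than the QA request
    print(f"{name}: {seconds:.3f} s ({baseline / seconds:.1f}x)")
```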
we expect more templates to be using OCLC FAST (vs LCSH) in future
on another note, the cache seems to be case-sensitive for exact-match logic; do we want that?
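If the answer is no, a case-insensitive exact match could normalize both sides before comparing. A minimal sketch; `normalize` and `is_exact_match` are hypothetical helpers, not the cache's actual matching code. `str.casefold()` is used rather than `lower()` because it also folds characters such as German ß.

```python
import unicodedata


def normalize(label: str) -> str:
    """Normalize a heading for comparison: Unicode NFC, casefold, trim."""
    return unicodedata.normalize("NFC", label).casefold().strip()


def is_exact_match(query: str, label: str) -> bool:
    """Case-insensitive exact match between a query and a heading label."""
    return normalize(query) == normalize(label)
```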
TO DO:
1. align sub-authorities in https://github.com/LD4P/sinopia_editor/blob/main/static/authorityConfig.json: change the labels to be consistent between cache and direct, example: `"label": "OCLCFAST personal_name (QA) - direct" vs "label": "OCLCFAST person (QA) - cache"`, use "person" or "personal_name" for both
2. make a new ticket for the desired case-sensitivity changes
3. make another ticket about improving the cache performance and sorting
Labels in Sinopia for subauths...

| Direct subauths | Cache subauths | Name to use for both | Comments |
|---|---|---|---|
| personal_name | person | Personal Name | rename in Sinopia |
| corporate_name | organization | Corporate Name | rename in Sinopia |
| uniform_title | work | Uniform Title | rename in Sinopia |
| geographic | place | Geographic | rename in Sinopia |
| event_name | event | Event Name | rename in Sinopia |
| meeting | meeting | Meeting | add config in Sinopia |
| period | period | Period | add config in Sinopia |
| form | genreform | Form | add config in Sinopia |
| topic | topic | Topic | add config in Sinopia |
| | concept | | REMOVE from Sinopia. These are id.loc.gov entities and should be looked up through LC. |
| | intangible | | REMOVE from Sinopia |
| alt_lc | | | REMOVE from Sinopia |
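
The table can also be kept as data so that the direct and cache labels are always generated from one shared name. A minimal sketch; the `OCLCFAST <name> (QA) - direct|cache` label pattern is assumed from the example in the TO DO list above, and `SUBAUTH_NAMES`, `REMOVED` and `sinopia_label` are hypothetical names, not part of authorityConfig.json.

```python
# (direct subauth, cache subauth) -> shared display name, from the table above
SUBAUTH_NAMES = {
    ("personal_name", "person"): "Personal Name",
    ("corporate_name", "organization"): "Corporate Name",
    ("uniform_title", "work"): "Uniform Title",
    ("geographic", "place"): "Geographic",
    ("event_name", "event"): "Event Name",
    ("meeting", "meeting"): "Meeting",
    ("period", "period"): "Period",
    ("form", "genreform"): "Form",
    ("topic", "topic"): "Topic",
}

# Subauths the table marks for removal from Sinopia
REMOVED = {"concept", "intangible", "alt_lc"}


def sinopia_label(subauth, access):
    """Build the Sinopia label for a direct or cache subauth.

    Raises ValueError for an unknown access type and KeyError for a
    subauth that is removed or not in the table.
    """
    if access not in ("direct", "cache"):
        raise ValueError(f"access must be 'direct' or 'cache', not {access!r}")
    if subauth in REMOVED:
        raise KeyError(f"{subauth} is marked for removal from Sinopia")
    side = 0 if access == "direct" else 1
    for pair, name in SUBAUTH_NAMES.items():
        if pair[side] == subauth:
            return f"OCLCFAST {name} (QA) - {access}"
    raise KeyError(subauth)
```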

NOTE: The cache and QA support the `event` and `meeting` subauths. Sinopia does not currently have `meeting` defined.
Reference:
Subauths as mapped in configuration - QA:Cache
```
"subauthorities": {
  "person": "Person",
  "organization": "Organization",
  "work": "Work",
  "place": "Place",
  "event": "Event",
  "meeting": "Meeting",
  "period": "Periodization",
  "genreform": "Genre",
  "concept": "Concept",
  "intangible": "Intangible"
}
```
Subauths as mapped in configuration - QA:Direct
```
"subauthorities": {
  "topic": "oclc.topic",
  "concept": "oclc.topic",
  "geocoordinates": "oclc.geographic",
  "geographic": "oclc.geographic",
  "place": "oclc.geographic",
  "event": "oclc.eventName",
  "event_name": "oclc.eventName",
  "meeting": "oclc.meeting",
  "person": "oclc.personalName",
  "personal_name": "oclc.personalName",
  "organization": "oclc.corporateName",
  "corporate_name": "oclc.corporateName",
  "uniform_title": "oclc.uniformTitle",
  "work": "oclc.uniformTitle",
  "period": "oclc.period",
  "form": "oclc.form",
  "alt_lc": "oclc.altlc"
}
```
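NOTE: The QA configs repeat some subauth mappings: several subauth names point to the same OCLC index (for example, `person` and `personal_name` both map to `oclc.personalName`). This was an attempt to align the subauths in QA/cache and QA/direct.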
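The repeats can be listed by inverting the QA:Direct mapping and keeping only the OCLC indices reached by more than one subauth name. A minimal sketch, with the mapping copied from the configuration above; `aliases` is a hypothetical helper.

```python
from collections import defaultdict

# QA:Direct subauthorities, copied from the configuration above
DIRECT = {
    "topic": "oclc.topic",
    "concept": "oclc.topic",
    "geocoordinates": "oclc.geographic",
    "geographic": "oclc.geographic",
    "place": "oclc.geographic",
    "event": "oclc.eventName",
    "event_name": "oclc.eventName",
    "meeting": "oclc.meeting",
    "person": "oclc.personalName",
    "personal_name": "oclc.personalName",
    "organization": "oclc.corporateName",
    "corporate_name": "oclc.corporateName",
    "uniform_title": "oclc.uniformTitle",
    "work": "oclc.uniformTitle",
    "period": "oclc.period",
    "form": "oclc.form",
    "alt_lc": "oclc.altlc",
}


def aliases(mapping):
    """Group subauth names by the OCLC index they query; keep only repeats."""
    by_index = defaultdict(list)
    for subauth, index in mapping.items():
        by_index[index].append(subauth)
    return {index: names for index, names in by_index.items() if len(names) > 1}


for index, names in aliases(DIRECT).items():
    print(index, names)
```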
Alignment analysis here: https://docs.google.com/spreadsheets/d/1baAgyP3CtmJ31fTGYzRcykUE1RSS126L8U8f0UI7zIo/edit?usp=sharing
Takeaways: We can drop some Sinopia direct configs because they are either redundant to others or not applicable to cataloging workflows. We'll eventually need a new cache with Meetings once there is a critical mass of them in the data. We still need QA/Sinopia config to separate schema:Intangibles into two subauths (topic and form).
> Labels in Sinopia for subauths...
>
> | Direct subauths | Cache subauths | Name to use for both | Comments |
> |---|---|---|---|
> | personal_name | person | Personal Name | rename in Sinopia |
> | corporate_name | organization | Corporate Name | rename in Sinopia |
> | uniform_title | work | Uniform Title | rename in Sinopia |
> | geographic | place | Geographic | rename in Sinopia |
> | event_name | event | Event Name | rename in Sinopia |
> | meeting | meeting | Meeting | add config in Sinopia |
> | period | period | Period | add config in Sinopia |
> | form | genreform | Form | add config in Sinopia |
> | topic | concept | | REMOVE from Sinopia. These are id.loc.gov entities and should be looked up through LC. |
> | | intangible | | REMOVE from Sinopia |
> | alt_lc | | | REMOVE from Sinopia |
>
> NOTE:
> - The cache and QA support the `event` and `meeting` subauths. Sinopia does not currently have `meeting` defined.
> - In the cache, `intangible` is topic and form combined. Intangible and topic will not be supported through Sinopia. Form will be supported using the `genreform` subauth.
I think we still need Topic for the Intangibles that are in the topical facet, as opposed to the form facet.
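A sketch of how the combined cache `intangible` entities could be split into `topic` and `form` subauths by facet, following the `skos:inScheme` approach described earlier in the thread. The topical facet URI is the one quoted there; the form/genre facet URI (`#facet-FormGenre`) is an assumption and should be checked against the FAST ontology, and `split_intangibles` is a hypothetical helper.

```python
FAST_ONTOLOGY = "http://id.worldcat.org/fast/ontology/1.0/"

# subauth name -> skos:inScheme facet URI
FACETS = {
    "topic": FAST_ONTOLOGY + "#facet-Topical",
    "form": FAST_ONTOLOGY + "#facet-FormGenre",  # ASSUMPTION: verify facet name
}


def split_intangibles(entities):
    """Bucket schema:Intangible entities by facet.

    entities: iterable of (uri, in_scheme) pairs (hypothetical shape).
    Entities in neither facet are returned under "other".
    """
    scheme_to_subauth = {uri: name for name, uri in FACETS.items()}
    buckets = {"topic": [], "form": [], "other": []}
    for uri, in_scheme in entities:
        buckets[scheme_to_subauth.get(in_scheme, "other")].append(uri)
    return buckets
```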
@sfolsom Is this done? Can it be closed?
Yep, pull request here: https://github.com/LD4P/sinopia_editor/commit/d1664cc7c52b845f0307cde859fdbb19cffa3034
Description
Currently direct access and cache access to OCLC FAST are labeled the same, so they appear to be the same authority. At a minimum, the labels need to be updated to make it clear that these are different. Ideally, the labels will be consistent regardless of whether the search goes directly against OCLC's API or against the cache.
Simple rename
Expected label
There are many labels impacted by this. As an example of a simple rename of the label, the expected labels for person would be...
Actual label
For the expected label case above, the actual labels are...
Potential mismatch in names that is not easily reconciled
In cache, but not in direct access
It is possible that these match up to one of the direct access subauths.
In direct access, but not in cache
There are several direct entities that are not in the cache QA config:
It is possible that these match up to one of the cache subauths. But there are clearly more on the direct access side, so there won't be a one-for-one match.
For a complete list of possible subauths for direct access, see Search Indices -> SRU Indices on https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
Outstanding Questions
What happens to resource templates that reference the existing naming scheme?
From a discussion in a meeting with the QA/Sinopia team, @jermnelson and @justinlittman gave the following response to the resource template question: add `DEPRECATED` at the end of the label. This will give resource template editors an opportunity to remove the auth/subauth from use.
Should users have access to both direct access and cache access?
There were some concerns that having access to both would be confusing. We cannot actually remove either at this point, as it would break any resource templates that reference them. If we decide one should not be maintained, it can be marked DEPRECATED as suggested in the previous question.
@sfolsom Does this need further exploration to determine if one should be marked DEPRECATED?
Should the list of subauthorities match?
Short answer: They should match as much as possible.
How to reconcile the differences?
Action below will determine how they should be matched up.
Should names match the names used by OCLC?
Action below will recommend what label to use.
Known actions that need to happen to explore best resolution
ACTION - get full list of entities for cache access
@eichmann in preparation for the second action, list the following for each entity supported by the cache for searching OCLCFAST
ACTION - determine which subauths should be supported and what name should be used
@sfolsom will look at the list of potential SRU indices for direct search and compare them to the entities for the cache (from the first action above) and make recommendations for...
References