LD4P / qa_server

A Rails engine with the Questioning Authority gem installed, serving as an authority search server with normalized results.
Apache License 2.0

New Indexing: LOCNAMES - changes in accuracy tests #398

Open elrayle opened 3 years ago

elrayle commented 3 years ago

The new indexing scheme has the following impact on LOCNAMES accuracy tests:

All tests pass except one, which was already failing before the indexing change. Before the change, the expected result was not found at all. After the change, it is found, but not within the expected position. This issue is to explore and document why that one test continues to fail.

sfolsom commented 3 years ago

The Camden NJ URI should be http://id.loc.gov/authorities/names/n80010449. The one listed is for the county. Should I update the test here: https://github.com/LD4P/qa_server/blob/master/lib/generators/qa_server/templates/config/authorities/linked_data/scenarios/locnames_ld4l_cache_validation.yml?

That said, I think the issue here is that once the punctuation is removed, we get results with much longer text strings than the cataloger provides in the search. If greater precision isn't possible, I think we're OK to change the position to 20 or so, since we now have pagination.
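For anyone following along, the position check being discussed can be sketched roughly as below. This is only an illustration, not the qa_server validator: the function names are hypothetical, and it assumes each normalized result is a dict with a `uri` key.

```python
from typing import Optional


def position_of(results: list, expected_uri: str) -> Optional[int]:
    """Return the 1-based position of expected_uri in results, or None if absent."""
    for index, result in enumerate(results, start=1):
        # Assumption: each result carries its subject URI under the "uri" key.
        if result.get("uri") == expected_uri:
            return index
    return None


def passes(results: list, expected_uri: str, max_position: int) -> bool:
    """True when the expected URI is found at or before max_position."""
    position = position_of(results, expected_uri)
    return position is not None and position <= max_position


# Made-up result list: the expected subject sits in third place.
camden = "http://id.loc.gov/authorities/names/n80010449"
sample = [
    {"uri": "http://example.org/other-1"},
    {"uri": "http://example.org/other-2"},
    {"uri": camden},
]
```

With this sample, a test expecting position 2 would fail while one allowing position 20 would pass, which is the trade-off raised above.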

sfolsom commented 3 years ago

Added a pull request, https://github.com/LD4P/qa_server/pull/407, to fix the URI. This test will still fail under the current search.

@dave The relevancy (after removing punctuation) still doesn't seem precise enough. See: https://lookup.ld4l.org/authorities/search/linked_data/locnames_new_ld4l_cache/geographic?q=Camden%20NJ&maxRecords=20
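To try other queries against the same endpoint, the URL can be assembled as in this minimal sketch. The base URL and the `q` and `maxRecords` parameters are taken from the link above; the function name is hypothetical.

```python
from urllib.parse import quote, urlencode

# Base path copied from the lookup link in this thread.
BASE_URL = (
    "https://lookup.ld4l.org/authorities/search/linked_data/"
    "locnames_new_ld4l_cache/geographic"
)


def build_search_url(query: str, max_records: int) -> str:
    """Build a search URL, encoding spaces as %20 as in the original link."""
    params = urlencode({"q": query, "maxRecords": max_records}, quote_via=quote)
    return f"{BASE_URL}?{params}"


print(build_search_url("Camden NJ", 20))
```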

elrayle commented 3 years ago

Searching for Camden N.J. returns the expected subject as the second result. The data uses traditional state abbreviations rather than postal codes, so a search for Camden NJ does not return the subject.
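Until variants exist in the data, one possible client-side workaround would be to rewrite postal codes in the query to the abbreviations the labels use. The sketch below is only an illustration under that assumption: the mapping holds just the two states mentioned in this thread, and the helper is hypothetical rather than anything qa_server provides.

```python
# Assumption: a partial, hand-maintained mapping from postal codes to the
# abbreviations used in the labels. Only the states from this thread appear.
POSTAL_TO_LABEL_ABBREVIATION = {
    "NJ": "N.J.",
    "WA": "Wash.",
}


def expand_postal_codes(query: str) -> str:
    """Replace whole-word postal codes with the abbreviation used in the labels."""
    tokens = query.split()
    rewritten = [POSTAL_TO_LABEL_ABBREVIATION.get(token, token) for token in tokens]
    return " ".join(rewritten)


print(expand_postal_codes("Camden NJ"))
```

A rewrite like this risks false matches for tokens that merely look like postal codes, which is one reason a variant label in the source data is the cleaner fix.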

@sfolsom will contact LOC to see if the postal code version could be added as a variant.

sfolsom commented 3 years ago

@kefo and @thisismattmiller have you ever considered adding an alt label for stateside locations with the present-day postal abbreviations? https://about.usps.com/who-we-are/postal-history/state-abbreviations.htm We're having a bear of a time trying to support searches that include the new (1963-) abbreviations when the labels do not include them. E.g., "Seattle, Wash." would also have the label "Seattle, WA".

kefo commented 3 years ago

I have considered this in the past.

Allow me to perpend on this for a few days and ask some questions. There is a project that we intend to do soon that will require us to evaluate all geographic headings in the system so this is a timely idea. It would make sense to include it, assuming there's no issue that arises.

elrayle commented 3 years ago

@kefo At a bare minimum, having "Seattle, WA" as a variant label would allow our search to find it with q=Seattle WA.
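A rough sketch of why the variant label helps, assuming the search strips punctuation and compares lowercase tokens. The real index may tokenize differently, and the function names here are hypothetical.

```python
import re


def normalize(text: str) -> list:
    """Replace punctuation with spaces, lowercase, and split into tokens."""
    return re.sub(r"[^\w\s]", " ", text).lower().split()


def matches_any_label(query: str, labels: list) -> bool:
    """True when every query token appears in at least one single label."""
    query_tokens = set(normalize(query))
    return any(query_tokens <= set(normalize(label)) for label in labels)


# Labels as quoted in this thread: the current one, then with the variant added.
current_labels = ["Seattle, Wash."]
with_variant = ["Seattle, Wash.", "Seattle, WA"]

print(matches_any_label("Seattle WA", current_labels))  # False
print(matches_any_label("Seattle WA", with_variant))    # True
```

Under these assumptions, "wa" never matches the token "wash", so the query only succeeds once the variant label is present.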