Open elrayle opened 3 years ago
The Camden NJ URI should be http://id.loc.gov/authorities/names/n80010449. The one listed is for the county. Should I update the test here: https://github.com/LD4P/qa_server/blob/master/lib/generators/qa_server/templates/config/authorities/linked_data/scenarios/locnames_ld4l_cache_validation.yml?
That said, I think the issue here is that once the punctuation is removed we get results with much longer text strings than the cataloger provides in the search. If greater precision isn't possible, I think we're OK to change the position to 20 or so, since we now have pagination.
Added a pull request, https://github.com/LD4P/qa_server/pull/407, to fix the URI. This test will still fail under the current search.
@dave The relevancy (after removing punctuation) still seems to not be precise enough. See: https://lookup.ld4l.org/authorities/search/linked_data/locnames_new_ld4l_cache/geographic?q=Camden%20NJ&maxRecords=20
Searching for "Camden N.J." returns the expected subject as the second result. The data uses state abbreviations instead of postal codes, so the results for "Camden NJ" do not include the subject.
@sfolsom will contact LOC to see if the postal code version could be added as a variant.
@kefo and @thisismattmiller have you ever considered adding an alt label for stateside locations with the present-day postal abbreviations? https://about.usps.com/who-we-are/postal-history/state-abbreviations.htm We're having a bear of a time trying to support searches that include the new (1963-) abbreviations when the labels do not include them. E.g., "Seattle, Wash." would also have the label "Seattle, WA".
I have considered this in the past.
Allow me to perpend on this for a few days and ask some questions. There is a project that we intend to do soon that will require us to evaluate all geographic headings in the system so this is a timely idea. It would make sense to include it, assuming there's no issue that arises.
@kefo At a bare minimum, having "Seattle, WA" as a variant label would allow our search to find it with q=Seattle WA.
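To make the variant-label idea concrete, here is a minimal sketch of how a postal-code variant could be derived from a label that ends in a traditional state abbreviation. It is not how LOC or the lookup service does anything: the function name `postal_variant`, the tiny mapping table, and the assumption that the abbreviation sits at the end of the label (optionally in parentheses) are all hypothetical and chosen only for illustration.

```python
import re

# Illustrative subset only: traditional abbreviation -> USPS postal code.
# A real table would need to cover every state and territory.
TRADITIONAL_TO_POSTAL = {
    "Wash.": "WA",
    "N.J.": "NJ",
    "Calif.": "CA",
    "Mass.": "MA",
}


def postal_variant(label):
    """Return a variant of `label` with a trailing traditional state
    abbreviation replaced by its postal code, or None if none is found.

    Assumption: the abbreviation is the last thing in the label and may
    be wrapped in parentheses; any parentheses are dropped in the variant.
    """
    for traditional, postal in TRADITIONAL_TO_POSTAL.items():
        pattern = r"\(?" + re.escape(traditional) + r"\)?$"
        if re.search(pattern, label):
            return re.sub(pattern, postal, label)
    return None


print(postal_variant("Seattle, Wash."))
```

With a variant like this indexed alongside the original label, a query such as q=Seattle WA would have a label to match against. Labels without a recognized abbreviation return None, so no variant is added for them.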
The new indexing scheme has the following impact on LOCNAMES accuracy tests:
All tests are passing except one, which was also failing before the indexing change. Before the change, the expected subject was not found at all. After the change, it is found, but not within the expected position. This issue is to explore and document why that one test continues to fail.
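The difference between "not found at all" and "found, but not within the expected position" can be illustrated with a small sketch of a position-based check. This is only an approximation of the idea, not qa_server's validator: the function names and the result shape (a list of dicts with a `uri` key) are assumptions made for the example.

```python
def rank_of(results, expected_uri):
    """Return the 1-based rank of expected_uri in results, or None if absent.

    Assumption: each result is a dict carrying its identifier under "uri".
    """
    for rank, result in enumerate(results, start=1):
        if result.get("uri") == expected_uri:
            return rank
    return None


def found_within_position(results, expected_uri, position):
    """True only if expected_uri appears among the first `position` results."""
    rank = rank_of(results, expected_uri)
    return rank is not None and rank <= position


# Hypothetical result list: the expected subject is present, but ranked fifth.
expected = "http://id.loc.gov/authorities/names/n80010449"
results = [{"uri": f"http://example.org/other/{i}"} for i in range(4)]
results.append({"uri": expected})

print(rank_of(results, expected))
print(found_within_position(results, expected, 3))
print(found_within_position(results, expected, 20))
```

Under this framing, raising the allowed position (for example to 20) makes the test pass without changing where the subject actually ranks, whereas improving relevancy moves the rank itself.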