LD4P / qa_server

A rails engine with questioning authority gem installed to serve as an authority search server with normalized results.
Apache License 2.0

New Indexing: LOCNAME_RWO - changes in accuracy tests #401

Status: Closed (closed by elrayle 3 years ago)

elrayle commented 3 years ago

The new indexing scheme has the following impact on LOCNAME_RWO accuracy tests:

LOCNAME_RWO validations: (1-9)

image

LOCNAME_RWO2 validations: (10-18)

LOCNAME_RWO3 validations: (19-26)

image

sfolsom commented 3 years ago

The Endalageta Kabada failure appears to be related to diacritics (see the ʼEndālagétā Kabada test, which passes).

The preferred label is "American Civil Liberties Union", and "ACLU" is a variant label. We would have to make exact matches on variant labels rank higher. As it is, searching id.loc.gov for "American Civil Liberties Union" returns http://id.loc.gov/authorities/names/n79079580 at position 52, so even their exact match on the preferred label performs poorly. "ACLU" does a little better in id.loc.gov (position 20).
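
For anyone following along, the "position" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not qa_server's actual implementation: the helper names `position_of` and `passes_accuracy_check` are hypothetical, and the sample result list is made up except for the ACLU URI quoted above.

```python
from typing import Dict, List, Optional

ACLU_URI = "http://id.loc.gov/authorities/names/n79079580"


def position_of(results: List[Dict[str, str]], expected_uri: str) -> Optional[int]:
    """Return the 1-based position of expected_uri in results, or None if absent."""
    for index, result in enumerate(results, start=1):
        if result.get("uri") == expected_uri:
            return index
    return None


def passes_accuracy_check(
    results: List[Dict[str, str]], expected_uri: str, max_position: int
) -> bool:
    """True when expected_uri appears at or before max_position."""
    position = position_of(results, expected_uri)
    return position is not None and position <= max_position


# Hypothetical result list: the two example.org entries are placeholders,
# only the ACLU URI comes from the discussion.
sample_results = [
    {"uri": "http://example.org/authority/1", "label": "Placeholder A"},
    {"uri": "http://example.org/authority/2", "label": "Placeholder B"},
    {"uri": ACLU_URI, "label": "American Civil Liberties Union"},
]

print(position_of(sample_results, ACLU_URI))  # 3
print(passes_accuracy_check(sample_results, ACLU_URI, max_position=2))  # False
```

With a check like this, a result at position 52 or 20 fails any threshold tighter than that, no matter how exact the label match is.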

elrayle commented 3 years ago

Check out the difference in results between...

https://lookup-int.ld4l.org/authorities/search/linked_data/locnames_rwo_ld4l_cache/person?q=Twain,%20Mark,%201835-1910&maxRecords=10

[
  {
    "uri": "http://id.loc.gov/rwo/agents/n79021164",
    "id": "n 79021164",
    "label": "Twain, Mark, 1835-1910"
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/n82045653",
    "id": "n 82045653",
    "label": "Twain, Mark, 1835-1910 (Spirit)"
  }
]

https://lookup-int.ld4l.org/authorities/search/linked_data/locnames_rwo_new_ld4l_cache/person?q=Twain,%20Mark,%201835-1910&maxRecords=10

[
  {
    "uri": "http://id.loc.gov/rwo/agents/n79021164",
    "id": "n 79021164",
    "label": ""
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/n82045653",
    "id": "n 82045653",
    "label": ""
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/n2017045245",
    "id": "n 2017045245",
    "label": ""
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/nb2003044788",
    "id": "nb2003044788",
    "label": ""
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/n96101071",
    "id": "n 96101071",
    "label": ""
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/no2014014518",
    "id": "no2014014518",
    "label": ""
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/no2014045549",
    "id": "no2014045549",
    "label": ""
  },
  {
    "uri": "http://id.loc.gov/rwo/agents/no2009013312",
    "id": "no2009013312",
    "label": ""
  }
]
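
To make the difference between the two responses concrete, here is a small Python sketch that compares them. The data is copied from the two result sets above; the helper names `blank_label_uris` and `added_uris` are hypothetical and not part of qa_server.

```python
from typing import Dict, List

BASE = "http://id.loc.gov/rwo/agents/"

# Results from locnames_rwo_ld4l_cache (old index), copied from above.
old_results = [
    {"uri": BASE + "n79021164", "id": "n 79021164", "label": "Twain, Mark, 1835-1910"},
    {"uri": BASE + "n82045653", "id": "n 82045653", "label": "Twain, Mark, 1835-1910 (Spirit)"},
]

# Results from locnames_rwo_new_ld4l_cache (new index): every label is empty.
new_ids = [
    "n 79021164", "n 82045653", "n 2017045245", "nb2003044788",
    "n 96101071", "no2014014518", "no2014045549", "no2009013312",
]
new_results = [
    {"uri": BASE + identifier.replace(" ", ""), "id": identifier, "label": ""}
    for identifier in new_ids
]


def blank_label_uris(results: List[Dict[str, str]]) -> List[str]:
    """URIs of results whose label is missing or only whitespace."""
    return [r["uri"] for r in results if not r.get("label", "").strip()]


def added_uris(old: List[Dict[str, str]], new: List[Dict[str, str]]) -> List[str]:
    """URIs present in new but not in old, in the order returned by new."""
    seen = {r["uri"] for r in old}
    return [r["uri"] for r in new if r["uri"] not in seen]


print(len(blank_label_uris(old_results)))  # 0
print(len(blank_label_uris(new_results)))  # 8
print(len(added_uris(old_results, new_results)))  # 6
```

The two original hits keep their top positions in the new index, but the new index returns six additional URIs and no labels at all.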

elrayle commented 3 years ago

ACTION: @sfolsom will update the validation for Endalageta Kabada to note the reason it fails.


While exploring test failures, we noticed that the label isn't being displayed. In the RWO, the label is in rdf-schema#label. It was expected to be defined in the authority object's primary label. The spreadsheet specifies rdf:type.

ACTION: @sfolsom will review the LDPath for primary label in the spreadsheet.

elrayle commented 3 years ago

Fixed by PR #416, which marks it as pending and documents it in the test as a known failure, with an explanation.