LD4P / qa_server

A Rails engine with the Questioning Authority gem installed, serving as an authority search server with normalized results.
Apache License 2.0

results include data from other authorities #461

Open elrayle opened 2 years ago

elrayle commented 2 years ago

I believe this is impacting multiple authorities, but here is an example from LOC_SUBJECTS:

https://lookup.ld4l.org/authorities/search/linked_data/locsubjects_ld4l_cache?q=science&maxRecords=4&context=false

[
  {
    "uri": "http://id.loc.gov/authorities/subjects/sh85118553",
    "id": "http://id.loc.gov/authorities/subjects/sh85118553",
    "label": "Science@en"
  },
  {
    "uri": "http://id.loc.gov/authorities/subjects/sh00007934",
    "id": "http://id.loc.gov/authorities/subjects/sh00007934",
    "label": "Science@en"
  },
  {
    "uri": "http://www.wikidata.org/entity/Q336",
    "id": "http://www.wikidata.org/entity/Q336",
    "label": "\\"
  },
  {
    "uri": "http://data.bnf.fr/ark:/12148/cb121155321",
    "id": "http://data.bnf.fr/ark:/12148/cb121155321",
    "label": "\\"
  }
]
elrayle commented 2 years ago

Looking at the raw data to determine whether the incorrect data is coming from the cache.

curl -L -D - -H 'Accept: application/n-triples' 'http://services.ld4l.org/ld4l_services/loc_subject_batch.jsp?query=science&maxRecords=4&startRecord=1&lang=en'

Results (showing only rank, type, and prefLabel for the LOC results; for the last two results, these are all of the returned triples):

<http://id.loc.gov/authorities/subjects/sh85118553> <http://vivoweb.org/ontology/core#rank> "1" .
<http://id.loc.gov/authorities/subjects/sh85118553> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://id.loc.gov/authorities/subjects/sh85118553> <http://www.w3.org/2004/02/skos/core#prefLabel> "Science@en"@en .

<http://id.loc.gov/authorities/subjects/sh00007934> <http://vivoweb.org/ontology/core#rank> "2" .
<http://id.loc.gov/authorities/subjects/sh00007934> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://id.loc.gov/authorities/subjects/sh00007934> <http://www.w3.org/2004/02/skos/core#prefLabel> "Science@en"@en .

<http://www.wikidata.org/entity/Q336> <http://vivoweb.org/ontology/core#rank> "3" .
<http://www.wikidata.org/entity/Q336> <http://www.loc.gov/mads/rdf/v1#authoritativeLabel> "\\"science\\" " .

<http://data.bnf.fr/ark:/12148/cb121155321> <http://vivoweb.org/ontology/core#rank> "4" .
<http://data.bnf.fr/ark:/12148/cb121155321> <http://www.loc.gov/mads/rdf/v1#authoritativeLabel> "\\"Science\\" " .
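To make the difference between the LOC and non-LOC results easier to see, here is a minimal Python sketch that lists the subjects in an N-Triples response that carry no rdf:type skos:Concept triple. The function name `subjects_without_concept_type` and the regular expression are illustrative assumptions, not part of qa_server or the cache service, and the pattern only handles simple lines like the ones above rather than the full N-Triples grammar.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"

# Matches simple "<subject> <predicate> object ." lines only.
# This is NOT a complete N-Triples parser (assumption for illustration).
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$')


def subjects_without_concept_type(ntriples):
    """Return subjects, in order of first appearance, lacking rdf:type skos:Concept."""
    types_by_subject = {}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        subject, predicate, obj = match.groups()
        types = types_by_subject.setdefault(subject, set())
        if predicate == RDF_TYPE:
            types.add(obj.strip("<>"))
    return [s for s, types in types_by_subject.items() if SKOS_CONCEPT not in types]


# A few of the triples from the response above.
SAMPLE = """
<http://id.loc.gov/authorities/subjects/sh85118553> <http://vivoweb.org/ontology/core#rank> "1" .
<http://id.loc.gov/authorities/subjects/sh85118553> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://www.wikidata.org/entity/Q336> <http://vivoweb.org/ontology/core#rank> "3" .
<http://data.bnf.fr/ark:/12148/cb121155321> <http://vivoweb.org/ontology/core#rank> "4" .
"""

untyped = subjects_without_concept_type(SAMPLE)
```

For this sample, `untyped` holds the Wikidata and BnF subjects, which are the two unexpected results.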
eichmann commented 2 years ago

@elrayle please check on this now that we've done a full reindexing for the case-insensitive match.

elrayle commented 2 years ago

Still an issue. This may be happening because the other authorities also use skos:Concept. We may be able to fix it by changing to madsrdf:Authority, or by checking that the URI starts with the expected LOC host.

The plan is to add a constraint clause to the seed query that forces results to be skos:Concept.
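As a rough illustration of the URI host check mentioned above, here is a Python sketch that splits normalized search results by whether their URI starts with the expected LOC subjects prefix. It is a client-side check under stated assumptions, not the planned seed-query change: `EXPECTED_PREFIX` and `split_by_authority` are hypothetical names, and the sample data reuses the URIs from the search response in this issue.

```python
# Assumed prefix for LOC subject headings, taken from the URIs in this issue.
EXPECTED_PREFIX = "http://id.loc.gov/authorities/subjects/"


def split_by_authority(results, prefix=EXPECTED_PREFIX):
    """Split result dicts into (expected, foreign) by URI prefix."""
    expected, foreign = [], []
    for result in results:
        uri = result.get("uri", "")
        (expected if uri.startswith(prefix) else foreign).append(result)
    return expected, foreign


# URIs copied from the search response reported above.
results = [
    {"uri": "http://id.loc.gov/authorities/subjects/sh85118553"},
    {"uri": "http://id.loc.gov/authorities/subjects/sh00007934"},
    {"uri": "http://www.wikidata.org/entity/Q336"},
    {"uri": "http://data.bnf.fr/ark:/12148/cb121155321"},
]

expected, foreign = split_by_authority(results)
```

A check like this could also serve as a regression test against the live endpoint: the issue would count as fixed once `foreign` is empty for the LOC_SUBJECTS query.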