LD4P / qa_server

A rails engine with questioning authority gem installed to serve as an authority search server with normalized results.
Apache License 2.0

Extract labels from HTTP response headers #45

Closed kirkhess closed 4 years ago

kirkhess commented 5 years ago

From http://id.loc.gov/techcenter/:

Requesting a concept URI with a HTTP HEAD method exposes a private header called "X-PrefLabel", that is a URL-encoded representation of the preferred label.

For example, running cURL with the "-I" argument on the URI for "Bahia grass" performs a HTTP request using the HEAD method.

curl -I http://id.loc.gov/authorities/subjects/sh93007391

HTTP HEAD requests return the HTTP response only, sans the body of the RDF or XHTML content. Among these headers, one would see a given header "X-PrefLabel: Bahia%20grass". It is possible to use a HTTP library within the programming language of your choice to access this header. URL-decoding the value of the X-PrefLabel header yields the string "Bahia grass".
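As a rough illustration of the "HTTP library" route described above, here is a minimal Python sketch using only the standard library. The function names `decode_pref_label` and `fetch_pref_label` are hypothetical, chosen for this example. `http.client` does not follow redirects, so the header is read from the first response only; whether the service sets X-PrefLabel on that response is an assumption taken from the quoted documentation.

```python
import http.client
from urllib.parse import unquote, urlsplit


def decode_pref_label(raw_value: str) -> str:
    """URL-decode an X-PrefLabel header value (hypothetical helper)."""
    return unquote(raw_value)


def fetch_pref_label(concept_uri: str, timeout: float = 10.0):
    """Send a HEAD request and return the decoded label, or None if absent.

    Redirects are not followed: only the first response's headers are read.
    """
    parts = urlsplit(concept_uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        raw_value = response.getheader("X-PrefLabel")
    finally:
        conn.close()
    return None if raw_value is None else decode_pref_label(raw_value)


if __name__ == "__main__":
    # Offline check using the header value quoted in the documentation.
    print(decode_pref_label("Bahia%20grass"))
```

Calling `fetch_pref_label("http://id.loc.gov/authorities/subjects/sh93007391")` would perform the same request as the curl command, subject to the assumptions above.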

The specific use case to consider is MeSH subjects, so that they can be easily matched via the label to the X-URI.

kirkhess commented 5 years ago

Nate had researched this particular solution:

https://id.nlm.nih.gov/mesh/query
https://id.nlm.nih.gov/mesh/sparql
This SPARQL query looks up “Environmental Pollutants” as a MeSH TopicalDescriptor:
=====
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>

SELECT DISTINCT ?class
FROM <http://id.nlm.nih.gov/mesh>
WHERE {
       ?class rdfs:label "Environmental Pollutants"@en .
       ?class a meshv:TopicalDescriptor . 
} limit 10

Returns mesh:D004785

Via curl:
curl 'https://id.nlm.nih.gov/mesh/sparql?query=PREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0APREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0APREFIX+meshv%3A+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%2Fvocab%23%3E%0D%0APREFIX+mesh%3A+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%2F%3E%0D%0APREFIX+mesh2015%3A+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%2F2015%2F%3E%0D%0APREFIX+mesh2016%3A+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%2F2016%2F%3E%0D%0APREFIX+mesh2017%3A+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%2F2017%2F%3E%0D%0APREFIX+mesh2018%3A+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%2F2018%2F%3E%0D%0APREFIX+mesh2019%3A+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%2F2019%2F%3E%0D%0A%0D%0ASELECT+DISTINCT+%3Fclass%0D%0AFROM+%3Chttp%3A%2F%2Fid.nlm.nih.gov%2Fmesh%3E%0D%0AWHERE+%7B+%3Fclass+rdfs%3Alabel+%22Environmental+Pollutants%22%40en+.%0D%0A++++++++%3Fclass+a+meshv%3ATopicalDescriptor+.%0D%0A++++++%7D%0D%0AORDER+BY+%3Fclass%0D%0A&format=XML&year=current&limit=10&offset=0#lodestart-sparql-results'

It returns:
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="class"/>
  </head>
  <results>
    <result>
      <binding name="class">
        <uri>http://id.nlm.nih.gov/mesh/D004785</uri>
      </binding>
    </result>
  </results>
</sparql>
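
For anyone scripting this lookup rather than using curl, the following is a minimal Python sketch of the two steps shown above: building the encoded request URL and pulling the URI out of the XML result. The helper names (`build_query`, `build_request_url`, `extract_uris`) are hypothetical. The endpoint and the `format` and `limit` parameters are copied from the curl command, and only the two prefixes the query actually uses are included.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint and parameters are taken from the curl command in this comment.
SPARQL_ENDPOINT = "https://id.nlm.nih.gov/mesh/sparql"
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Sample response copied from this comment, used for an offline check.
SAMPLE_RESULT = (
    '<?xml version="1.0"?>'
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="class"/></head>'
    "<results><result>"
    '<binding name="class">'
    "<uri>http://id.nlm.nih.gov/mesh/D004785</uri>"
    "</binding>"
    "</result></results>"
    "</sparql>"
)


def build_query(label: str) -> str:
    """Return the SPARQL query text for an exact English label match."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>\n"
        "SELECT DISTINCT ?class\n"
        "FROM <http://id.nlm.nih.gov/mesh>\n"
        "WHERE {\n"
        f'  ?class rdfs:label "{escaped}"@en .\n'
        "  ?class a meshv:TopicalDescriptor .\n"
        "}"
    )


def build_request_url(label: str, limit: int = 10) -> str:
    """Return the GET URL with the query percent-encoded."""
    params = {"query": build_query(label), "format": "XML", "limit": limit}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


def extract_uris(results_xml: str, variable: str = "class") -> list:
    """Return every <uri> bound to the given variable in a SPARQL XML result."""
    root = ET.fromstring(results_xml)
    xpath = f"sr:results/sr:result/sr:binding[@name='{variable}']/sr:uri"
    return [node.text for node in root.findall(xpath, SPARQL_NS)]


if __name__ == "__main__":
    print(build_request_url("Environmental Pollutants"))
    print(extract_uris(SAMPLE_RESULT))
```

The sketch only builds the URL and parses a saved response; fetching it is left to whichever HTTP client is already in use.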

kirkhess commented 5 years ago

Something like this:

curl -I https://lookup.ld4l.org/authorities/search/linked_data/mesh_ld4l_cache/label/Environmental%20Pollutants

HTTP/1.1 303 SEE OTHER
Location: http://purl.bioontology.org/ontology/MESH/D007388.html
Vary: Accept
X-URI: http://purl.bioontology.org/ontology/MESH/D007388
X-PrefLabel: Environmental Pollutants
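
To make the proposal concrete from the client side, here is a small Python sketch that reads the two headers out of a response shaped like the one above. It works on the raw header text, so it does not assume the lookup endpoint actually behaves this way. `parse_label_headers` is a hypothetical name. The label is passed through URL-decoding on the assumption that a server might encode it as id.loc.gov does; an unencoded value with no percent signs, like the one shown, passes through unchanged.

```python
from email import message_from_string
from urllib.parse import unquote


def parse_label_headers(raw_response: str) -> dict:
    """Extract status code, X-URI, and decoded X-PrefLabel from header text."""
    # The first line is the status line; the rest are "Name: value" headers.
    status_line, _, header_block = raw_response.partition("\n")
    headers = message_from_string(header_block)  # case-insensitive lookups
    label = headers.get("X-PrefLabel")
    return {
        "status": int(status_line.split()[1]),
        "uri": headers.get("X-URI"),
        "label": None if label is None else unquote(label),
    }


if __name__ == "__main__":
    # Header text copied from the proposed response in this comment.
    proposed = (
        "HTTP/1.1 303 SEE OTHER\n"
        "Location: http://purl.bioontology.org/ontology/MESH/D007388.html\n"
        "Vary: Accept\n"
        "X-URI: http://purl.bioontology.org/ontology/MESH/D007388\n"
        "X-PrefLabel: Environmental Pollutants\n"
    )
    print(parse_label_headers(proposed))
```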

elrayle commented 4 years ago

It's not clear what the use case for this would be in the current ecosystem of the cache-qa_normalization-sinopia_editor. This can be reopened later if a use case becomes clear. Thanks for bringing this information to our attention. If you feel that this issue should remain open, please provide details on the use case that you see this fulfilling.