NatLibFi / Skosmos

Thesaurus and controlled vocabulary browser using SKOS and SPARQL

Ability to serve LinkedData also for ConceptSchemes, not only Concepts (.../entity?uri=myConceptScheme) #697

Open tfrancart opened 6 years ago

tfrancart commented 6 years ago

At which URL did you encounter the problem?

http://erato.sparna.fr/authority/entity?uri=http://data.legilux.public.lu/resource/authority/user-format

What steps will reproduce the problem?

  1. Go to above URL.
  2. Note that the given URI is a URI of a ConceptScheme, not a Concept URI (at http://erato.sparna.fr/authority/user-format/fr/)
  3. We get a 404

What is the expected output? What do you see instead?

It would be nice if this same API also worked for ConceptSchemes / Vocabularies, not only for Concepts & Collections.

I am, however, not really sure what the best way to do it would be: look in the vocabularies configuration file for the skosmos:mainConceptScheme? Search for the URI also as a ConceptScheme?

tfrancart commented 6 years ago

Actually, it does work. Our particular problem is that we have :

So, strictly speaking, the ConceptScheme URI does not begin with the vocabulary URI space, since it is missing the final "/". Hence, the Model::guessVocabularyFromUri() method cannot find the vocabulary in which the URI is defined. It tests whether the given URI begins with one of the URI spaces defined in the config file: https://github.com/NatLibFi/Skosmos/blob/master/model/Model.php#L590.
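To illustrate the mismatch, here is a minimal standalone sketch of a prefix check of the kind described above. The helper name `startsWithUriSpace`, the URI space value (the scheme URI plus a trailing slash) and the concept URI are assumptions made for illustration; this is not the Skosmos code itself.

```php
<?php
// Hypothetical helper: true when $uri begins with the configured URI space.
function startsWithUriSpace(string $uri, string $uriSpace): bool
{
    return strpos($uri, $uriSpace) === 0;
}

// Assumed configuration value: the scheme URI followed by "/".
$uriSpace   = 'http://data.legilux.public.lu/resource/authority/user-format/';
$schemeUri  = 'http://data.legilux.public.lu/resource/authority/user-format';
$conceptUri = $uriSpace . 'someConcept'; // hypothetical concept URI

var_dump(startsWithUriSpace($conceptUri, $uriSpace)); // concept URI: the prefix matches
var_dump(startsWithUriSpace($schemeUri, $uriSpace));  // scheme URI: one character short, no match
```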

I don't want to set the vocabulary URI space to "xxx" (without final slash), because I want to keep pretty navigation URLs. I also can't change the URI patterns for our schemes.

Also note that if I set the vocabulary URI space to "xxx", Model::guessVocabularyFromUri() does work, but I get redirected to "vocab-id/fr/page/" (empty entity ID after /page), which gives a 404.

Should we have given the ConceptScheme a "xxx/" URI? Can we work around this without modifying our URIs?

osma commented 6 years ago

We have similar issues even when ConceptScheme URIs end with slashes so they match the vocabulary URI space. I think it's worth investigating.

One possibility would be to relax the guessVocabularyFromUri check a little bit so that it also matches without the trailing slash. Do you think it would help in your case?

tfrancart commented 6 years ago

It does help, by adding this extra loop in guessVocabularyFromUri:

        // didn't work, try to find without the final "/" or "#" of the URI space
        // in order to match ConceptScheme URIs that don't have the final '/' or '#'
        foreach ($this->vocabsByUriSpace as $urispace => $vocabs) {
            // last character of the URI space
            $last = substr($urispace, -1);
            if (
                ($last === '/' || $last === '#')
                &&
                strpos($uri, substr($urispace, 0, -1)) === 0
            ) {
                return $this->disambiguateVocabulary($vocabs, $uri);
            }
        }

One thing, however, is that we are redirected to a generic display page of the ConceptScheme, not to the "home page" of the vocabulary. So we also need this modification in EntityController::redirectWeb:

        // if the provided URI is actually the main concept scheme URI, display the home page of the vocabulary
        if($uri == $vocab->getDefaultConceptScheme()) {
            $url = $baseurl . "$vocid";
        } else if ($localname !== $uri && $localname === urlencode($localname)) {
            ...

And do we need something similar in redirectREST? (I am not familiar with the API.)

tfrancart commented 6 years ago

A slightly more elaborate additional lookup in guessVocabularyFromURI, to prevent incorrect matching:

        // didn't work, try to find without the final "/" or "#" of the URI space
        // in order to match ConceptScheme URIs that don't have the final '/' or '#'
        foreach ($this->vocabsByUriSpace as $urispace => $vocabs) {
            // last character of the URI space
            $last = substr($urispace, -1);
            if (
                ($last === '/' || $last === '#')
                &&
                strpos($uri, substr($urispace, 0, -1)) === 0
                &&
                // to avoid a potential false match, require the URI to be
                // exactly the URI space minus its final character
                strlen($uri) === strlen($urispace) - 1
            ) {
                return $this->disambiguateVocabulary($vocabs, $uri);
            }
        }
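
For experimenting outside Skosmos, the same idea can be sketched as a standalone function. This is only an illustration under simplifying assumptions: `guessVocabIdRelaxed` is a hypothetical name, the configuration maps each URI space to a single vocabulary id (the real structure maps to a list of vocabularies), and the example.org URIs are made up.

```php
<?php
/**
 * Hypothetical standalone version of the relaxed lookup.
 * $vocabsByUriSpace maps a URI space to one vocabulary id (a simplification).
 */
function guessVocabIdRelaxed(string $uri, array $vocabsByUriSpace): ?string
{
    // First pass: ordinary prefix match against each URI space.
    foreach ($vocabsByUriSpace as $uriSpace => $vocabId) {
        if (strpos($uri, $uriSpace) === 0) {
            return $vocabId;
        }
    }
    // Second pass: accept a URI equal to the URI space minus its final "/" or "#".
    foreach ($vocabsByUriSpace as $uriSpace => $vocabId) {
        $last = substr($uriSpace, -1);
        if (($last === '/' || $last === '#') && $uri === substr($uriSpace, 0, -1)) {
            return $vocabId;
        }
    }
    return null;
}

$config = ['http://example.org/voc/' => 'voc']; // hypothetical configuration

var_dump(guessVocabIdRelaxed('http://example.org/voc/c1', $config));      // concept URI
var_dump(guessVocabIdRelaxed('http://example.org/voc', $config));         // scheme URI without the slash
var_dump(guessVocabIdRelaxed('http://example.org/vocabulary2', $config)); // rejected by the exact-match guard
```

The last call shows why the length check matters: a plain prefix test against "http://example.org/voc" would also accept "http://example.org/vocabulary2".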

tfrancart commented 6 years ago

Generally speaking, even if the additions described above can do the job for our case, I think the algorithm would be more robust if it did the following:

  1. Query for the graph(s) in which the requested URI is defined (= has a label, or a type)
  2. Look up in the configuration which vocabulary(ies) correspond to the graph(s)
  3. Then proceed as now

This way we would be agnostic about the URI pattern (at the cost of an extra query).
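
A rough sketch of steps 1 and 2, under stated assumptions: `buildGraphLookupQuery` and `vocabIdsForGraphs` are hypothetical names, "defined" is read as "has an rdf:type or a skos:prefLabel", and the graph-to-vocabulary map stands in for whatever the configuration provides. Actually running the query against the endpoint is left out.

```php
<?php
// Hypothetical: build a SPARQL query listing the named graphs in which
// $uri has a type or a skos:prefLabel.
function buildGraphLookupQuery(string $uri): string
{
    // Minimal guard: reject characters not allowed inside a SPARQL IRI reference.
    if (preg_match('/[<>"{}|^`\\\\\s]/', $uri)) {
        throw new InvalidArgumentException('Invalid characters in URI');
    }
    return <<<SPARQL
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?graph WHERE {
  GRAPH ?graph {
    { <$uri> a ?type } UNION { <$uri> skos:prefLabel ?label }
  }
}
SPARQL;
}

// Hypothetical: map graph URIs returned by the query to configured vocabulary ids.
function vocabIdsForGraphs(array $graphs, array $vocabIdByGraph): array
{
    $ids = [];
    foreach ($graphs as $graph) {
        if (isset($vocabIdByGraph[$graph])) {
            $ids[] = $vocabIdByGraph[$graph];
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($ids));
}

echo buildGraphLookupQuery('http://example.org/voc'), "\n";

// Assumed configuration: one named graph per vocabulary.
$vocabIdByGraph = ['http://example.org/graph/voc' => 'voc'];
var_dump(vocabIdsForGraphs(['http://example.org/graph/voc'], $vocabIdByGraph));
```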

osma commented 6 years ago

I agree, but if we want to avoid an extra query, it could be something like this:

  1. Use normal guessVocabularyFromURI
  2. If it didn't indicate a single vocabulary, query for the graph(s) as per previous comment
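
The two-step fallback could look roughly like the following sketch. Everything here is hypothetical: `resolveVocabulary` and the two callables stand in for the URI-space guess and the graph query, and picking the first result when the graph lookup returns several vocabularies is a placeholder, not a decision taken in this thread.

```php
<?php
/**
 * Hypothetical sketch of the two-step lookup.
 * $prefixGuess: callable(string $uri): string[]  candidate ids from URI-space matching
 * $graphLookup: callable(string $uri): string[]  ids found by querying for graphs
 */
function resolveVocabulary(string $uri, callable $prefixGuess, callable $graphLookup): ?string
{
    $candidates = $prefixGuess($uri);
    if (count($candidates) === 1) {
        return $candidates[0]; // unambiguous: no extra query needed
    }
    // Zero or several candidates: fall back to the more expensive graph query.
    $fromGraphs = $graphLookup($uri);
    return $fromGraphs[0] ?? null; // placeholder choice when several are returned
}

$queries = 0; // counts simulated SPARQL round trips

$prefixGuess = function (string $uri): array {
    return strpos($uri, 'http://example.org/voc/') === 0 ? ['voc'] : [];
};
$graphLookup = function (string $uri) use (&$queries): array {
    $queries++; // stands in for a SPARQL round trip
    return $uri === 'http://example.org/voc' ? ['voc'] : [];
};

var_dump(resolveVocabulary('http://example.org/voc/c1', $prefixGuess, $graphLookup)); // resolved by prefix
var_dump(resolveVocabulary('http://example.org/voc', $prefixGuess, $graphLookup));    // resolved by graph lookup
```

With this shape, concept URIs that already match a URI space never pay for the extra query; only the unmatched or ambiguous ones do.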