NatLibFi / Skosmos

Thesaurus and controlled vocabulary browser using SKOS and SPARQL

Ability to show isothes:ThesaurusArray in Hierarchy tab #656

Open tfrancart opened 6 years ago

tfrancart commented 6 years ago

I need to show ThesaurusArrays in the Hierarchy tab. This includes:

I will (maybe) work on this in the next few weeks/months.

osma commented 6 years ago

OK, go ahead. It might be a little challenging, since the code and the REST API aren't designed for this, but it's worth trying. It probably makes sense to break this up into smaller pieces and implement them one by one.

tfrancart commented 6 years ago

Before proceeding, I would like your opinion on the implementation approach. I am thinking about extending the query in GenericSparql::generateParentListQuery so that it also returns the (optional) array, or arrays, to which each child belongs. Something like adding this, similar to what is done in generateConceptInfoQuery:

    private function generateParentListQuery($uri, $lang, $fallback, $props) {
        $fcl = $this->generateFromClause();
        $propertyClause = implode('|', $props);

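        // Note: this snippet does not show where $arrayClass comes from; it
        // would have to be passed in or looked up before this point.
        if ($arrayClass === null) {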
            $selectArrays = $optionalArrays = "";
        } else {
            // add information that can be used to format narrower concepts by
            // the array they belong to ("milk by source animal" use case)
            $selectArrays = "\n ?childArray (SAMPLE(?childArrayLabels) as ?childArrayLabel)";
            $optionalArrays = <<<EOQ
        OPTIONAL {
                  ?childArray skos:member ?children .
                  ?childArray a <$arrayClass> .
                  FILTER NOT EXISTS {
                    ?childArray skos:member ?other .
                    FILTER NOT EXISTS { ?other skos:broader ?broad }
                  }

                  OPTIONAL {
                    ?childArray skos:prefLabel ?childArrayLabels .
                    FILTER (langMatches(lang(?childArrayLabels), "$lang"))
                  }
                  OPTIONAL {
                    ?childArray skos:prefLabel ?childArrayLabels .
                    FILTER (langMatches(lang(?childArrayLabels), "$fallback"))
                  }
                  OPTIONAL { # fallback - other language case
                    ?childArray skos:prefLabel ?childArrayLabels .
                  }
                  OPTIONAL {
                    ?childArray skos:notation ?childArrayNota .
                  }
         }
EOQ;
        }
    // then, in the query:
    SELECT ?broad ?parent ?member ?children ?grandchildren
    (SAMPLE(?lab) as ?label) (SAMPLE(?childlab) as ?childlabel) (SAMPLE(?topcs) AS ?top) (SAMPLE(?nota) as ?notation) (SAMPLE(?childnota) as ?childnotation) $selectArrays $fcl
    WHERE {
    ...

This would be returned as additional information in the REST API (RestController::hierarchy), and then handled in hierarchy.js when building the tree.
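On the client side, the extra array columns could be used to bucket children before they are handed to the tree. The sketch below is only an illustration of that idea: the field names `childArray` and `childArrayLabel` mirror the SPARQL variables proposed above, but the shape of the rows and the function name `groupChildrenByArray` are assumptions, not the actual hierarchy.js code.

```javascript
// Hypothetical helper (not part of hierarchy.js): groups child rows by the
// ThesaurusArray they belong to. Each row is assumed to look like
//   { uri, label, childArray?, childArrayLabel? }
function groupChildrenByArray(children) {
  const ungrouped = [];     // children that belong to no array
  const arrays = new Map(); // array URI -> { uri, label, members }

  for (const child of children) {
    if (!child.childArray) {
      ungrouped.push(child);
      continue;
    }
    if (!arrays.has(child.childArray)) {
      arrays.set(child.childArray, {
        uri: child.childArray,
        // fall back to the URI when the array has no label
        label: child.childArrayLabel || child.childArray,
        members: [],
      });
    }
    arrays.get(child.childArray).members.push(child);
  }
  return { ungrouped, arrays: Array.from(arrays.values()) };
}
```

A tree builder could then render each entry of `arrays` as an intermediate node (for example "milk by source animal") with its `members` underneath, and keep `ungrouped` children directly under the parent concept.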

Plus :

Minus :

Note that this does not cover the case where ThesaurusArrays are used to organise top-level concepts.

The alternative approach would be to create a separate API to retrieve the array information for a list of concepts, but this would require one or more additional API calls from hierarchy.js.
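To keep the cost of that alternative down to a single extra request, the client could send all concept URIs in one call. The sketch below only shows how such a request URL might be assembled. The `arrays` method name and the repeated `uri` parameter are purely hypothetical; no such endpoint exists in the Skosmos REST API as discussed here.

```javascript
// Hypothetical: "arrays" is NOT an existing Skosmos REST method, and the
// repeated "uri" query parameter is an assumed convention for this sketch.
function buildArrayInfoUrl(restBase, vocab, conceptUris) {
  const params = new URLSearchParams();
  for (const uri of conceptUris) {
    params.append("uri", uri); // one "uri" parameter per concept
  }
  return `${restBase}/${encodeURIComponent(vocab)}/arrays?${params.toString()}`;
}
```

With this shape, hierarchy.js would collect the URIs of the concepts it is about to display, issue one request, and merge the returned array information into the tree data.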

Do you have an opinion or preference? Thanks.

tfrancart commented 6 years ago

Also, I am wondering whether the following clause is really necessary:

                  FILTER NOT EXISTS {
                    ?childArray skos:member ?other .
                    FILTER NOT EXISTS { ?other skos:broader ?broad }
                  }

Skosmos could simply "trust" the structure of the SKOS file and defer the necessary checks or warnings to Skosify, which would warn the user about ThesaurusArrays whose members are not sibling concepts. Skosmos would gain in performance: I have noticed that this clause kills performance on Jena 2.4.x (to the point of the query timing out), but performs fine on Jena 3.
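For reference, the invariant that the clause enforces is that all members of an array share a broader concept. If the check moved to a validation step, it could look roughly like the sketch below. It is written in JavaScript for illustration only; the function name and the `broaderOf` input shape are hypothetical and are not part of Skosify or Skosmos.

```javascript
// Hypothetical validation helper. `members` is a list of concept URIs in one
// array; `broaderOf` is assumed to be a Map from a concept URI to a Set of
// its skos:broader URIs.
function membersShareBroader(members, broaderOf) {
  if (members.length === 0) {
    return true; // nothing to check for an empty array
  }
  // Start from the first member's broader concepts and intersect with the rest.
  let common = new Set(broaderOf.get(members[0]) || []);
  for (const member of members.slice(1)) {
    const broader = broaderOf.get(member) || new Set();
    common = new Set([...common].filter((uri) => broader.has(uri)));
  }
  // Members with no broader concept at all (e.g. top-level concepts) fail.
  return common.size > 0;
}
```

An array for which this returns `false` would be the kind of structure to warn about, rather than something the hierarchy query has to filter out on every request.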

tfrancart commented 6 years ago

Something similar would be needed in GenericSparql::generateChildQuery, when retrieving the children of a concept. This makes me think a separate API call could be better, since it could be used:

Something like :