NatLibFi / Skosmos

Thesaurus and controlled vocabulary browser using SKOS and SPARQL

Search results with notation code are not sorted by natural notation sort strategy #1209

Open osma opened 3 years ago

osma commented 3 years ago

At which URL did you encounter the problem?

http://localhost/Skosmos/hklj/en/

What steps will reproduce the problem?

  1. Set up local Skosmos installation with YKL/PLC and configure its skosmos:sortByNotation setting to natural
  2. Search by notation code "32" in PLC
  3. Look at the search results (either the suggestion box or the search result page)

What is the expected output? What do you see instead?

I expected the search results to be ordered by notation code in natural order, as configured (e.g. 3.2 comes before 32.18). Instead, they are still in lexical order:

image

The reason is that search results are ordered by a SPARQL query, which doesn't implement a natural sort order for notation codes. Implementing natural sort in an ORDER BY clause would probably be difficult, as the SPARQL 1.1 standard doesn't provide such a sorting function. It might be possible with a Jena extension function. Even the Java standard library doesn't seem to have a natural sorting implementation, though third-party implementations exist.
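To make the difference between the two orderings concrete, here is a minimal Python sketch. It is not Skosmos code and not a proposed fix; the `natural_key` helper and the sample notation codes are made up for illustration. It only shows what a natural sort would do to a handful of notation codes compared with the plain lexical sort that ORDER BY gives us on strings.

```python
import re

# Hypothetical helper, for illustration only (not part of Skosmos).
# Tokenize a notation code into runs of ASCII digits and runs of other characters.
_TOKEN = re.compile(r"([0-9]+)|([^0-9]+)")

def natural_key(notation):
    """Build a sort key where digit runs compare as integers."""
    key = []
    for digits, text in _TOKEN.findall(notation):
        if digits:
            # Numeric chunk: compare by integer value, so 2 < 10 < 18.
            key.append((0, int(digits), ""))
        else:
            # Non-numeric chunk (e.g. "."): compare as plain text.
            key.append((1, 0, text))
    return key

# Made-up sample notation codes.
notations = ["32.18", "32.2", "3.2", "32", "32.10"]

lexical = sorted(notations)                   # plain string comparison
natural = sorted(notations, key=natural_key)  # numeric-aware comparison

print(lexical)
print(natural)
```

With these sample values the two orders differ only in where "32.2" ends up: lexical comparison puts it after "32.18" (because "1" sorts before "2"), while the natural key puts it before "32.10" and "32.18". Doing the equivalent inside the SPARQL query is the hard part; a sort like this could only be applied after the results have been fetched.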

Setting this to Blue Sky since it's unlikely to be of practical value for any of the vocabularies we know about (all Finto vocabularies with searchByNotation enabled use lexical sort order) and is difficult to implement.

osma commented 3 years ago

Note also that when searching by notation code, the search results will always be ordered by notation code (regardless of the skosmos:sortByNotation value - even if set to false). E.g. in the above screenshot "Finland's political system" and other terms starting with Finland are sorted after "Politics" and "Philosophy".

When searching by label, notation codes do not affect the order of search results. Here, the first result "Finland" has the notation code "42" while later results have smaller notation code values:

image

I think this is as it should be. What kind of order to use for results depends on the type of search query (label or notation), not on a configuration setting.