epimorphics / elda

Epimorphics implementation of the Linked Data API

Generated query ListEndpoint doesn't include DISTINCT #173

Closed rwalkerands closed 7 years ago

rwalkerands commented 7 years ago

Well, this seems to be a defect, but perhaps you can advise otherwise. At least, if it's not a defect, it's not obvious to me (after looking through the linked data spec and the source code) what the fix/workaround should be.

I'm getting duplicates on a ListEndpoint, because the generated SPARQL query doesn't include DISTINCT, and duplicates aren't otherwise being filtered out.

I attach instrument-test.txt, a cut-down version of the spec file. See the svoc:conceptLabelContainsEndpoint endpoint.

You can see the current behaviour at: http://vocabs.ands.org.au/repository/api/lda/aodn/aodn-instrument-vocabulary/version-1-0/concept?labelcontains=SMP-ODO for HTML, or for XML: http://vocabs.ands.org.au/repository/api/lda/aodn/aodn-instrument-vocabulary/version-1-0/concept.xml?labelcontains=SMP-ODO

That "live" server is running an older version of the Elda library, but I've confirmed the behaviour is the same with 1.3.19.

The generated query (copy/pasted from the Sesame log file) is:

PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT  ?item
WHERE
  {   { ?item skos:prefLabel ?l }
    UNION
      { ?item skos:altLabel ?l }
    UNION
      { ?item rdfs:label ?l }
    FILTER regex(str(?l), "SMP-ODO", "i")
  }
OFFSET  0
LIMIT   10

So not "SELECT DISTINCT". In this case, it seems that there's a match against both the concept's prefLabel and the altLabel, and the query (against Sesame server) is thus returning two results.

instrument-test.txt (https://github.com/epimorphics/elda/files/593491/instrument-test.txt)

rwalkerands commented 7 years ago

Hello?

ehedgehog commented 7 years ago

Sorry, Richard, I was caught up in something else; I'll look at the issue today.

Chris

Chris "allusive" Dollin

ehedgehog commented 7 years ago

Hi Richard

You're right that the linked data api spec says nothing (that I ever found) on this, and that the default case is that no DISTINCT is applied. (This avoids a possibly expensive unnecessary set construction and iteration.) To force a DISTINCT you can supply an ordering (which has its own costs, of course.)
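A minimal sketch of the ordering workaround, assuming the selector accepts the LDA `api:orderBy` property and that the endpoint's existing `api:where` body matches the generated query quoted above. The ordering expression `?item` is purely illustrative and is not taken from the attached spec file; prefix declarations are assumed to be those already in that file.

```turtle
svoc:conceptLabelContainsEndpoint a api:ListEndpoint
    ; api:uriTemplate "/aodn/aodn-instrument-vocabulary/version-1-0/concept?labelcontains={text}"
    ; api:selector [
        # Assumed to mirror the api:where already present in the spec file.
        api:where """
            { ?item skos:prefLabel ?l }
            UNION
            { ?item skos:altLabel ?l }
            UNION
            { ?item rdfs:label ?l }
            FILTER regex(str(?l), ?text, 'i')
        """
        # Hypothetical ordering: supplying any ordering is what should
        # cause Elda to generate SELECT DISTINCT, per the comment above.
        ; api:orderBy "?item"
      ]
    .
```

Whether sorting on `?item` is cheap enough depends on the store and the size of the result set, so it is worth checking the generated query in the server log before relying on it.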

Other than supplying an ordering, you could use an api:select and write the entire query explicitly: your example is already half-way there with its use of api:where.

I presume that having the client code do the DISTINCT wouldn't be convenient ...

It shouldn't be too hard to add another Elda bell-or-whistle to define a DISTINCT selector or recognise (and drop) a well-known ordering, but I'm reluctant to do that unless really necessary; current Elda has lots of little knobs to tweak and it's a pain keeping track of them all.

Chris


rwalkerands commented 7 years ago

OK, so nothing "wrong", and the quickest (best?) way to "fix" is to change the definition of the endpoint to use api:select instead of api:where.

I changed the endpoint definition to this:

svoc:conceptLabelContainsEndpoint a api:ListEndpoint
    ; rdfs:comment "List concepts where a skos label property contains this text, case-insensitive"
    ; api:uriTemplate "/aodn/aodn-instrument-vocabulary/version-1-0/concept?labelcontains={text}"
    ; api:exampleRequestPath "/aodn/aodn-instrument-vocabulary/version-1-0/concept?labelcontains=cambrian"
    ; api:selector [
        api:select
            """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
               PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
               SELECT DISTINCT ?item
               WHERE {
                 { ?item skos:prefLabel ?l }
                 UNION
                 { ?item skos:altLabel ?l }
                 UNION
                 { ?item rdfs:label ?l }
                 FILTER regex( str(?l) , ?text , 'i' )
               } """
      ]
    ; api:defaultViewer svoc:basicConceptViewer
    ; api:viewer api:basicViewer, svoc:basicConceptViewer
    .

Works perfectly. The log shows that the OFFSET and LIMIT clauses are correctly appended.

Oops, forgot to say: once again, thanks so much for your advice/help.

I have nothing further to add ... if you don't either, happy for this to be closed.