drlivingston / kr

Clojure API for RDF and SPARQL - provides consistent access to APIs including Jena and Sesame

Bug or pattern not well formed on query function ? #8

Closed nicolasGuillouet closed 10 years ago

nicolasGuillouet commented 10 years ago

Hi,

First, thanks for your project. It's very useful and seems to correspond to what I need ;-)

I am running some tests to discover KR and RDF manipulation in Clojure (I'm a Clojure newbie). I had some trouble when I ran queries with patterns, so I tried printing the strings produced by the different functions:


(deftest test-gitub-issue
  (let [
        source (load-models ["musical_work_source_desc.owl" "musical_work_source_source1.owl"]);  load XML/RDF into one Model
        store (jena-kb-from-model source)
        pattern '((
                   http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1
                   http://exmo.inrialpes.fr/connectors#opus
                   ?/l
                     ))
        vars (variables pattern)
        non-vars (symbols-no-vars pattern)
        namespaces (distinct (map namespace non-vars))
        prefixes (get-prefixes-from-namespaces namespaces)]
    (println "variables " vars)
    (println "vars " vars)
    (println "non-vars " non-vars)
    (println "namespaces " namespaces)
    (println "prefixes " prefixes)
    (println "prefix-block " (prefix-block prefixes))
    (println "SELECT : " (apply str "SELECT " *select-type*
                                (interleave (map sym-to-sparql vars) (repeat " "))))
    (println "sparql-query-body " (sparql-query-body pattern))
    (println "sparql-select-query" (sparql-select-query pattern [])) 
    (println  " jena-query-pattern" (jena-query-pattern store pattern []) " pattern : " pattern))
  )

Which gives:


variables  #{?/l}
vars  #{?/l}
non-vars  #{http://exmo.inrialpes.fr/connectors#opus http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1}
namespaces  (http://exmo.inrialpes.fr http://exmo.inrialpes.fr/connectors/example_musical_work_source1)
prefixes  ()
prefix-block  
SELECT :  SELECT ?l 
sparql-query-body   <http://exmo.inrialpes.fr/connectors/example_musical_work_source1source1> <http://exmo.inrialpes.frconnectors#opus> ?l .  

sparql-select-query SELECT ?l 
WHERE {  <http://exmo.inrialpes.fr/connectors/example_musical_work_source1source1> <http://exmo.inrialpes.frconnectors#opus> ?l .  
}
 jena-query-pattern ()  pattern :  ((http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1 http://exmo.inrialpes.fr/connectors#opus ?/l))

As you can see, the last '/' is missing from the URIs used inside the query body (i.e. http://exmo.inrialpes.frconnectors#opus and http://exmo.inrialpes.fr/connectors/example_musical_work_source1source1). Is that a bug, or should I declare my pattern differently?

Thanks for your help.

Nicolas

PS: if it is a bug, it seems to come from the sym-to-long-name function ... ;-)

drlivingston commented 10 years ago

Hi Nicolas,

I wanted to respond quickly first, but I'm happy to follow up more.

You can certainly use the lower-level functions to manipulate SPARQL strings etc. I do it all the time. But if you have a KB object (which you can coerce most Jena and Sesame things into), you can use some of the higher-level functions too, like ask and query. You can see some examples of this in the kr-core tests, specifically the sparql tests.

Regarding the behavior you are seeing with the slashes: this is because the pattern-based language has no way of knowing whether you are trying to use a full URI or Clojure symbols. What you type in is first passed through the Clojure reader, which creates a symbol and cuts it somewhere on a slash to divide namespace and name.

KR writes symbols back by looking up the namespace in the list of namespaces registered with the KB being queried. If it's there, it converts that short name to a long name; if it's not, it just uses the namespace verbatim. Then it concatenates that with the name (localname) to make a URI. So effectively that's how and why your slash got "eaten".

The preferred way to deal with this is to define namespace mappings and use those. Or, if you must type in a whole URI like that, you can hack around it by putting another slash on the namespace. I might have put another way into the system too; I forget. I'll look at that code and try to remember...
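The lookup-then-concatenate step and the extra-slash workaround described above can be sketched as follows. This is a simplified model and not KR's actual code: `to-long-name` and `ns-table` are hypothetical names, and the symbols are built with `symbol` so that the namespace/name split matches the one shown in the output earlier in this thread (where exactly the reader cuts may differ between Clojure versions).

```clojure
;; Hypothetical model of the behavior described above; not KR's implementation.
(defn to-long-name
  [ns-table sym]
  (let [ns-part (namespace sym)]
    ;; Use the registered long name if there is one, else the namespace verbatim,
    ;; then concatenate the local name.
    (str (get ns-table ns-part ns-part) (name sym))))

(def ns-table {"exmo" "http://exmo.inrialpes.fr/connectors#"})

;; Registered short name: the long name already ends in "#", nothing is lost.
(to-long-name ns-table (symbol "exmo" "opus"))
;; => "http://exmo.inrialpes.fr/connectors#opus"

;; Unregistered namespace: the slash that separated namespace and name is gone.
(to-long-name ns-table (symbol "http://exmo.inrialpes.fr" "connectors#opus"))
;; => "http://exmo.inrialpes.frconnectors#opus"

;; Workaround: keep a trailing slash on the namespace part.
(to-long-name ns-table (symbol "http://exmo.inrialpes.fr/" "connectors#opus"))
;; => "http://exmo.inrialpes.fr/connectors#opus"
```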

I'll write more soon and help more. But I just wanted to put a few pointers out there to what's going on.

I'm glad you like the library and I'm happy to help. Kevin

nicolasGuillouet commented 10 years ago

Hi Kevin,

Thanks for your quick response ;-).

My goal is to use query and ask (I only ran the low-level functions to understand where my problem was ;-))

FYI, I just made another test where I register namespaces and use two patterns: one with full URIs and another with prefixes:

(deftest test-gitub-issue
  (let [
        source (load-models ["musical_work_source_desc.owl" "musical_work_source_source1.owl"]);  load XML/RDF into one Model
        store (jena-kb-from-model source)]

    (binding [*kb*  (register-namespaces store '(
                                 ("data" "http://exmo.inrialpes.fr/connectors/")
                                 ("exmo" "http://exmo.inrialpes.fr/connectors#")
                                 ("rdf" "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
                                 ("rdfs" "http://www.w3.org/2000/01/rdf-schema#")
                                 ("skos" "http://www.w3.org/2004/02/skos/core#")
                                 ("foaf" "http://xmlns.com/foaf/0.1/")
                                 ))]
      (doseq [pattern [
                       '((
                          http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1
                          http://exmo.inrialpes.fr/connectors#opus
                          ?/l
                          ))
                       '((
                          data/example_musical_work_source1/source1
                          exmo/opus
                          ?/l
                          ))]]
        (let [vars (variables pattern)
              non-vars (symbols-no-vars pattern)
              namespaces (distinct (map namespace non-vars))
              prefixes (get-prefixes-from-namespaces namespaces)]
          (println "\nPATTERN : " pattern)
          ;(println "vars " vars)
          ;(println "non-vars " non-vars)
          (println "namespaces " namespaces)
          (println "prefixes " prefixes)
          ;(println "prefix-block " (prefix-block prefixes))
          ;(println "sparql-query-body " (sparql-query-body pattern))
          (println "sparql-select-query" (sparql-select-query pattern [])) 
          ;(println  " jena-query-pattern" (jena-query-pattern store pattern []) " pattern : " pattern)
          ))))

  )

Which gives:

PATTERN :  ((http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1 http://exmo.inrialpes.fr/connectors#opus ?/l))
non-vars  #{http://exmo.inrialpes.fr/connectors#opus http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1}
namespaces  (http://exmo.inrialpes.fr http://exmo.inrialpes.fr/connectors/example_musical_work_source1)
prefixes  ()
sparql-select-query SELECT ?l 
WHERE {  <http://exmo.inrialpes.fr/connectors/example_musical_work_source1source1> <http://exmo.inrialpes.frconnectors#opus> ?l .  
}

PATTERN :  ((data/source1 exmo/opus ?/l))
non-vars  #{exmo/opus data/source1}
namespaces  (exmo data)
prefixes  ([exmo http://exmo.inrialpes.fr/connectors#] [data http://exmo.inrialpes.fr/connectors/example_musical_work_source1/])
sparql-select-query PREFIX exmo: <http://exmo.inrialpes.fr/connectors#> 
PREFIX data: <http://exmo.inrialpes.fr/connectors/example_musical_work_source1/> 
SELECT ?l 
WHERE {  <http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1> <http://exmo.inrialpes.fr/connectors#opus> ?l .  
}

So you're right: no problem if we use registered namespaces ...

For the case where the symbol is a full URI, could we not do things differently: use the symbol directly if it matches /http:\/\/.*/ (in the function sym-to-sparql)?

Thanks

Nicolas

drlivingston commented 10 years ago

I have thought about doing something like that (and I run into this from time to time with my own queries), but it will start to muddy the distinction between URIs and Clojure symbols, and it will create special cases for what is acceptable for symbol namespaces. http:// isn't the only common prefix for URIs. You are also dependent on the Clojure reader then, which may or may not interpret your URI entirely the way you think it will, as you've already kind of discovered (for example, see the discussion here: http://clojure.org/reader ).

If you don't want to create namespaces (and I think using namespaces is by far the most common way of interacting with RDF), for the time being you can explicitly create symbols that will be interpreted the way you want with (symbol ns name). I don't know what will happen for returned values with unknown namespaces though. I think they will be cut like the reader cuts them.

In the long run I think item-to-sparql in kr-core / sparql needs to be extended to take a few more types, for example Java URLs, which could then be put right into the patterns too. That's something I have thought about for a while and haven't got to yet.

Kevin

nicolasGuillouet commented 10 years ago

I agree with you: using namespaces is a best practice... But sometimes I cannot use them. I create URIs that have more than one sub-element after the main namespace, for example "http://data.discotheka.com/musical_work/mozart/piano/bar-123456". If I wanted to use only namespaces, I would have to declare "http://data.inria.fr/musical_work/mozart/piano/" as a namespace (this is the case for nearly all the individuals I create in my store) :-(. Otherwise, after reading the doc on the reader, I now understand why we have this problem with slashes ... I did the test with (symbol ns name); as you expected, it gives the same result.

About the item-to-sparql function, I could contribute if you want ...

Nicolas

nicolasGuillouet commented 10 years ago

Oops, no: using (symbol ns name) works fine ;-) :

(deftest test-gitub-issue
  (let [
        source (load-models ["musical_work_source_desc.owl" "musical_work_source_source1.owl"]);  load XML/RDF into one Model
        store (jena-kb-from-model source)]

    (binding [*kb*  (register-namespaces store '(
                                 ("exmo" "http://exmo.inrialpes.fr/connectors#")
                                 ("rdf" "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
                                 ("rdfs" "http://www.w3.org/2000/01/rdf-schema#")
                                 ("skos" "http://www.w3.org/2004/02/skos/core#")
                                 ("foaf" "http://xmlns.com/foaf/0.1/")
                                 ))]
      (doseq [pattern [
                       (list 
                             (symbol "http://exmo.inrialpes.fr/connectors/example_musical_work_source1/" "source1")
                             'http://exmo.inrialpes.fr/connectors#opus
                             '?/l
                              )
                       (list
                             (symbol "exmo-test/example_musical_work_source1" "source1")
                             'exmo/opus
                             '?/l
                          )]]
        (let [vars (variables pattern)
              non-vars (symbols-no-vars pattern)
              namespaces (distinct (map namespace non-vars))
              prefixes (get-prefixes-from-namespaces namespaces)]
          (println "\nPATTERN : " pattern)
          ;(println "vars " vars)
          ;(println "non-vars " non-vars)
          (println "namespaces " namespaces)
          (println "prefixes " prefixes)
          ;(println "prefix-block " (prefix-block prefixes))
          ;(println "sparql-query-body " (sparql-query-body pattern))
          (println "sparql-select-query" (sparql-select-query pattern [])) 
          ;(println  " jena-query-pattern" (jena-query-pattern store pattern []) " pattern : " pattern)
          ))))

  )

Gives :

PATTERN :  (http://exmo.inrialpes.fr/connectors/example_musical_work_source1//source1 http://exmo.inrialpes.fr/connectors#opus ?/l)
namespaces  (http://exmo.inrialpes.fr/connectors/example_musical_work_source1/ http://exmo.inrialpes.fr)
prefixes  ()
sparql-select-query SELECT ?l 
WHERE {  <http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1> <http://exmo.inrialpes.frconnectors#opus> ?l . }

PATTERN :  (exmo-test/example_musical_work_source1/source1 exmo/opus ?/l)
namespaces  (exmo exmo-test/example_musical_work_source1)
prefixes  ([exmo http://exmo.inrialpes.fr/connectors#])
sparql-select-query PREFIX exmo: <http://exmo.inrialpes.fr/connectors#> 
SELECT ?l 
WHERE {  <exmo-test/example_musical_work_source1source1> <http://exmo.inrialpes.fr/connectors#opus> ?l . }

As we can see, the printed pattern has a double slash, and of course it does not work if I use a ns inside the symbol function.

But it does not resolve my problem: I would prefer not to have to use the symbol function and to call the pattern directly with the URI, as you proposed ;-).

Nicolas

drlivingston commented 10 years ago

What do you mean when you say it does not work if you use a ns inside a symbol function? (To be clear, symbol is a Clojure function; it's exactly what the reader calls when it sees foo/bar, which is the same as the Clojure code (symbol "foo" "bar"). Those symbols, however they are made, are then passed on to the KR API.)

The pattern language is designed to work with Clojure data structures and symbols. Something like exmo-test/example_musical_work_source1/source1 is a bizarre symbol. Multiple slashes in symbols are handled in a complicated way by the Clojure reader. Unfortunately it's not like Common Lisp, where any string can be used with ease. There are no escapes in the Clojure reader that make it easy to delineate exactly what you want. That's why I proposed calling symbol directly.
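A quick REPL check of the point above, using the same placeholder symbol foo/bar: a symbol read from source and one built with the two-argument `symbol` function compare equal and expose the same namespace and name. The var names `read-sym` and `built-sym` are only for illustration.

```clojure
;; foo/bar as produced by the reader, and the same symbol built explicitly.
(def read-sym 'foo/bar)
(def built-sym (symbol "foo" "bar"))

(= read-sym built-sym)   ;; => true
(namespace built-sym)    ;; => "foo"
(name built-sym)         ;; => "bar"
```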

KR also doesn't support nested namespaces if that is what you are trying to do? It will always match to the longest though.

I'm still trying to completely understand what you are doing though?

You wrote: "I would prefer not to have to use the symbol function and to call the pattern directly with the URI, as you proposed ;-)."

To be clear, you'll never be able to just put a URI in; you would need to annotate it somehow (as, again, everything has to go through the Clojure reader first). But I could add a feature so that the pattern language understands java.net.URI objects as elements. Then patterns could contain things like

(URI. "http://example.com/foo")
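As a small illustration of the idea above (plain Clojure and java.net.URI only, no KR calls): a URI object keeps the whole string intact, but it has to be evaluated, so such a pattern would be built with `list` or syntax-quote/unquote rather than a plain quote. The var name `pattern` is illustrative.

```clojure
(import 'java.net.URI)

;; Built with `list` so that the (URI. ...) forms are actually evaluated.
(def pattern
  (list (URI. "http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1")
        (URI. "http://exmo.inrialpes.fr/connectors#opus")
        '?/l))

;; The URI is never split into namespace and name, so no slash is lost.
(str (first pattern))
;; => "http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1"

;; Inside a plain quote the form stays an unevaluated list, not a URI object.
(instance? URI (first '((URI. "http://example.com/foo"))))
;; => false
```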

Kevin

drlivingston commented 10 years ago

By the way, you can use syntax-quoting instead of composing the list:

`(~(symbol "exmo-test/example_musical_work_source1" "source1")
   exmo/opus
   ?/l)

is equivalent to:

(list (symbol "exmo-test/example_musical_work_source1" "source1")
      'exmo/opus
      '?/l)
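The equivalence above can be checked at the REPL. One caveat worth knowing, which is a general property of Clojure's syntax-quote and not something specific to KR: symbols without a namespace get qualified with the current namespace, so this shortcut is safest when every symbol in the pattern is already namespace-qualified. The var names below are illustrative.

```clojure
(def subject (symbol "exmo-test/example_musical_work_source1" "source1"))

(def quoted `(~subject exmo/opus ?/l))
(def listed (list subject 'exmo/opus '?/l))

(= quoted listed)
;; => true

;; Caveat: an unqualified symbol is resolved against the current namespace,
;; so it no longer equals the plainly quoted symbol.
(= `(opus) '(opus))
;; => false
```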

drlivingston commented 10 years ago

Regarding converting from short names to long names:

When KR is going to a long name, it takes a Clojure symbol, looks up the namespace of that symbol in the short-name-to-long-name namespace hash, and creates a URI by concatenating that return value with the name of the symbol. (If there is no match in the hash, it just uses the namespace verbatim.)

When it's converting back from a URI, it looks for the longest string in the namespace table that matches a prefix of the URI. It then cuts that off and uses what's left as the symbol name. It then returns a symbol where the namespace is the short name from the table and the name is what was left.

That's already a lot more string munging than I want to do, and a lot more "convenience" than you get from vanilla Jena or Sesame.

Looking for parts of namespaces would be even more overhead, and that's not the general use case. To be clear, in some of my own work we have a base namespace that we use to extend a lot of other namespaces. That results in a lot of namespaces, but there really is no great way around that. So if someone else controls namespaces foo and bar and we want to add something to them, I end up with namespaces like basefoo, basebar, etc. A tad "messy", but completely explicit about what's going on.
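The longest-prefix lookup described above could look roughly like this. It is a sketch under assumptions, not KR's code: `uri->symbol` and `ns-table` are hypothetical names, the short name "data1" is invented for the example, and returning nil when nothing matches is a choice made here for simplicity.

```clojure
;; Hypothetical sketch of URI -> symbol conversion by longest registered prefix.
(defn uri->symbol
  [ns-table uri]
  (let [matches (filter (fn [[_ long-name]]
                          (.startsWith ^String uri ^String long-name))
                        ns-table)]
    ;; Returns nil when no registered namespace is a prefix of the URI
    ;; (what KR does in that case is not modeled here).
    (when (seq matches)
      (let [[short-name long-name] (apply max-key #(count (second %)) matches)]
        ;; Cut the matched long name off; what is left becomes the symbol name.
        (symbol short-name (subs uri (count long-name)))))))

(def ns-table
  {"data"  "http://exmo.inrialpes.fr/connectors/"
   "data1" "http://exmo.inrialpes.fr/connectors/example_musical_work_source1/"
   "exmo"  "http://exmo.inrialpes.fr/connectors#"})

(uri->symbol ns-table
             "http://exmo.inrialpes.fr/connectors/example_musical_work_source1/source1")
;; => data1/source1   (the longer of the two matching prefixes wins)

(uri->symbol ns-table "http://exmo.inrialpes.fr/connectors#opus")
;; => exmo/opus
```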

drlivingston commented 10 years ago

Alternatively, if you had base-namespace/extended-namespace/local-name and you only registered base-namespace, you could create symbols this way:

(symbol "base-namespace" "extended-namespace/local-name")

That would bypass the reader's desire to cut the symbol on the last slash.
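For illustration, here is what that symbol looks like from the REPL. The long name "http://example.com/base/" is a made-up registration for base-namespace, and the final expression only mimics the concatenation described earlier in this thread; it is not a KR call.

```clojure
(def s (symbol "base-namespace" "extended-namespace/local-name"))

(namespace s)  ;; => "base-namespace"
(name s)       ;; => "extended-namespace/local-name"

;; With a hypothetical long name registered for base-namespace, concatenating
;; long name and symbol name yields the full URI, inner slash included.
(str (get {"base-namespace" "http://example.com/base/"} (namespace s))
     (name s))
;; => "http://example.com/base/extended-namespace/local-name"
```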

nicolasGuillouet commented 10 years ago

Hi Kevin, OK, great if I can use (URI. "http://example.com/foo"). As I said the second time, using symbol works fine ;-). Calling (symbol "base-namespace" "extended-namespace/local-name") would certainly not be useful for my use cases (I tested it, because I tried to resolve this problem ;-) ).

Thanks for the syntax-quoting tip, it's a powerful Clojure feature ...

drlivingston commented 10 years ago

I've put a fix for this into GitHub. The RDF functions and the SPARQL functions now support java.net.URI everywhere that Clojure symbols could previously be used. The SPARQL API also allows the pattern language to be extended via sparql-ify. There are examples in the tests for RDF and SPARQL in kr-core.

I'm working on a patch for another problem I have discovered in the reification code for the forward-chaining system. When that is done I will push another release to Clojars.

nicolasGuillouet commented 10 years ago

Great ;-)

Thanks

drlivingston commented 10 years ago

The update has been released and is available in version 1.4.16.

There are a few other changes in 1.4.16 that shouldn't impact you (or only for the positive, if you had those bugs too). See the release notes: https://github.com/drlivingston/kr/wiki/Release-notes

nicolasGuillouet commented 10 years ago

I think it wouldn't be useful for me. But it seems not to work with property paths (using (URI. "http://www.w3.org/2001/XMLSchema#integer") in place of xsd/integer, see: https://github.com/drlivingston/kr/blob/master/kr-core/src/test/clojure/edu/ucdenver/ccp/test/kr/test_sparql_property_paths.clj). I receive a ClassCastException: java.net.URI cannot be cast to clojure.lang.Named

drlivingston commented 10 years ago

Oh, so this isn't with property paths but with types on literals, right? (Although I'm not sure property paths have been tested either, I'll add more tests.)

So there's a problem on line 120 of sparql.clj: it calls sym-to-sparql, assuming that the type argument for a "boxed" literal is a symbol. This should call the new function sparql-ify instead.

Can you tell me exactly what you did to get the error you are seeing, just so I can make sure I have caught the right thing? I'll double check that, add tests for typed literals, fix the above error, and add some tests for property paths too.
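The exception itself is easy to reproduce in plain Clojure, since `name` only accepts strings and things implementing clojure.lang.Named. The dispatching helper below, `term->sparql`, is a hypothetical illustration of handling both kinds of term; it is not KR's sparql-ify, whose actual signature is not shown in this thread.

```clojure
(import 'java.net.URI)

(def datatype (URI. "http://www.w3.org/2001/XMLSchema#integer"))

;; Treating a URI as a symbol fails: java.net.URI is not clojure.lang.Named.
(try
  (name datatype)
  (catch ClassCastException _ :class-cast))
;; => :class-cast

;; Hypothetical helper that dispatches on the type of the term.
(defn term->sparql
  [x]
  (cond
    (instance? URI x) (str "<" x ">")                     ; full URI
    (symbol? x)       (str (namespace x) ":" (name x))    ; prefixed name
    :else (throw (IllegalArgumentException.
                  (str "unsupported term: " (pr-str x))))))

(term->sparql datatype)      ;; => "<http://www.w3.org/2001/XMLSchema#integer>"
(term->sparql 'xsd/integer)  ;; => "xsd:integer"
```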

I'll shoot to have these things corrected and a new version pushed in the next 24 hours or so. Thank you for continuing to test the code.

drlivingston commented 10 years ago

There is probably also a bug in the RDF portion of the API too for adding / querying triples with typed literals like this. I'll try to track that down there too and add some tests for it as well.

nicolasGuillouet commented 10 years ago

I did this test in the REPL:

(require 
            ['edu.ucdenver.ccp.kr.kb :as 'kr-kb]
            ['edu.ucdenver.ccp.kr.rdf :as 'kr-rdf]
            ['edu.ucdenver.ccp.kr.sparql :as 'kr-sparql]
            ['edu.ucdenver.ccp.kr.jena.kb :as 'kr-jena]
            ['edu.ucdenver.ccp.kr.jena.sparql :as 'kr-jena-sparql]
            ['edu.ucdenver.ccp.kr.jena.rdf :as 'kr-jena-rdf])
(import [java.net URI])

(def store (kr-rdf/register-namespaces (kr-kb/kb :jena-mem) '(
                                                   ("rdf" "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
                                                   ("rdfs" "http://www.w3.org/2000/01/rdf-schema#")
                                                   ("disco" "http://data.discotheka.com/ontology/genres/")
                                                   ("mo" "http://purl.org/ontology/mo/")
                                                   ("xsd" "http://purl.org/ontology/mo/")
                                                   )))

(kr-rdf/add store '(mo/musical_work1 mo/opus ["BWV 830" rdfs/Literal]))
(kr-sparql/ask store `(~(URI. "http://purl.org/ontology/mo/musical_work1") ~(URI. "http://purl.org/ontology/mo/opus") ~["BWV 830" 'rdfs/Literal]))

returns true. Note that, strangely, the first call in a new REPL returns false. When we output the data inside the model, we can see that the opus property does not have the rdfs/Literal datatype :-(. It works fine the second time the test runs. Another bug?

(kr-sparql/ask store `(~(URI. "http://purl.org/ontology/mo/musical_work1") ~(URI. "http://purl.org/ontology/mo/opus") ~["BWV 830" (URI. "http://www.w3.org/2000/01/rdf-schema#Literal")]))

This produces the ClassCastException: java.net.URI cannot be cast to clojure.lang.Named.

drlivingston commented 10 years ago

I'm working on checking in the changes for this now. I cannot replicate the behavior you describe, where you have to ask twice and it does not work the first time. (And if there was something, I would probably assume it's with the underlying store.) If you can replicate that issue with the new release I'm about to push (1.4.17), please open a new issue with your example.

Regarding your example above, I have added tests that cover these types of things. Is your example something you really do, or were you just trying to check if something is broken? I ask because you use a weird namespace for xsd, and you are typing a string as rdfs/Literal, which I don't think ever gets used like that as an extended type. Although for testing it's good to try something that isn't xsd, because I found a bug in how my Jena code was handling non-xsd types. (Now I think it's doing it correctly; before, it was making them all null.)

Added RDF tests and added SPARQL tests and property path tests.

nicolasGuillouet commented 10 years ago

Thanks Kevin, it isn't a real test, just something to reproduce the ClassCastException. It also reproduces a bug I discovered: when I run some tests with emacs / cider-repl, the first call produces some failures that I don't get on the second call ... I'll try to reproduce it and open a new issue ;-)

Nicolas