Swirrl / tripod

ActiveModel-style Ruby ORM for RDF Linked Data. Works with SPARQL 1.1 HTTP endpoints.
MIT License

CriteriaExecution methods do not retrieve instance data. #41

Closed: muratseyhan closed this 9 years ago

muratseyhan commented 9 years ago

Execution methods for criteria objects do not retrieve any information other than the URIs of the resources. The resources and first methods both seem to have this problem. I used the example code from the documentation on tripod 0.10.9, and I had the same issue on 0.10.8.

# $project_path/app/models/person.rb
class Person
  include Tripod::Resource

  # these are the default rdf-type and graph for resources of this class
  rdf_type 'http://example.com/person'
  graph_uri 'http://example.com/people'

  field :name, 'http://example.com/name'
  field :knows, 'http://example.com/knows', :multivalued => true, :is_uri => true
  field :aliases, 'http://example.com/alias', :multivalued => true
  field :age, 'http://example.com/age', :datatype => RDF::XSD.integer
  field :important_dates, 'http://example.com/importantdates', :datatype => RDF::XSD.date, :multivalued => true
end

Adding data using the example in the documentation:

> uri = 'http://example.com/ric'
"http://example.com/ric"
> p = Person.new(uri)
#<Person:0x000000058578e8 @uri=#<RDF::URI:0x2c2bbc0 URI:http://example.com/ric>, @repository=#<RDF::Repository:0x2c2bb84()>, @new_record=true, @graph_uri=#<RDF::URI:0x2c2b9cc URI:http://example.com/people>>
> p.name = 'Ric'
"Ric"
> p.age = 31
31
> p.aliases = ['Rich', 'Richard']
["Rich", "Richard"]
> p.important_dates = [Date.new(2011,1,1)]
[Sat, 01 Jan 2011]
> p.save!
true

The find method seems to work fine:

> ric = Person.find('http://example.com/ric') #=> returns a single Person object.
#<Person:0x00000005c5d5c8 @uri=#<RDF::URI:0x2e2eaa8 URI:http://example.com/ric>, @repository=#<RDF::Repository:0x2e3dcc4()>, @new_record=false, @graph_uri=#<RDF::URI:0x2e2e8f0 URI:http://example.com/people>>

> ric.age
31

> ric.name
"Ric"

> ric.aliases
["Richard", "Rich"]

The resources method does not retrieve the fields for the target instances:

> people = Person.all.resources #=> returns all people as an array
#<Tripod::ResourceCollection:0x00000005c37148 @resources=[#<Person:0x00000005c32378 @uri=#<RDF::URI:0x2e19068 URI:http://example.com/ric>, @repository=#<RDF::Repository:0x2e1bda4()>, @new_record=false, @graph_uri=#<RDF::URI:0x2e18d48 URI:http://example.com/people>>], @criteria=#<Tripod::Criteria:0x000000059bc788 @resource_class=Person, @where_clauses=["?uri a <http://example.com/person>", "?uri ?p ?o"], @extra_clauses=[], @graph_uri="http://example.com/people">, @sparql_query_str=nil, @resource_class=nil, @return_graph=true>

> people.count
1

> person = people.first
#<Person:0x00000005c32378 @uri=#<RDF::URI:0x2e19068 URI:http://example.com/ric>, @repository=#<RDF::Repository:0x2e1bda4()>, @new_record=false, @graph_uri=#<RDF::URI:0x2e18d48 URI:http://example.com/people>>
> person.uri
#<RDF::URI:0x2e19068 URI:http://example.com/ric>
> person.age
nil
> person.name
nil
> person.aliases
[]

Here are the Fuseki logs for the resources call:

21:31:18 INFO  [110] POST http://127.0.0.1:3030/read_write_service/sparql
21:31:18 INFO  [110] Query = SELECT DISTINCT ?uri (<http://example.com/people> as ?graph) WHERE { GRAPH <http://example.com/people> { ?uri a <http://example.com/person> . ?uri ?p ?o } }
21:31:18 INFO  [110] exec/select
21:31:18 INFO  [110] 200 OK (9 ms) 
21:31:18 INFO  [111] POST http://127.0.0.1:3030/read_write_service/sparql
21:31:18 INFO  [111] Query =          CONSTRUCT {           ?tripod_construct_s ?tripod_construct_p ?tripod_construct_o .                    }         WHERE {           { SELECT (?uri as ?tripod_construct_s) {             SELECT DISTINCT ?uri (<http://example.com/people> as ?graph) WHERE { GRAPH <http://example.com/people> { ?uri a <http://example.com/person> . ?uri ?p ?o } }           } }           ?tripod_construct_s ?tripod_construct_p ?tripod_construct_o .                    }       
21:31:18 INFO  [111] exec/construct
21:31:18 INFO  [111] 200 OK (8 ms) 

The first method has the same issue:

> person2 = Person.first
#<Person:0x00000005d25dc0 @uri=#<RDF::URI:0x2e92ea4 URI:http://example.com/ric>, @repository=#<RDF::Repository:0x2e92120()>, @new_record=false, @graph_uri=#<RDF::URI:0x2e92c74 URI:http://example.com/people>>
> person2.uri
#<RDF::URI:0x2e92ea4 URI:http://example.com/ric>
> person2.age
nil
> person2.name
nil
> person2.aliases
[]

The logs for the first call:

21:35:53 INFO  [112] POST http://127.0.0.1:3030/read_write_service/sparql
21:35:53 INFO  [112] Query = SELECT * { SELECT DISTINCT ?uri (<http://example.com/people> as ?graph) WHERE { GRAPH <http://example.com/people> { ?uri a <http://example.com/person> . ?uri ?p ?o } } } LIMIT 1
21:35:53 INFO  [112] exec/select
21:35:53 INFO  [112] 200 OK (9 ms) 
21:35:53 INFO  [113] POST http://127.0.0.1:3030/read_write_service/sparql
21:35:53 INFO  [113] Query =          CONSTRUCT {           ?tripod_construct_s ?tripod_construct_p ?tripod_construct_o .                    }         WHERE {           { SELECT (?uri as ?tripod_construct_s) {             SELECT * { SELECT DISTINCT ?uri (<http://example.com/people> as ?graph) WHERE { GRAPH <http://example.com/people> { ?uri a <http://example.com/person> . ?uri ?p ?o } } } LIMIT 1           } }           ?tripod_construct_s ?tripod_construct_p ?tripod_construct_o .                    }       
21:35:53 INFO  [113] exec/construct
21:35:53 INFO  [113] 200 OK (10 ms) 

The CONSTRUCT queries that hit the database do not seem to make sense and, as one would expect, they return empty graphs.

fonji commented 9 years ago

Hello! Can you try something like this?

person2 = Person.first
person2.hydrate!
person2.age

ricroberts commented 9 years ago

Hi @muratseyhan. Thanks for this. I'll take a look when I get a moment.

muratseyhan commented 9 years ago

"Hello! Can you try something like this?"

Hi @fonji! Here is the output:

>> person2 = Person.first
#<Person:0x000000024bf3c8 @uri=#<RDF::URI:0x125f8e0 URI:http://example.com/ric>, @repository=#<RDF::Repository:0x1266730()>, @new_record=false, @graph_uri=#<RDF::URI:0x125e06c URI:http://example.com/people>, @changed_attributes={:rdf_type=>[]}>

>> person2.hydrate!
#<RDF::NTriples::Reader:0x000000029d2860 @options={:validate=>false, :canonicalize=>false, :intern=>true, :prefixes=>{}, :encoding=>#<Encoding:UTF-8>}, @input=#<StringIO:0x000000029d1d98>, @line=".", @line_rest=nil>

>> person2.age
31

fonji commented 9 years ago

So here's your temporary fix: call hydrate! before trying to get any data from an object. That's what I do. @RicSwirrl is there a reason to return dry objects from .first and .resources? Or should this be considered a bug? I guess it can be difficult to do it with a single query, without a loop that calls hydrate! on every object. No worries, I'm just curious.
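
The workaround above can be wrapped in a small helper. This is only a sketch, not part of Tripod: `hydrate_all` is a hypothetical name, and `StubResource` is a stand-in so the snippet runs without a SPARQL endpoint. The only Tripod call it assumes is `hydrate!`, as used earlier in this thread, and it costs one query per resource.

```ruby
# Hypothetical helper (not part of Tripod): hydrate every resource in a
# collection so that field readers return data instead of nil.
def hydrate_all(resources)
  resources.each do |resource|
    # hydrate! is the call shown in this thread; each call is one query.
    resource.hydrate! if resource.respond_to?(:hydrate!)
  end
  resources
end

# Stand-in for a Tripod resource, used only to make this sketch runnable.
class StubResource
  attr_reader :age

  def hydrate!
    @age = 31
  end
end

people = hydrate_all([StubResource.new])
puts people.first.age.inspect
```

With real models this would presumably be called as `hydrate_all(Person.all.resources)`, assuming the returned collection can be iterated with `each`.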

muratseyhan commented 9 years ago

Thanks @fonji! I know that I can retrieve the data in other ways, but I would rather monkey patch these methods. I think this is definitely a bug (or a consequence of several bugs), since first and resources both hit the database with a CONSTRUCT query that seems to be meant for retrieving data but is guaranteed to return empty graphs. I suspect the problem is in the generation of these queries. Consider the following query performed by resources:

CONSTRUCT {
  ?tripod_construct_s ?tripod_construct_p ?tripod_construct_o .
}         
WHERE {           
  { 
    SELECT (?uri as ?tripod_construct_s) 
    {
      SELECT DISTINCT ?uri (<http://example.com/people> as ?graph)
      WHERE { 
        GRAPH <http://example.com/people> 
        {
          ?uri a <http://example.com/person> . 
          ?uri ?p ?o
        } 
      }           
    } 
  }      
  ?tripod_construct_s ?tripod_construct_p ?tripod_construct_o .   
}       

The query does not even select the predicates and objects, and it has an unnecessary nested SELECT clause. It is rather easy to retrieve the desired data with a single query such as the following:

CONSTRUCT {
  ?uri ?tripod_construct_p ?tripod_construct_o .
}         
WHERE {           
  SELECT DISTINCT ?uri (<http://example.com/people> as ?graph) ?tripod_construct_p ?tripod_construct_o
  WHERE { 
    GRAPH <http://example.com/people> 
    { 
      ?uri a <http://example.com/person> . 
      ?uri ?tripod_construct_p ?tripod_construct_o
    }           
  }    
}        
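
As a rough illustration of the single-query idea, the sketch below builds such a query string in Ruby. It is not Tripod's query generator: `construct_query_for` is a hypothetical helper, and it uses a simplified form that drops the inner SELECT DISTINCT, which should be safe because a CONSTRUCT result is an RDF graph and therefore has no duplicate triples.

```ruby
# Hypothetical helper (not Tripod's internals): build one CONSTRUCT query that
# fetches every triple of every resource of a given rdf:type in a named graph.
def construct_query_for(rdf_type, graph_uri)
  <<~SPARQL
    CONSTRUCT {
      ?uri ?p ?o .
    }
    WHERE {
      GRAPH <#{graph_uri}> {
        ?uri a <#{rdf_type}> .
        ?uri ?p ?o .
      }
    }
  SPARQL
end

query = construct_query_for('http://example.com/person', 'http://example.com/people')
puts query
```

The key point is that the `?uri ?p ?o` pattern sits inside the GRAPH block, so the predicates and objects are matched in the same named graph as the type triple.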

fonji commented 9 years ago

I think you're right, but I'm still waiting for @RicSwirrl's opinion on this. My guess is that it's because it used to be a DESCRIBE statement instead of a CONSTRUCT. You only need the URI for a DESCRIBE, hence the subquery returning it. CONSTRUCT is way faster when you have lots of relations (which is my case, and why I'm happy they changed it), so I think it's a good thing, but it needs a patch. If you have time to create a fork and then a pull request (I sadly don't), I guess it will be appreciated. The linked commit given above should help you find the code to patch.

muratseyhan commented 9 years ago

Thanks @fonji! That makes sense. I haven't had enough time to examine the source code and propose a fix myself, but I will probably have time for it this weekend. I will create a pull request if the issue is still open.

Maybe I should open another issue for this, but it also seems unnecessary to hit the database twice for the first and resources methods. Both issue a SELECT to retrieve the target URIs first, then a CONSTRUCT to retrieve the other relevant data. I suspect this is another thing that made sense when using DESCRIBE queries.

@RicSwirrl, many thanks for this much needed project. It's also nice to see that it has started to form an active community.

ricroberts commented 9 years ago

Thanks @muratseyhan @fonji for your comments. Yeah - there's some legacy reasoning behind this. Some of the internals of Tripod are due a revisit, for performance and sense! I'll take a look this week.

fonji commented 9 years ago

Thanks @RicSwirrl, that's awesome!