Fuseki counts triples very slowly when the RDF ontology has a cycle

brealisty commented 1 year ago

Version

fuseki2 4.4.0

Question

The RDF ontology looks like this: three entities A, B, C, with A-relation1-B, A-relation2-C, B-relation3-C.
Counting the triples with: select (count(?s) as ?count) {?s ?p ?o} is very slow, about 120 s, although there are only 90k triples.
With 6 GB of memory it runs out of memory; with 12 GB, memory is almost full.

But when I delete A-relation1-B, the count is very fast: less than 3 s for 60k triples, and 6 GB of memory works well.

I've tried another graph with about 120k triples, and it also takes less than 3 s.

afs commented 1 year ago

Hi @brealisty - What inference setup are you using? How many different cycles are there?

brealisty commented 1 year ago

@afs I can provide more information.

1. Design an RDF ontology with Protégé and generate a kg.owl file.

Classes: A, B, C. Object properties: A :hasInclude B, A :hasApear C, B :hasApear C.

2. Generate a kg.ttl file with jena_d2rq-0.8.1 from MySQL.

generate-mapping.bat -u xxx -p xxx -o kg.ttl jdbc:mysql://localhost:port/database_name?useSSL=false

MySQL data structure: table A, table B, table C, table A-B-map, table A-C-map, table B-C-map.

3. Modify kg.ttl to fit kg.owl.

I don't think this step matters, because the other version, without A :hasInclude B, goes through the same step.

4. Generate a kg.nt file with jena_d2rq-0.8.1 from kg.ttl.

dump-rdf.bat -o kg.nt kg.ttl

5. Generate the tdb folder with apache-jena-4.4.0.

tdbloader.bat --loc=./tdb/ kg.nt

6. Copy kg.owl to Fuseki's `run/databases/` path.

cp kg.owl path_to/apache-jena-fuseki-4.4.0/run/databases/ontology.ttl

7. Create a kg_cf.ttl file and fill in the following configuration.

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service1        a                fuseki:Service ;
    fuseki:dataset                    <#dataset> ;
    fuseki:name                       "kg_demo" ;
    fuseki:serviceQuery               "query" , "sparql" ;
    fuseki:serviceReadGraphStore      "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceUpdate              "update" ;
    fuseki:serviceUpload              "upload" .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph <#model_inf> ;
    .

<#model_inf> a ja:InfModel ;
    ja:baseModel <#tdbGraph> ;
    ja:content [ja:externalContent <../databases/ontology.ttl> ] ;

    ja:reasoner [ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>] .

<#tdbGraph> rdf:type tdb:GraphTDB ;
    tdb:dataset <#tdbDataset> ;
    .

<#tdbDataset> rdf:type tdb:DatasetTDB ;
    tdb:location "tdb" ;
    .

8. Start the Fuseki server.

fuseki-server.bat

I followed the same steps for the other version, only deleting the object property A :hasInclude B from the ontology. That version works well and counts triples very fast, so I think the cycle structure in the ontology may be causing this issue.

LorenzBuehmann commented 1 year ago

I do not see any cycle in your "ontology". I mean, you mentioned neither class hierarchy axioms nor property hierarchy axioms (resp. triples).

"object properties: A :hasInclude B, A :hasApear C, B hasApear C." What does this mean? Did you define domain and range for each property?

That said, for your current ontology a more lightweight reasoner would be better, since all you have is domain and range. Try an RDFS-only profile, e.g. RDFS simple.
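
As a rough sketch of what swapping the reasoner could look like in the configuration from step 7, only the `ja:reasonerURL` value changes. The URL below is Jena's RDFS rule reasoner at its default level; selecting the "simple" RDFS level specifically is a separate setting that is not shown here, and whether the default level is light enough for this data is an assumption to verify.

```turtle
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Same inference model as in the original configuration,
# with the OWL reasoner replaced by the RDFS rule reasoner.
<#model_inf> a ja:InfModel ;
    ja:baseModel <#tdbGraph> ;
    ja:content [ja:externalContent <../databases/ontology.ttl> ] ;
    ja:reasoner [ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner>] .
```

The rest of the file (service, dataset, TDB graph) would stay as it is.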

The only rules being applied are

T(s, p, o), T(p, rdfs:domain, C) => T(s, rdf:type, C)
T(s, p, o), T(p, rdfs:range, C) => T(o, rdf:type, C)

Each has to be applied just once; no fixpoint iteration is needed, as the conclusions do not produce triples that match the rules' premises.
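
If only these two rules are wanted, one possible way to express them is a generic rule reasoner with the rules written inline. This is an untested sketch, not a verified Fuseki 4.4.0 setup: the rule names `domain` and `range` are made up for illustration, and it assumes the properties are object properties, so the range rule never puts a literal in subject position.

```turtle
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .

<#model_inf> a ja:InfModel ;
    ja:baseModel <#tdbGraph> ;
    ja:content [ja:externalContent <../databases/ontology.ttl> ] ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
        # T(s, p, o), T(p, rdfs:domain, C) => T(s, rdf:type, C)
        ja:rule "[domain: (?s ?p ?o), (?p rdfs:domain ?c) -> (?s rdf:type ?c)]" ;
        # T(s, p, o), T(p, rdfs:range, C) => T(o, rdf:type, C)
        ja:rule "[range: (?s ?p ?o), (?p rdfs:range ?c) -> (?o rdf:type ?c)]"
    ] .
```

The `rdf:` and `rdfs:` prefixes inside the rule strings are resolved by Jena's rule parser, not by the Turtle prefix declarations.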

Also, computing the inferences once and writing them into a TDB(2) database would be far more efficient for any future query. Wouldn't that be sufficient for you? For static data this would avoid (i) loading all the data into memory and (ii) computing the inferences on each start and initial query.
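
Once the inferred triples have been materialized and loaded, the Fuseki configuration no longer needs an inference model at all. A minimal sketch, assuming the data plus inferences were loaded into a directory named `tdb_inferred` (a hypothetical name) and that a query-only service is enough:

```turtle
@prefix :       <http://base/#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

# Query-only service directly over the TDB dataset; no ja:InfModel,
# so nothing is pulled into memory or re-inferred at startup.
:service1 a fuseki:Service ;
    fuseki:dataset      <#tdbDataset> ;
    fuseki:name         "kg_demo" ;
    fuseki:serviceQuery "query" , "sparql" .

# "tdb_inferred" is a hypothetical directory that already holds
# the original triples together with the inferred ones.
<#tdbDataset> rdf:type tdb:DatasetTDB ;
    tdb:location "tdb_inferred" .
```

With this layout the count query runs against stored triples only, so its cost should depend on the size of the database and not on the reasoner.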
