apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

Fuseki counts triples very slowly when the RDF ontology has a cycle. #1852

Open brealisty opened 1 year ago

brealisty commented 1 year ago

Version

fuseki2 4.4.0

Question

But when I delete A-relation1-B, it is very fast: less than 3 s for 60k triples, and it works well with 6 GB of memory.

I've tried another graph with about 120k triples; it also takes less than 3 s.

afs commented 1 year ago

Hi @brealisty - What inference setup are you using? How many different cycles are there?

brealisty commented 1 year ago

@afs I can provide more information.

1. Design an RDF ontology with Protégé and generate a kg.owl file.

Classes: A, B, C. Object properties: A :hasInclude B, A :hasApear C, B hasApear C.

2. Generate a kg.ttl mapping file from MySQL with jena_d2rq-0.8.1.

generate-mapping.bat -u xxx -p xxx -o kg.ttl jdbc:mysql://localhost:port/database_name?useSSL=false

MySQL data structure: table A, table B, table C, table A-B-map, table A-C-map, table B-C-map.

3. Modify kg.ttl to fit kg.owl.

I don't think this step matters, because another version without A :hasInclude B goes through the same step.

4. Generate a kg.nt file from kg.ttl with jena_d2rq-0.8.1.

dump-rdf.bat -o kg.nt kg.ttl

5. Generate the TDB folder with apache-jena-4.4.0.

tdbloader.bat --loc=./tdb/ kg.nt

6. Copy kg.owl to Fuseki's run/databases/ path.

cp kg.owl path_to/apache-jena-fuseki-4.4.0/run/databases/ontology.ttl

7. Create a kg_cf.ttl file with the following configuration:

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service1        a                fuseki:Service ;
    fuseki:dataset                    <#dataset> ;
    fuseki:name                       "kg_demo" ;
    fuseki:serviceQuery               "query" , "sparql" ;
    fuseki:serviceReadGraphStore      "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceUpdate              "update" ;
    fuseki:serviceUpload              "upload" .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph <#model_inf> ;
    .

<#model_inf> a ja:InfModel ;
    ja:baseModel <#tdbGraph> ;
    ja:content [ja:externalContent <../databases/ontology.ttl> ] ;

    ja:reasoner [ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>] .

<#tdbGraph> rdf:type tdb:GraphTDB ;
    tdb:dataset <#tdbDataset> ;
    .

<#tdbDataset> rdf:type tdb:DatasetTDB ;
    tdb:location "tdb" ;
    .

8. Start the Fuseki server.

fuseki-server.bat

If I follow the same steps but delete only the object property A :hasInclude B from the ontology, that version works well and counts triples very fast. So I think the cycle structure in the ontology may be causing this issue.
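
To make the timing comparison reproducible, the count can be issued against the query endpoint directly. The following is a minimal sketch, not the exact query used in this report: it assumes Fuseki is listening on its default port 3030 on localhost, takes the dataset name "kg_demo" and the "query" service from the configuration above, and assumes a plain COUNT(*) over all triples. The helper names build_count_request and count_triples are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default Fuseki port; "kg_demo" and "query" come from the config above.
ENDPOINT = "http://localhost:3030/kg_demo/query"
# Assumption: the count in question is a plain count over the default graph.
COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def build_count_request(endpoint=ENDPOINT, query=COUNT_QUERY):
    """Build a SPARQL Protocol POST request carrying the query as form data."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def count_triples(endpoint=ENDPOINT, timeout=60):
    """Send the count query and return the number of triples as an int."""
    request = build_count_request(endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    # SPARQL JSON results: one binding row holding the ?n variable.
    return int(results["results"]["bindings"][0]["n"]["value"])
```

Calling count_triples() once with and once without A :hasInclude B in the ontology, wrapped in time.perf_counter(), would give comparable numbers for both configurations.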

LorenzBuehmann commented 1 year ago

I do not see any cycle in your "ontology". I mean, you mentioned neither class hierarchy axioms nor property hierarchy axioms (resp. triples).

"object properties: A :hasInclude B, A :hasApear C, B hasApear C." What does this mean? Did you define a domain and range for each property?

That said, for your current ontology a more lightweight reasoner would be better, since all you have is domain and range. Try an RDFS-only profile, e.g. RDFS simple.

The only rules being applied are

T(s, p, o), T(p, rdfs:domain, C) => T(s, rdf:type, C)
T(s, p, o), T(p, rdfs:range, C) => T(o, rdf:type, C)

Each has to be applied just once; no fixpoint iteration is needed, because the conclusions do not produce triples that match the rules' premises.
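
As an illustration of why one pass is enough, here is a minimal sketch of the two rules in plain Python, independent of Jena. The property and class names reuse those from the report, but the domain and range assignments and the instance data are assumptions made up for the example.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(data, schema):
    """Apply the rdfs:domain and rdfs:range rules in one pass over the data."""
    domains, ranges = {}, {}
    for s, p, o in schema:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
        elif p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in data:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))  # T(s, rdf:type, C) from the domain rule
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))  # T(o, rdf:type, C) from the range rule
    return inferred


# Hypothetical schema and data, for illustration only.
schema = {
    (":hasInclude", RDFS_DOMAIN, ":A"),
    (":hasInclude", RDFS_RANGE, ":B"),
    (":hasApear", RDFS_RANGE, ":C"),
}
data = {
    (":a1", ":hasInclude", ":b1"),
    (":b1", ":hasApear", ":c1"),
}

inferred = infer_types(data, schema)
# Feeding the conclusions back in yields nothing new: rdf:type has no
# domain or range in this schema, so a second round is unnecessary.
second_round = infer_types(inferred, schema)
```

With this schema, second_round is empty, which is the "no fixpoint iteration" point: the inferred rdf:type triples never trigger either rule again.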

Also, computing the inferences once and writing them into a TDB(2) database would be far more efficient for any future query. Wouldn't that be sufficient for you? For static data, this would avoid (i) loading all the data in memory and (ii) computing the inferences on each start and initial query.