dice-group / Ontolearn

Ontolearn is an open-source software library for explainable structured machine learning in Python. It learns OWL class expressions from positive and negative examples.
https://ontolearn-docs-dice-group.netlify.app/index.html
MIT License

Retrieval via Triplestore #290

Closed: Demirrr closed this issue 1 year ago

Demirrr commented 1 year ago

Feature Description

  1. Triple stores are fast
  2. A DL concept can be represented as a SPARQL query.

We should perform instance retrieval (i.e., retrieving the individuals that belong to a particular DL concept) via a triple store.

Possible solution

We have the DL2SPARQL library, which contains a correct mapping of DL concepts to SPARQL queries.
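To illustrate what such a mapping does, here is a minimal, self-contained sketch. It is not DL2SPARQL's API: the classes `Atomic`, `And`, `Or`, `Exists` and the function `to_sparql` are hypothetical names introduced only for this example. It covers three constructors (intersection, union, existential restriction) and matches asserted `rdf:type` triples only, so it ignores subclass inference (a `rdf:type/rdfs:subClassOf*` property path would be one way to add it). Negation and universal restrictions need closed-world constructs such as `FILTER NOT EXISTS` and are deliberately left out.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, Union

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical concept classes for illustration; not DL2SPARQL's or Ontolearn's API.
@dataclass(frozen=True)
class Atomic:
    iri: str  # class IRI in angle brackets

@dataclass(frozen=True)
class And:
    left: "Concept"
    right: "Concept"

@dataclass(frozen=True)
class Or:
    left: "Concept"
    right: "Concept"

@dataclass(frozen=True)
class Exists:
    role: str  # object property IRI in angle brackets
    filler: "Concept"

Concept = Union[Atomic, And, Or, Exists]

def pattern(c: Concept, var: str, fresh: Iterator[int]) -> str:
    """Graph pattern whose solutions bind `var` to (asserted) instances of `c`."""
    if isinstance(c, Atomic):
        return f"{var} {RDF_TYPE} {c.iri} ."
    if isinstance(c, And):
        # Intersection: both patterns must hold for the same variable
        return pattern(c.left, var, fresh) + " " + pattern(c.right, var, fresh)
    if isinstance(c, Or):
        # Union: either pattern may hold
        return ("{ " + pattern(c.left, var, fresh) + " } UNION { "
                + pattern(c.right, var, fresh) + " }")
    if isinstance(c, Exists):
        # Existential restriction: some role successor, bound to a fresh variable, is a filler
        y = f"?y{next(fresh)}"
        return f"{var} {c.role} {y} . " + pattern(c.filler, y, fresh)
    raise TypeError(f"unsupported concept: {c!r}")

def to_sparql(c: Concept) -> str:
    """SELECT query returning the individuals bound to ?x."""
    return "SELECT DISTINCT ?x WHERE { " + pattern(c, "?x", count()) + " }"
```

For example, `to_sparql(Atomic("<http://www.benchmark.org/family#Brother>"))` yields a query equivalent (up to `DISTINCT` and the variable name) to the Brother query used later in this issue. The fresh-variable counter keeps nested existential restrictions from accidentally sharing a variable.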

Below is an example of loading the family knowledge base into a triple store and launching it. The triple store can then be queried via SPARQL.

# Run from the directory that contains KGs/Family/Family.owl
mkdir Fuseki && cd Fuseki

# Download Jena
wget https://archive.apache.org/dist/jena/binaries/apache-jena-4.7.0.tar.gz
# Download Jena Fuseki
wget https://archive.apache.org/dist/jena/binaries/apache-jena-fuseki-4.7.0.tar.gz
# Extract both archives, then return to the parent directory
tar -xzf apache-jena-fuseki-4.7.0.tar.gz
tar -xzf apache-jena-4.7.0.tar.gz
cd ..
# Create the folder for the family triple store
mkdir -p Fuseki/apache-jena-fuseki-4.7.0/databases/family/

# Loading
Fuseki/apache-jena-4.7.0/bin/tdb2.tdbloader --loader=parallel --loc Fuseki/apache-jena-fuseki-4.7.0/databases/family/ KGs/Family/Family.owl
13:08:43 INFO  loader          :: Loader = LoaderParallel
13:08:43 INFO  loader          :: Start: KGs/Family/Family.owl
13:08:43 INFO  loader          :: Finished: KGs/Family/Family.owl: 2,032 tuples in 0.63s (Avg: 3,215)
13:08:44 INFO  loader          :: Finish - index SPO
13:08:44 INFO  loader          :: Finish - index OSP
13:08:44 INFO  loader          :: Finish - index POS
13:08:44 INFO  loader          :: Time = 1.004 seconds : Triples = 2,032 : Rate = 2,024 /s

# Launching a triple store
cd Fuseki/apache-jena-fuseki-4.7.0 && java -Xmx4G -jar fuseki-server.jar --tdb2 --loc=databases/family /family

### Send a query to the triple store
curl http://localhost:3030/family/ --data query=PREFIX%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0ASELECT%20%2A%20WHERE%20%7B%0A%20%20%3Fsub%20%3Fpred%20%3Fobj%20.%0A%7D%20LIMIT%2010 -X POST
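The `query` form field in the curl command is a percent-encoded SPARQL query, which is hard to read or edit by hand. A minimal sketch using only Python's standard library that writes the same query in plain text and reproduces that encoding; `quote_via=quote` keeps spaces as `%20` as in the command, whereas the default `quote_plus` would produce `+`.

```python
from urllib.parse import quote, unquote, urlencode

# The SPARQL query carried in the curl command, written out in plain text
QUERY = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "SELECT * WHERE {\n"
    "  ?sub ?pred ?obj .\n"
    "} LIMIT 10"
)

# Percent-encode the form body; urlencode passes safe='' so '/' becomes %2F as well
body = urlencode({"query": QUERY}, quote_via=quote)
print(body)

# Decoding the body gives back the readable query
assert unquote(body[len("query="):]) == QUERY
```

curl can also perform this encoding itself via its `--data-urlencode` option, which avoids hand-encoding the query.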

Querying the triple store from Python by sending a SPARQL query in a POST request:

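import requests
# Ask Fuseki for all individuals asserted to be of type family:Brother
query = 'SELECT ?s { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.benchmark.org/family#Brother> }'
response = requests.post('http://localhost:3030/family/sparql', data={'query': query},
                         headers={'Accept': 'application/sparql-results+json'})
response.raise_for_status()  # raise on HTTP error statuses, e.g. a malformed query
# Collect the IRIs bound to ?s, wrapped in angle brackets
print({'<' + i['s']['value'] + '>' for i in response.json()['results']['bindings']})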
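The same request can be wrapped into reusable helpers. A hypothetical standard-library variant is sketched below: `instances_query`, `bindings_to_iris` and `retrieve_instances` are illustrative names, not part of Ontolearn. It assumes the Fuseki server launched above is listening at the given endpoint and answers with SPARQL 1.1 JSON results when asked for `application/sparql-results+json`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def instances_query(class_iri: str) -> str:
    """SPARQL query selecting individuals asserted to be of type `class_iri`."""
    return f"SELECT ?s WHERE {{ ?s <{RDF_TYPE}> <{class_iri}> }}"

def bindings_to_iris(results: dict, var: str = "s") -> set:
    """Collect the values bound to `var` in a SPARQL JSON result, wrapped in <...>."""
    return {"<" + row[var]["value"] + ">" for row in results["results"]["bindings"]}

def retrieve_instances(endpoint: str, class_iri: str, timeout: float = 10.0) -> set:
    """POST the query as a form field (hypothetical helper) and parse the JSON answer."""
    data = urlencode({"query": instances_query(class_iri)}).encode()
    request = Request(endpoint, data=data,
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=timeout) as response:
        return bindings_to_iris(json.load(response))
```

Usage, assuming the server from the previous steps is running: `retrieve_instances("http://localhost:3030/family/sparql", "http://www.benchmark.org/family#Brother")`. Keeping query construction and result parsing as pure functions means both can be checked without a running server.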

Please let me know if something is not explained clearly.

alkidbaci commented 1 year ago

A few points that need more clarification:

  1. Should I load every dataset and upload the whole Fuseki folder as a .zip file to the hobbit server so that users can download it? (And perhaps add a description of how to load a dataset, for people who want to load a new one.)
  2. Should I make instance retrieval via the triple store optional, in addition to the current approach? If the Fuseki server is not running, every retrieval call will fail. (Or should the server start automatically, if it is not already running, when instances are requested?)
Demirrr commented 1 year ago

(1) I would say not for the time being. Setting up a triplestore should be done by the user. (2) Yes please.
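One way to make triple-store retrieval optional, as agreed in (2), while leaving the server setup to the user as agreed in (1), is to try the triple store only when it is configured and fall back to the current approach when it is unreachable. A minimal sketch; `InstanceRetriever`, `local` and `triplestore` are hypothetical names, not Ontolearn's API, and the two retrievers stand in for the existing in-memory retrieval and a triple-store query function.

```python
import logging
from typing import Callable, Optional, Set

# Hypothetical: a retriever maps a concept (given here as a string) to individual IRIs
Retriever = Callable[[str], Set[str]]

class InstanceRetriever:
    """Hypothetical wrapper: prefer the triple store when configured, else the current approach."""

    def __init__(self, local: Retriever, triplestore: Optional[Retriever] = None):
        self.local = local              # existing (non-triple-store) retrieval
        self.triplestore = triplestore  # None means the triple store is not used at all

    def instances(self, concept: str) -> Set[str]:
        if self.triplestore is not None:
            try:
                return self.triplestore(concept)
            except OSError as err:
                # ConnectionError, urllib's URLError and requests' RequestException all
                # derive from OSError, so an unreachable server ends up here
                logging.getLogger(__name__).warning(
                    "triple store unavailable (%s); using local retrieval", err)
        return self.local(concept)
```

Logging a warning instead of silently falling back makes a misconfigured endpoint visible without turning every retrieval call into an error.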

Demirrr commented 1 year ago

Thank you for the PR @alkidbaci