The goal of this project is to provide reliable, high-quality search over RDF Schemas and OWL ontologies.
This is how you get an index up and running, and filled with data.
The recommended way to install ElasticSearch on OS X is Homebrew. Once Homebrew is set up and configured, run:
brew install elasticsearch
To do: Add instructions for other operating systems...
For development, the easiest way to start ElasticSearch is with the provided configuration file:
elasticsearch -f -D es.config=elasticsearch.yml
The -f flag starts ElasticSearch in the foreground, so you can stop it with Ctrl+C. The -D option instructs ElasticSearch to use the elasticsearch.yml configuration file. This configuration places data and logs into a subdirectory named elasticsearch within this repository. For production use, you may want a different setup.
You need Maven. Install it if necessary (brew install maven on OS X).
mvn package
This compiles and assembles the command-line app. The result is two things: the archive target/vocidex-cli.tar.gz, which can be deployed wherever you like, and the directory target/vocidex-cli/vocidex, which can be used directly.
From inside the generated app's directory, the command-line tools can be run by invoking bin/appname.
# go to CLI build dir
cd target/vocidex-cli/vocidex
# Download LOV N-Quads dump as lov_aggregator.nq, takes a while
curl -o lov_aggregator.nq http://lov.okfn.org/dataset/lov/agg/lov_aggregator.rdf
# Load it, takes a while
bin/index-lov elasticsearch localhost lov lov_aggregator.nq
curl 'http://localhost:9200/lov/class,property,vocabulary/_search?q=test&pretty=1'
If this returns a longish JSON response, all is good.
create-index: Index initializer
This tool connects to an ElasticSearch cluster and initializes a new index for use with Vocidex. To see its syntax:
bin/create-index
Example invocation:
# Adds an index called 'lov' on the 'elasticsearch' cluster
bin/create-index elasticsearch localhost lov
add-vocabulary: The ElasticSearch Vocabulary Indexer
This tool reads an RDFS or OWL file and indexes any terms defined therein in an ElasticSearch index. To see its syntax:
bin/add-vocabulary
Example invocation:
# Indexes SKOS into the 'skos' index on the 'elasticsearch' cluster
bin/add-vocabulary elasticsearch localhost skos http://www.w3.org/2004/02/skos/core
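To index several vocabularies in one go, add-vocabulary can simply be called once per vocabulary. The following is a minimal sketch, assuming the argument order shown above (cluster, host, index, vocabulary URL) and assuming one index can hold terms from several vocabularies, as the LOV index does. The index_vocabularies function and the ADD_VOCABULARY variable are hypothetical helpers, not part of Vocidex.

```shell
# Hypothetical helper (not part of Vocidex): index several RDFS/OWL
# vocabularies into one index by calling bin/add-vocabulary once per URL.
# Assumes the argument order shown above: cluster host index url.
# Override ADD_VOCABULARY (e.g. with "echo") for a dry run; the value is
# expanded unquoted, so it must not contain spaces.
ADD_VOCABULARY="${ADD_VOCABULARY:-bin/add-vocabulary}"

index_vocabularies() {
  if [ "$#" -lt 4 ]; then
    echo "usage: index_vocabularies CLUSTER HOST INDEX URL [URL ...]" >&2
    return 1
  fi
  cluster="$1"; host="$2"; index="$3"
  shift 3
  for url in "$@"; do
    echo "Indexing $url" >&2
    # Stop at the first vocabulary that fails to index.
    $ADD_VOCABULARY "$cluster" "$host" "$index" "$url" || return 1
  done
}
```

Run from the CLI build directory, a call such as index_vocabularies elasticsearch localhost vocabs http://www.w3.org/2004/02/skos/core would index SKOS into a (hypothetical) index named vocabs; setting ADD_VOCABULARY=echo first prints the commands instead of running them.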
index-lov: The Linked Open Vocabularies Indexer
This tool populates an ElasticSearch index with the contents of the Linked Open Vocabularies dump, which can be obtained from http://lov.okfn.org/dataset/lov/agg/lov_aggregator.rdf. The file needs to be downloaded and its extension changed to .nq, because otherwise Jena gets confused: it really is an N-Quads file, not an RDF/XML file. To see the tool's syntax:
bin/index-lov
Example invocation:
# Download LOV dump with right name
curl -o lov_aggregator.nq http://lov.okfn.org/dataset/lov/agg/lov_aggregator.rdf
# Indexes the dump into an index called 'lov' on the 'elasticsearch' cluster
bin/index-lov elasticsearch localhost lov lov_aggregator.nq
Once the ElasticSearch index is populated, the standard REST-based ElasticSearch APIs can be used to run searches.
The following example searches for classes, properties and vocabularies in the lov index, using the keyword test:
curl 'http://localhost:9200/lov/class,property,vocabulary/_search?q=test&pretty=1'
Equivalent to:
curl -XPOST 'http://localhost:9200/lov/class,property,vocabulary/_search?pretty=1' -d '{"query":{"match":{"_all":"test"}}}'
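When the keyword comes from a script rather than being typed by hand, the request body can be generated. This is a minimal sketch: match_all_query is a hypothetical helper that prints the same body as the request above for an arbitrary keyword. It does no JSON escaping, so it rejects keywords containing double quotes or backslashes.

```shell
# Hypothetical helper: print the JSON body of a match query on the _all
# field for the keyword given as $1. No JSON escaping is performed, so
# keywords containing double quotes or backslashes are rejected.
match_all_query() {
  case "$1" in
    *'"'* | *'\'*)
      echo "keyword must not contain double quotes or backslashes" >&2
      return 1 ;;
  esac
  printf '{"query":{"match":{"_all":"%s"}}}\n' "$1"
}
```

Usage: curl -XPOST 'http://localhost:9200/lov/class,property,vocabulary/_search?pretty=1' -d "$(match_all_query person)"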
The following query provides autocomplete over the *.autocomplete fields, which are pre-tokenized at index time using edge_ngram (gram sizes 1 to 100):
curl -XPOST 'http://localhost:9200/lov/class,property/_search?pretty=1' -d '{
  "fields" : ["uri", "prefixed", "localName"],
  "query" : {
    "multi_match" : {
      "query" : "foaf:",
      "fields" : ["prefixed.autocomplete", "uri.autocomplete"],
      "type" : "match_phrase"
    }
  }
}'
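For an autocomplete box, the query above is sent again with whatever the user has typed so far. The sketch below uses a hypothetical autocomplete_query helper that prints the same body with the typed prefix substituted for "foaf:"; it assumes the input needs no JSON escaping and rejects input containing double quotes or backslashes.

```shell
# Hypothetical helper: print the autocomplete request body shown above,
# with the user's partial input ($1) as the query string. No JSON
# escaping is performed, so quotes and backslashes are rejected.
autocomplete_query() {
  case "$1" in
    *'"'* | *'\'*)
      echo "input must not contain double quotes or backslashes" >&2
      return 1 ;;
  esac
  printf '{"fields":["uri","prefixed","localName"],"query":{"multi_match":{"query":"%s","fields":["prefixed.autocomplete","uri.autocomplete"],"type":"match_phrase"}}}\n' "$1"
}
```

Usage, for example after the user has typed foaf:Per: curl -XPOST 'http://localhost:9200/lov/class,property/_search?pretty=1' -d "$(autocomplete_query foaf:Per)"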
Initializing Eclipse files:
mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs
Running the tests:
mvn test
Use the issue tracker to discuss stuff, and feel free to submit pull requests.
Vocidex works by creating a JSON document for each entity to be indexed (classes, properties, datatypes, vocabularies), and putting them into an ElasticSearch index. Here we document the structure of these JSON documents.
Note: “term array” is a JSON array of objects, each with uri and label keys.
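To make the “term array” shape concrete, here is a small shell sketch that prints one from alternating URI and label arguments. term_array is a hypothetical helper for illustration only; it assumes the values need no JSON escaping.

```shell
# Hypothetical helper: print a "term array", i.e. a JSON array of
# {"uri": ..., "label": ...} objects, from alternating URI/label
# arguments. Values are inserted verbatim (no JSON escaping).
term_array() {
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "usage: term_array URI LABEL [URI LABEL ...]" >&2
    return 1
  fi
  out='['
  sep=''
  while [ "$#" -gt 0 ]; do
    out="$out$sep{\"uri\":\"$1\",\"label\":\"$2\"}"
    sep=','
    shift 2
  done
  printf '%s]\n' "$out"
}
```

For example, term_array http://xmlns.com/foaf/0.1/Agent Agent prints [{"uri":"http://xmlns.com/foaf/0.1/Agent","label":"Agent"}], and calling it with no arguments prints an empty array.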
Keys common to all term documents
type: class, property or datatype
uri: absolute URI
uri.autocomplete: edge_ngram tokenized for autocomplete over uri
prefix: namespace prefix, either provided by LOV or manually at index time; may be absent
localName: part after the last hash/slash
localName.autocomplete: edge_ngram tokenized for autocomplete over localName
prefixed: prefixed name (e.g., foaf:Person), or absent if there is no prefix
prefixed.autocomplete: edge_ngram tokenized for autocomplete over prefixed
label: rdfs:label or similar property, or a string synthesized from the local name
comment: rdfs:comment or similar property; may be absent
vocabulary: LOV metadata about the vocabulary, with keys uri, prefix, label and homepage; may be absent
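The localName and prefixed keys can be illustrated with a short shell sketch. It assumes prefixed is simply the prefix, a colon and the local name, which matches the foaf:Person example but is not confirmed as Vocidex's exact rule. local_name and prefixed_name are hypothetical helpers, not part of Vocidex.

```shell
# Hypothetical helper: the part of a URI after the last '#' or '/'.
# ${1##*[#/]} removes the longest leading match of "anything, then # or /".
local_name() {
  printf '%s\n' "${1##*[#/]}"
}

# Hypothetical helper: $1 = prefix (may be empty), $2 = term URI.
# Prints prefix:localName; fails when there is no prefix, mirroring
# the rule that the prefixed key is absent in that case.
prefixed_name() {
  [ -n "$1" ] || return 1
  printf '%s:%s\n' "$1" "$(local_name "$2")"
}
```

For example, local_name http://www.w3.org/2004/02/skos/core#Concept prints Concept, and prefixed_name foaf http://xmlns.com/foaf/0.1/Person prints foaf:Person.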
Class documents
Term keys as listed above, plus:
superclasses: term array
disjointClasses: term array
equivalentClasses: term array
Property documents
Term keys as listed above, plus:
domains: term array
ranges: term array; each member also has either an isDatatype or an isClass field with value true
superproperties: term array
inverseProperties: term array
equivalentProperties: term array
isAnnotationProperty: boolean
isObjectProperty: boolean
isDatatypeProperty: boolean
isFunctionalProperty: boolean
isInverseFunctionalProperty: boolean
isTransitiveProperty: boolean
isSymmetricProperty: boolean
Datatype documents
Term keys as listed above.
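A member of a property's ranges array is a term-array object plus a flag saying whether the range is a class or a datatype. The hypothetical range_member helper below prints one such member to show the shape; it is an illustration, not Vocidex code, and assumes the values need no JSON escaping.

```shell
# Hypothetical helper: print one member of a property's "ranges" array.
# $1 = URI, $2 = label, $3 = "class" or "datatype" (selects the flag).
range_member() {
  case "$3" in
    class) flag=isClass ;;
    datatype) flag=isDatatype ;;
    *) echo "kind must be 'class' or 'datatype'" >&2; return 1 ;;
  esac
  printf '{"uri":"%s","label":"%s","%s":true}\n' "$1" "$2" "$flag"
}
```

For example, range_member 'http://xmlns.com/foaf/0.1/Person' Person class prints {"uri":"http://xmlns.com/foaf/0.1/Person","label":"Person","isClass":true}.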
Vocabulary documents
type: vocabulary
uri: absolute URI as per LOV
uri.autocomplete: edge_ngram tokenized for autocomplete over uri
prefix: conventional prefix as per LOV
prefix.autocomplete: edge_ngram tokenized for autocomplete over prefix
label: as for terms
shortLabel: curated short-form label as per LOV; may be absent
comment: as for terms
homepage: URL from LOV metadata; may be absent