An implementation of the Jena storage layer on the Cassandra storage engine.
Create a keyspace on the Cassandra server for each DataSet or collection of graphs.
Create an instance of CassandraConnection.
Use the keyspace name and the CassandraConnection to create a DatasetGraphCassandra or GraphCassandra instance.
Once constructed, they should behave like any other DatasetGraph or Graph.
Four tables are created in the Cassandra keyspace. Each table's primary key has four segments: subject (S), predicate (P), object (O), and graph (G).
The tables are named for their key column order: SPOG, PGOS, OSGP, and GSPO.
In addition, each table has three further columns, each with an index.
When Graph.find(), DatasetGraph.find(), or DatasetGraph.findNG() is called, the query engine determines which table can best answer the query given the g, s, p, and o values supplied.
Depending on the value of o, conditions on the indexed columns are added to the conditions on the primary segments.
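The table selection described above can be sketched as follows. This is a minimal illustration and not the project's actual code: it assumes that "best" means the table whose key order has the longest run of leading segments bound in the query, and the names TableChooser, Segment, boundPrefixLength, and choose are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: pick the table whose key order starts with the most bound segments.
class TableChooser {
    /** The four quad positions used as key segments. */
    enum Segment { S, P, O, G }

    /** Table names from the text; each name spells out its key column order. */
    static final String[] TABLES = { "SPOG", "PGOS", "OSGP", "GSPO" };

    /** Counts how many leading key segments of the table are bound in the query. */
    static int boundPrefixLength(String table, Set<Segment> bound) {
        int n = 0;
        for (char c : table.toCharArray()) {
            if (!bound.contains(Segment.valueOf(String.valueOf(c)))) {
                break; // stop at the first unbound segment
            }
            n++;
        }
        return n;
    }

    /** Picks the table with the longest bound prefix; ties go to the earlier table (an assumption). */
    static String choose(Set<Segment> bound) {
        String best = TABLES[0];
        int bestLen = -1;
        for (String table : TABLES) {
            int len = boundPrefixLength(table, bound);
            if (len > bestLen) {
                best = table;
                bestLen = len;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Predicate and graph bound: PGOS has both as its leading segments.
        System.out.println(choose(EnumSet.of(Segment.P, Segment.G)));
        // Only the object bound: OSGP leads with the object.
        System.out.println(choose(EnumSet.of(Segment.O)));
    }
}
```

With only these four orderings, some combinations (for example s and o bound without p) cannot be covered by a single key prefix, which is why a result filter is also needed.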
Cassandra queries have some particular requirements:
The partition key (the first segment of the primary key) must be specified. If it is not specified, then the condition token( col ) > Long.MIN_VALUE will return all the values.
The rest of the key columns do not have to be specified, except that if a key segment has a value, all earlier segments must also have values.
To handle a query in which an earlier key segment is missing, the query stops at the first missing segment. Any segments specified after that point are applied as a filter on the results.
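The prefix-then-filter rule can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the class PrefixPlanner, its Plan type, and the column names subject, predicate, object, and graph are all hypothetical, and values are shown as ? bind markers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: build the WHERE clause from the bound key prefix,
// and collect later bound columns for filtering the results.
class PrefixPlanner {
    /** The WHERE clause for the Cassandra query plus the columns left for result filtering. */
    static final class Plan {
        final String where;
        final List<String> filterColumns;

        Plan(String where, List<String> filterColumns) {
            this.where = where;
            this.filterColumns = filterColumns;
        }
    }

    /**
     * keyColumns: the table's key columns in key order.
     * bound: the columns the caller supplied values for.
     */
    static Plan plan(List<String> keyColumns, Set<String> bound) {
        List<String> clauses = new ArrayList<>();
        List<String> filters = new ArrayList<>();
        boolean gapSeen = false;
        for (String col : keyColumns) {
            if (!bound.contains(col)) {
                gapSeen = true; // the first missing segment ends the key prefix
            } else if (gapSeen) {
                filters.add(col); // bound, but after a gap: filter the results instead
            } else {
                clauses.add(col + " = ?");
            }
        }
        if (clauses.isEmpty()) {
            // No leading segment bound: use the token condition to match every value.
            clauses.add("token(" + keyColumns.get(0) + ") > " + Long.MIN_VALUE);
        }
        return new Plan(String.join(" AND ", clauses), filters);
    }

    public static void main(String[] args) {
        List<String> spog = Arrays.asList("subject", "predicate", "object", "graph");
        Plan p = plan(spog, new HashSet<>(Arrays.asList("subject", "object")));
        System.out.println("WHERE " + p.where);
        System.out.println("filter on " + p.filterColumns);
    }
}
```

In this sketch a query on the SPOG ordering with subject and object bound restricts only the subject in the query and leaves the object to the result filter.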
The object value is always removed from the primary query if any non-primary data (obj_dtype, obj_int, obj_value) is available: the other columns are used for the primary query, and the indexes are then used to locate the proper values.