Simplify access to Aerospike DB.
Aerospike is extremely fast and scalable key-value store. It provides clients for various languages including Java. However, unlike relational databases that all support SQL and therefore can easily implement JDBC driver each no-SQL DB creates its own world. API exposed by clients of no-SQL databases are different (not standard) and optimized for specific features of certain databases.
From other hand there are a lot of tools that help to visualize data stored in database, create reports based on the data, perform various ETL operations etc. These tools traditionally support SQL. Database that has standards compliant JDBC driver can be easily connected to various tools. Majority of popular no-SQL databased have JDBC drivers. Aerospike did not have one. This was the reason to start this project.
Slides for presentation can be found here.
The driver is being developed now and is not available in any public repository. One can however build and publish it to a private repository. Once it is done refer to it using definition like the following:
For maven:
<dependency>
<groupId>com.nosqldriver</groupId>
<artifactId>aerospike-jdbc-driver</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
or for Gradle:
compile group: 'com.nosqldriver', name: 'aerospike-jdbc-driver', version: '1.0-SNAPSHOT'
It is very easy to build the project. The only prerequisite is JDK 8 or higher. Clone git repository and run:
./gradlew build -x test
This command will compile and project and package jar that can be found under ./build/lib If you want to run unit test you want to run unit tests please install and run aerospike database on the machine that runs the tests. Once this is done run:
./gradlew build
This command will compile the code, run all unit and integration tests and create jar file.
By default, tests run against Aerospike instance running on localhost
and listening to default port 3000.
To change this behaviour use system properties aerospike.host
and aerospike.port
respectively, e.g.:
./gradlew build -Daerospike.host=10.10.1.1 -Daerospike.port=3333
Aerospike JDBC driver depends on SQL parser and Aerospke Java client. If you want to use the driver with 3rd party tool it is very convenient to build single fat jar that contains all dependencies. Create the fat jar using command:
./gradlew fatJar
Fat JAR aerospike-jdbc-driver-all-1.0-SNAPSHOT.jar
can be added to 3rd party tool like SQuirreL or DB viewer
The Aerospike JDBC driver is compliant to JDBC version 4. If it is available in classpath no more configuration is needed, and the following code should work:
Connection conn = DriverManager.getConnection("jdbc:aerospike:localhost/test");
ResultSet rs = conn.createStatement().executeQuery("select * from people");
while (rs.next()) {
rs.getString(1);
rs.getInt("year_of_birth");
}
The JDBC URL format is: jdbc:aerospike:HOST[:PORT][/NAMESPACE][?PARAM1=VALUE1[&PARAM2=VALUE2]
.
For example jdbc:aerospike:localhost
connects to Aerospike running on a local machine and listening to the default port (3000). If you want to customize port use URL like jdbc:aerospike:localhost:4000
, to connect to specific namespace add it to URL like: jdbc:aerospike:localhost/test
. The following example shows how to connect to namespace test
of Aerospike running on a remote machine and listening to port 4567: jdbc:aerospike:10.1.1.1:4567/test
.
Typical Aerospike installation consists of several instances, so several IP addresses of servers can be passed to driver as following: jdbc:aerospike:first:3100,second:3200,third:3300
. If port is omitted, the default value of 3000 is used.
The parameters can be supplied either as a part of the URL or as the separate properties. The driver translates parameters into Aerospike client policies. Aerospike client has the following policy types:
Policy related parameters look like: policy.POLICY_TYPE.PROPERTY_NAME=PROPERTY_VALUE
, i.e.
Some policies have the same properties. For example socketTimeout
, totalTimeout
, replica
etc. are relevant for most policies. If you want to set the same value for specific property of all policies use *
instead of policy type: policy.*.socketTimeout=15000
Primary key in a relational database is a constraint applied to "regular" data column or several columns. Primary key (or just a key) in no-sql databases like Aerospike is special entity that is treated separately from the data. Aerospike JDBC driver hides these differences as much as it is possible emulating behavior of a relational DB. However there are some limitations that should be taking into account.
Aerospike does not operate with key itself but with its digest that is always calculated. By default, Aerospke does not store the primary key (PK) in the DB, so it cannot be retrieved later. If you want to store key in Aerospike set sendKey
property of WritePolicy
to true
(it is false
by default). The corresponding property of this driver is policy.write.sendKey=true
. If you want to retrieve key from the DB you have to use the same property to one of ReadPolicy
, QueryPolicy
, ScanPolicy
or BatchPolicy
depending on the API you are using. This driver hides from you details of the API used under the hood, if you always want to read the keys set property sendKey=true
to all policies used for reading (read, query, scan, batch).
If keys should be always available the easiest way is to supply property policy.*.sendKey=true
. This statement will be applied to all relevant policies, so Aerospike will always store and retrieve the key.
Even if we do not store or retrieve primary key we can read its digest. Use special field PK_DIGEST
for this purpose.
This feature can be enabled via parameter policy.driver.sendKeyDigest=true
.
Relational databases hold metadata that describe the database structure (catalogs, schemas, tables, columns etc). Aerospike is a schema-less DB. Its tables are called sets and columns are called bins. Set holds any number of rows. Each row can hold any number of bins of any name and type. However, very often people just hold the DB schema in the application layer and in fact each row has the same bins.
JDBC standard operates with 2 types of metadata:
java.sql.DatabaseMetaData
java.sql.ResultSetMetaData
Discovery of metadata of Aerospike cluster may be pretty heavy: it merges information retrieved from each node of the cluster. Various tools call method Connection.getMetaData()
very often that can cause performance problems. The driver caches this information for certain time period that can be configured using property policy.driver.databaseMetadataCacheTimeout
(default value is 60000 ms.)
Execution of operation that can change the metadata (INSERT
, UPDATE
, CREATE_INDEX
, DROP_INDEX
) invalidates this cache.
The Aerospike JDBC driver discovers schema dynamically using the first N
rows of the set. This means that if other rows have additional bins they could be ignored when reading data using select
statement. Number of rows used for the schema discovery can be configured using property policy.driver.discoverMetadataLines
. Its default value is 1.
The driver does not implement import and export functionality. However, various tools (e.g. DBeaver) does this. Tools typically perform export using query like the following:
select * from THE_TABLE
The exported results can be stored using various formats (CSV, JSON, XML, SQL insert statements) If you want to use exported data for import you have to:
General recommendation: if you want to store and retrieve keys via the driver always configure connection with property policy.*.sendKey=true
.
The next issue is the export format. CSV format does not hold any schema information. If CSV file contains column with value 123 we cannot know whether this is integer, double or string, so generic tools cannot generate correct SQL insert
statement. So, you cannot import data from CSV file to empty table. If this is your scenario create one syntactically correct fake record using insert
statement. This will help tool to discover the table structure and generate correct SQL insert
statement. Then import data from the file. After that you can delete the fake record using SQL delete
statement. If the data file format does not matter export data as SQL insert statements. This script can be executed against empty table because schema discovery is not needed in the case.
namespace_name
to change active namespace. Useful for the interactive mode or scripts. create table
. This operation is meaningless applicable to Aerospike that creates set once somebody writes to this set. create/drop schema
cannot be implemented for Aerospike that requires static definition of namespaces using aerospike.conf
.SQL identifier should follow the following rules: consist of Latin letters, digits and underscores (_) starting from a letter. If your identifier does not follow these rules wrap it with quotes.
For example:
select name1 from data
Neither name1
nor data
should not be quoted. However:
select "first name" from "100"
Here first name
is quoted because it contains space while 100
is quoted because it starts with a digit.
Even empty identifiers can be used:
insert into data_table (PK, "") values (1, 3.1415925)
select "" from data_table
select sin("") from data_table
Consequent double quotes in the example above indicate empty identifier.
Function | Description |
---|---|
len(s) , length(s) |
length of string, list, map |
ascii(s) |
ASCII code of the first character of the given string |
char(code) |
character by its ASCII code |
locate(subStr, str, [offset=1]) |
returns position of subStr into str starting from offset (that is 1 if omitted) |
instr(subStr, str) |
returns position of subStr into str |
trim(s) ,ltrim(s) , rtrim(s) |
trims string (removes spaces from both/left/right sides) |
strcmp(s1, s2) |
compares given strings |
left(str, n) |
n-characters substring of given string from the beginning |
lower(str) , lcase(str) |
converts all characters to lower case |
upper(str) , ucaes(str) |
converts all characters to upper case |
str(v) |
converts given value to string. Implements special support for byte arrays. |
substring(str, start, length) |
length characters long substring of given string started from start |
space(n) |
generates string that consists of n spaces |
concat(...) |
concatenate given strings |
concat(separator, ...) |
concatenate given strings using separator |
reverse(str) |
reverses given string |
to_base64(bytes) |
generates Base64 representation of given byte array |
from_base64(str) |
returns byte array from given Base64 |
substr(str, from, to) |
returns substring of given string |
concat(str1, str2, ...) |
concatenates given strings |
concat_ws(separator, str1, str2, ...) |
concatenates given strings using separator |
now() |
retrieves current epoch time in milliseconds |
year([date]) |
retrieves year (*) |
month([date]) |
retrieves month (*) |
dayofmonth([date]) |
retrieves day of month (*) |
hour([date]) |
retrieves hour (*) |
minute([date]) |
retrieves minute (*) |
second([date]) |
retrieves second (*) |
millisecond([date]) |
retrieves millisecond (*) |
date([date]) |
retrieves date object (*) |
epoch(str, fmt) |
parses given string representation of date using given format to epoch |
map(s) , list(s) , array(s) |
create map, list and array respectively from their string representation |
sin(x) , cos(x) , tan(x) , cot(x) |
trigonometric functions |
asin(x) , acos(x) , atan(x) , atan2(x) |
inverse trigonometric functions |
degrees(r) , radians(d) |
transforms radians to degrees and vice versa |
pi() |
returns π |
abs(x) |
returns absolute value |
floor(x) , ceil(x) |
return floor and ceil value of given x |
round(x, n) |
rounds given value x to scale n |
exp(x) , ln(x) , log10(x) , log2(x) |
math functions ex, ln(x), lg10(x), lg2(x) |
pow(x, n), power(x, n) | calculates xn |
(*) [date]
is optional argument. If omitted current date is used. Otherwise, can be either Date object or epoch or string representation of date parsed using one of the following formats: yyyy-MM-dd HH:mm:ss.SSS z, yyyy-MM-dd HH:mm:ss z, yyyy-MM-dd HH:mm:ss.SSS, yyyy-MM-dd HH:mm:ss, yyyy-MM-dd HH:mm, yyyy-MM-dd
Aerospike can hold map in single bin. One can either query map as-is or its fields separately. For example is the bin name is data
and it contains map that hold personal data like first_name
, last_name
and year_of_birth
the data can be queried as following:
select data[first_name], data[last_name], data[year_of_birth] from (select data from people)
Please note that Aerospike returns map of strings, so values of all fields including those that look like numeric are represented as strings. For example year_of_birth
is not represented as number but as string that contains numeric characters.
Let's take an example. We have class Person
:
public class Person implements Serializable {
private String firstName;
private String lastName;
private int yearOfBirth;
// getters, setters etc.
}
The first way to store objects of this class in Aerospike is creating bins corresponding to the fields of the class, i.e. firstName
, lastName
, yearOfBirth
. However, one can prefer to store the information in single bin. If class Person
is serializable it can be easily done using PreparedStatement
:
PreparedStatement insert = conn.prepareStatement("insert into people (PK, data) values (?, ?)");
insert.setInt(1, 1);
insert.setObject(2, new Person("John", "Lennon", 1940));
Once this is done you can retrieve the information using select statement like:
select * data from people;
Unfortunately table people
has only one column data
that holds binary information. Fortunately serializable objects can be deserialized automatically using the following query:
select data[firstName], data[lastName], data[yearOfBirth] from (select data from people)
Where statement can be applied on both selects for example
select data[firstName], data[lastName], data[yearOfBirth] from (select data from people) where data[firstName]='John'
Not serializable classes cannot be supported out-of-the-box. However, this problem can be solved using custom functions.
Custom function is a public class that has public default constructor and implements java.util.function.Function<T, R>
where T
is a type of input parameter and R
is a type of output parameter.
Let's take a look on example from the previous chapter but this time the class Person
is not Serializable
. Let's assume that data has been already written to the database, and we want to have a convenient way to retrieve it:
select human[firstName], human[lastName], human[yearOfBirth] from (select person(data) as human from people)
The statement calls function person()
that converts data stored in the database to object of class Person
. How is it possible? Indeed, driver does not know anything about either class custom Person
or its serialization format. The answer is that person()
is a custom function.
The functionality of the driver can be extended by providing custom functions. Custom function is a public class that has public default constructor and implements java.util.function.Function<T, R>
where T
is a type of input parameter and R
is a type of output parameter.
Here is an example of function that calculates square root of given numeric parameter:
public class Sqrt implements Function<Double, Double> {
@Override
public Double apply(Double d) {
return Math.sqrt(d);
}
}
This once implemented, compile and added to classpath this function must be registered as following:
jdbc:aerospike:localhost/test?custom.function.sqrt=com.company.Sqrt
Generally the parameter that registers custom function looks like:
custom.function.FUNCTION_NAME=FULLY_QUALIFIED_CLASS_NAME
Where
FUNCTION_NAME
- the name that can be then used to call this function from SQL statement
FULLY_QUALIFIED_CLASS_NAME
- the fully qualified class name of class that implements the function. One can define as many functions as he wants, e.g.:
jdbc:aerospike:localhost?custom.function.strlen=com.company.StrlengthCalculator&custom.function.sqrt=com.company.Sqrt
These functions can be used in SQL query as following:
select sqrt(4), strlen('abc'); -- returns 2, 3
select sqrt(strlen('abcd')); -- returns 2
Here is an example of custom function that deserializes custom binary representation of Person
to instance of class Person
:
public class PersonDeserializer implements Function<byte[], Person> {
@Override
public Person apply(byte[] bytes) {
try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
return new Person(dis.readUTF(), dis.readUTF(), dis.readInt());
} catch (IOException e) {
throw new IllegalArgumentException(e);
}
}
}
Customizations must be added to classpath of the application that uses driver. The custom function must be configured using configuration parameter supplied in JDBC URL:
jdbc:aerospike:localhost/test?custom.function.person=com.company.PersonDeserializer
This configuration makes function person()
available for SQL queries.
The TRUNCATE TABLE command deletes the data inside a table.
The following SQL truncates the table "Categories":
truncate table mytable
truncate table mynamespace.mytable
Additionally, to standard SQL syntax the Aerospike can truncate records older than specified in the special parameter "beforeLastUpdate":
truncate table mytable '2020-06-20'
truncate table mytable '2020-06-20 10:20:30.456'
The date should be specified using format like yyyy-MM-dd[ HH:mm[:ss[.SSS[ z]]]]
You can download binaries here:
If you are running under Java 11 and higher you need Nashorn - java script engine that should be added to the classpath together with the driver. Take the Nashorn jar file here.
Download aerospike-jdbc-driver-all-1.0-SNAPSHOT.jar to your computer. This jar file contains the driver's binaries together with all dependencies.
Copy aerospike-jdbc-driver-all-1.0-SNAPSHOT.jar
under $SQUIREL_SQL_HOME/drivers
.
Open Squirel SQL application, choose drivers tab at the left pane, press "add" button and fill form:
jdbc:aerospike:host:3000/test
com.nosqldriver.aerospike.sql.AerospikeDriver
Add the path to jar file to "Extra class path" panel. Choose Aliases tab at the left pane and press add button. Fill form: name, choose the driver from list, URL, username and password (if needed)
Choose Database/DriverManager, then press button "New." Fill the following data:
com.nosqldriver.aerospike.sql.AerospikeDriver
jdbc:aerospike:{host}[:{port}]/[{database}]