burtcorp / athena-jdbc

A JDBC driver for AWS Athena
BSD 3-Clause "New" or "Revised" License
23 stars 17 forks source link
aws-athena java jdbc jdbc-driver

athena-jdbc

If you are looking for the best JDBC driver for Amazon Athena, have a look at Amazon Athena JDBC driver v3. The driver in this repo is no longer maintained, but one of the authors was the lead on the new official driver.


Build Status Maven Central

If you're reading this on GitHub, please note that this is the readme for the development version and that some features described here might not yet have been released. You can find the readme for a specific version via the release tags (here is an example).

This is a JDBC driver for AWS Athena.

Installation

Using Maven you can add this to your dependencies:

<dependency>
  <groupId>io.burt</groupId>
  <artifactId>athena-jdbc</artifactId>
  <version>${athena-jdbc.version}</version>
</dependency>

Check the releases page for the value of ${athena-jdbc.version}.

Dependencies

The driver requires Java 8 or later.

The only dependency is AWS SDK, specifically software.amazon.awssdk:athena. See pom.xml for more details.

Usage

JDBC URLs

The driver registers itself with java.sql.DriverManager automatically, and accepts JDBC URLs with the subprotocol athena. The subname is the default database name for the connection, and is optional. The simplest possible JDBC URL is "jdbc:athena", which is equivalent to "jdbc:athena:default". To use a database called "test" as the default database for the connection use the URL "jdbc:athena:test".

Driver and data source classes

When using a JDBC URL to connect you shouldn't need the driver class name, but if you for some reason need them, driver class name is io.burt.athena.AthenaDriver and the data source class name is io.burt.athena.AthenaDataSource.

Connection properties

There are three connection properties:

These properties are the same for both the java.sql.DriverManager and javax.sql.DataSource APIs.

Examples

Connecting with DriverManager

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Properties properties = new Properties();
properties.setProperty("region", "us-east-1"); // or set AWS_REGION
properties.setProperty("outputLocation", "s3://some-bucket/and/a/prefix");

try (Connection connection = DriverManager.getConnection("jdbc:athena:default", properties)) {
  // use connection
}

Currently the java.sql.Connection returned by the driver is thread safe and doesn't need to be closed until the application stops, if at all.

Creating a DataSource

This is the least amount of code needed to obtain a connection, and you also get type safe configuration:

import io.burt.athena.AthenaDataSource;
import java.sql.Connection;
import javax.sql.DataSource;

AthenaDataSource dataSource = new AthenaDataSource();
dataSource.setRegion("us-east-1");
dataSource.setOutputLocation("s3://some-bucket/and/a/prefix")

try (Connection connection = dataSource.getConnection()) {
}

Connecting with a connection pool

This example uses HikariCP, but of course other connection pools work too. The driver supports both JDBC URL and javax.sql.DataSource based configurations, and both use the same property names.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import javax.sql.DataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:athena:default"); // or setDataSourceClassName("io.burt.athena.AthenaDataSource")
config.addDataSourceProperty("region", "us-east-1"); // or set AWS_REGION
config.addDataSourceProperty("outputLocation", "s3://some-bucket/and/a/prefix");
DataSource dataSource = new HikariDataSource(config);

try (Connection connection = dataSource.getConnection()) {
  // use connection
}

Using a connection

Once you have a java.sql.Connection instance you can use it as you would one from any other JDBC driver.

import java.sql.ResultSet;
import java.sql.Statement;

try (
  Statement statement = connection.createStatement();
  ResultSet resultSet = statement.executeQuery("SELECT 'Hello from Athena'")
) {
  resultSet.next();
  System.out.println(resultSet.getString(1));
}

Getting the query execution ID from a ResultSet

import io.burt.athena.AthenaResultSet;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

try (
  Statement statement = connection.createStatement();
  ResultSet resultSet = statement.executeQuery("SELECT 'Hello from Athena'")
) {
  ResultSetMetaData metaData = resultSet.getMetaData();
  if (metaData.isWrapperFor(AthenaResultSetMetaData.class)) {
    AthenaResultSetMetaData unwrappedMetaData = metaData.unwrap(AthenaResultSetMetaData.class);
    String queryExecutionId = unwrappedMetaData.getQueryExecutionId();
    System.out.println(queryExecutionId);
  }
}

Providing client request tokens

By setting a client request token on a query execution you can make Athena reuse a previous result set if the exact same query has already been run. If you run the same query multiple times this can save money and improve performance.

import io.burt.athena.AthenaStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.commons.codec.binary.Hex;

try (Statement statement = connection.createStatement()) {
  if (statement.isWrapperFor(AthenaStatement.class)) {
    AthenaStatement unwrappedStatement = statement.unwrap(AthenaStatement.class);
    unwrappedStatement.setClientRequestTokenProvider(sql -> Optional.ofNullable(Hex.encodeHexString(sql)));
  }
  for (int i = 0; i < 10; i++) {
    try (ResultSet resultSet = statement.executeQuery("SELECT 'Hello from Athena'")) {
        resultSet.next();
        System.err.println(resultSet.getString(1));
      }
    }
  }
}

The client request token provider is a Function<String, Optional<String>>, and receives the SQL that will be executed, and should return the token to use for the request, wrapped in an java.util.Optional.

Description

Why another Athena JDBC driver?

There is already an existing JDBC driver for Athena, so why another one?

There is nothing fundamentally wrong with the existing driver. It was developed by Simba, who have a lot of experience building JDBC drivers, but who do not have much experience with Athena. Just providing a JDBC layer for running queries in Athena is not enough.

These are some of the issues we have found with the existing driver:

This alternative Athena JDBC driver aims to fix these issues, and hopefully even more things that we haven't even thought of yet.

Limitations & known issues

If you get a java.lang.UnsupportedOperationException it means that you've found a feature that hasn't been implemented yet. Some of these will eventually get implemented, but some may not make sense for Athena. If you rely on one of these open an issue and describe your use case.

If you get a java.sql.SQLFeatureNotSupportedException exception it means that the feature or operation either is not supported by Athena, or that it's unclear how it could be supported. Feel free to open pull requests that adds support for it if you have an idea for how it could be done.

Before you open an issue, check the API documentation and tests, there might be clues and hints there. Looking at the commit history of a method can also lead to insights.

These are some specific limitations and known issues that you might run into:

Contributing

See CONTRIBUTING.md.

Copyright

© 2019 Theo Hultberg and contributors, see LICENSE.txt (BSD 3-Clause).