
Hiveless


Hiveless is a Scala library for working with Spark and Hive through a more expressive, typed API. It adds typed Hive UDFs and implements spatial Hive UDFs. It consists of the following modules:

* hiveless-core — the core typed Hive UDFs API
* hiveless-spatial — spatial Hive UDFs (ST_Intersects, ST_Simplify, etc.)
* hiveless-spatial-index — spatial predicate pushdown optimizations for Spark

Quick Start

To use Hiveless in your project, add the following to your build.sbt file as needed:

resolvers ++= Seq(
  // for snapshot artifacts only
  "oss-sonatype" at "https://oss.sonatype.org/content/repositories/snapshots"
)

libraryDependencies ++= List(
  "com.azavea" %% "hiveless-core"          % "<latest version>",
  "com.azavea" %% "hiveless-spatial"       % "<latest version>",
  "com.azavea" %% "hiveless-spatial-index" % "<latest version>"
)

Hiveless Spatial supported GIS functions

CREATE OR REPLACE FUNCTION st_geometryFromText as 'com.azavea.hiveless.spatial.ST_GeomFromWKT';
CREATE OR REPLACE FUNCTION st_intersects as 'com.azavea.hiveless.spatial.ST_Intersects';
CREATE OR REPLACE FUNCTION st_simplify as 'com.azavea.hiveless.spatial.ST_Simplify';
 -- ...and more

The full list of supported functions can be found here.
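As a sketch of how the registered functions can be used from Spark SQL (the `polygons` table and its `id` and `geom` columns below are hypothetical; the UDF class names come from the list above):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Register a spatial UDF by its implementing class name
spark.sql("CREATE OR REPLACE FUNCTION st_intersects as 'com.azavea.hiveless.spatial.ST_Intersects'")

// Use it in a query against a hypothetical `polygons` table
spark.sql("""
  SELECT a.id, b.id
  FROM polygons a JOIN polygons b
    ON st_intersects(a.geom, b.geom)
""").show()
```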

Spatial Query Optimizations

Two predicate optimizations are supported, ST_Intersects and ST_Contains, which allow Spark to push down spatial predicates when possible.

To enable optimizations:

import com.azavea.hiveless.spark.sql.rules.SpatialFilterPushdownRules

val spark: SparkSession = ???
SpatialFilterPushdownRules.registerOptimizations(spark.sqlContext)
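Once the rules are registered, spatial predicates in WHERE clauses become candidates for pushdown. As an illustration (the `cities` table and its `name` and `geom` columns are hypothetical), the query plan can be inspected to confirm the filter was pushed toward the source:

```scala
val df = spark.sql("""
  SELECT name FROM cities
  WHERE st_intersects(geom, st_geometryFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))'))
""")
// The optimized plan should show the spatial filter pushed down
df.explain(true)
```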

It is also possible to enable the optimizations through the Spark configuration via the extensions injector:

import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

val conf: SparkConf = ???
conf.set("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)
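For instance, the extension can be set when building the session (a sketch; any of the usual ways of supplying Spark configuration, such as spark-submit --conf, would work equally well):

```scala
import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

val spark = SparkSession
  .builder()
  .config("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)
  .enableHiveSupport()
  .getOrCreate()
```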

License

Code is provided under the Apache 2.0 license, available at http://opensource.org/licenses/Apache-2.0 as well as in the LICENSE file. This is the same license used by Spark.