RedisLabs / spark-redis

A connector for Spark that allows reading from and writing to a Redis cluster
BSD 3-Clause "New" or "Revised" License

Spark-Redis

A library for reading and writing data in Redis using Apache Spark.

Spark-Redis provides access to all of Redis' data structures - String, Hash, List, Set and Sorted Set - from Spark as RDDs. It also supports reading and writing with DataFrames and Spark SQL syntax.
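The two access styles described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the Spark-Redis artifact matching your Spark and Scala versions is on the classpath and that a Redis server is reachable at localhost:6379. The key prefix `sketch:`, the table name `person`, and the object name `SparkRedisSketch` are made up for this example.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
// Brings the Redis read/write methods into scope on SparkContext via implicits.
import com.redislabs.provider.redis._

object SparkRedisSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark master and a Redis server on localhost:6379.
    val spark = SparkSession.builder()
      .appName("spark-redis-sketch")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .getOrCreate()
    val sc = spark.sparkContext

    // RDD API: write String key/value pairs, then read them back by key pattern.
    val pairs = sc.parallelize(Seq(("sketch:k1", "v1"), ("sketch:k2", "v2")))
    sc.toRedisKV(pairs)
    val strings = sc.fromRedisKV("sketch:*") // RDD[(String, String)]
    strings.collect().foreach(println)

    // DataFrame API: each row is stored as a Redis Hash under the "person" table.
    import spark.implicits._
    val people = Seq(("John", 30), ("Peter", 45)).toDF("name", "age")
    people.write
      .format("org.apache.spark.sql.redis")
      .option("table", "person")
      .option("key.column", "name")
      .mode(SaveMode.Overwrite)
      .save()

    // Read the same table back; the schema is inferred from the stored hashes.
    val loaded = spark.read
      .format("org.apache.spark.sql.redis")
      .option("table", "person")
      .option("key.column", "name")
      .load()
    loaded.show()

    spark.stop()
  }
}
```

Hash, List, Set and Sorted Set have analogous RDD read and write methods; check the documentation for your branch for their exact names and parameters.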

The library can be used with both stand-alone and clustered Redis databases. When used with a Redis cluster, Spark-Redis is aware of the cluster's partitioning scheme and adjusts in response to resharding and node failure events.
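For context on the partitioning scheme mentioned above: Redis Cluster assigns every key to one of 16384 hash slots by taking the CRC16 of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The sketch below reproduces that rule as described in the Redis Cluster specification. It is not Spark-Redis's internal code, and the names `RedisHashSlot`, `crc16`, `hashTagOrKey` and `slot` are chosen for this example only.

```scala
import java.nio.charset.StandardCharsets

object RedisHashSlot {
  private val SlotCount = 16384

  /** CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0. */
  def crc16(bytes: Array[Byte]): Int = {
    var crc = 0
    for (b <- bytes) {
      crc ^= (b & 0xFF) << 8
      for (_ <- 0 until 8) {
        crc =
          if ((crc & 0x8000) != 0) ((crc << 1) ^ 0x1021) & 0xFFFF
          else (crc << 1) & 0xFFFF
      }
    }
    crc
  }

  /** Returns the hash tag (text between the first '{' and the next '}') if it is non-empty, else the whole key. */
  def hashTagOrKey(key: String): String = {
    val open = key.indexOf('{')
    if (open < 0) key
    else {
      val close = key.indexOf('}', open + 1)
      if (close > open + 1) key.substring(open + 1, close) else key
    }
  }

  /** Hash slot (0 to 16383) that a Redis Cluster assigns to the given key. */
  def slot(key: String): Int =
    crc16(hashTagOrKey(key).getBytes(StandardCharsets.UTF_8)) % SlotCount
}
```

For example, `RedisHashSlot.slot("foo")` evaluates to 12182, and keys that share a hash tag, such as `{user1}.name` and `{user1}.age`, land in the same slot. Resharding moves slots between nodes, which is the kind of event a cluster-aware client has to follow.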

Spark-Redis also supports Spark Streaming (DStreams) and Structured Streaming.

Version compatibility and branching

The library has several branches, each corresponding to a different supported Spark version. For example, 'branch-2.3' works with any Spark 2.3.x version. The master branch contains the most recent development for the next release.

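| Spark-Redis   | Spark | Redis    | Supported Scala Versions |
|---------------|-------|----------|--------------------------|
| master        | 3.2.x | >=2.9.0  | 2.12                     |
| 3.0           | 3.0.x | >=2.9.0  | 2.12                     |
| 2.4, 2.5, 2.6 | 2.4.x | >=2.9.0  | 2.11, 2.12               |
| 2.3           | 2.3.x | >=2.9.0  | 2.11                     |
| 1.4           | 1.4.x |          | 2.10                     |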

Known limitations

Additional considerations

This library is a work in progress, so the API may change before the official release.

Documentation

Please make sure you use the documentation from the branch that matches your Spark-Redis version (2.4, 2.3, etc.).

Contributing

You're encouraged to contribute to the Spark-Redis project.

There are two ways you can do so:

Submit Issues

If you encounter an issue while using the library, please report it via the project's issues tracker.

Author Pull Requests

Code contributions to the Spark-Redis project can be made using pull requests. To submit a pull request:

  1. Fork this project.
  2. Make and commit your changes.
  3. Submit your changes as a pull request.