RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from Redis cluster
BSD 3-Clause "New" or "Revised" License

Cannot use in Databricks JedisConnectionException: Could not get a resource from the pool #357

Open juancresc opened 1 year ago

juancresc commented 1 year ago

I'm currently testing this in pyspark

df.write\
  .format("org.apache.spark.sql.redis")\
  .option("table", "mytable")\
  .option("infer.schema", True)\
  .option("spark.redis.host","somehost")\
  .option("host","somehost")\
  .option("spark.redis.port", "6666")\
  .option("port", "6666")\
  .option("spark.redis.ssl", False)\
  .option("auth", "")\
  .option("timeout", 5000)\
  .option("key.column", "key")\
  .save()
# JedisConnectionException: Could not get a resource from the pool

I've installed spark_redis_2_4_0_jar_with_dependencies.jar from https://repo1.maven.org/maven2/com/redislabs/spark-redis/2.4.0/ and the notebook currently runs 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12).

I'm able to connect to Redis from the notebook using the Python redis library.

tonofll commented 1 year ago

OK, so I was facing exactly the same issue and I managed to solve it. I tested it with spark-redis 3.1.0, Scala 2.12 and Spark 3.2.1 (Databricks runtime 10.4 LTS).

You must set the variables in the cluster's Spark configuration before launching the cluster. Otherwise, if you set them in your Spark session through spark.conf.set("", "") or pass them as .option(...) when reading/writing your dataframe, it raises JedisConnectionException.

spark.redis.host <your_host>
spark.redis.port <your_port> // usually 6379
spark.redis.auth <your_auth_token> // if needed
spark.redis.ssl true // in case you connect using TLS (port 6380)
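
To confirm from a notebook that the cluster actually started with these entries, something like the following sketch may help. It assumes that cluster-level Spark config entries are visible through spark.sparkContext.getConf().getAll() (a standard PySpark call returning key/value pairs); the helper name missing_redis_conf and the choice of which keys count as required are hypothetical, not part of spark-redis.

```python
# Keys taken from the config above. Treating host and port as required
# is an assumption of this sketch, not a spark-redis rule.
REQUIRED_KEYS = ("spark.redis.host", "spark.redis.port")


def missing_redis_conf(conf_pairs):
    """Return the required spark.redis.* keys absent from (key, value) pairs.

    A key with an empty or whitespace-only value is treated as missing.
    """
    present = {key for key, value in conf_pairs if str(value).strip()}
    return [key for key in REQUIRED_KEYS if key not in present]


# Usage in a notebook (requires a running Spark session named `spark`):
#   missing = missing_redis_conf(spark.sparkContext.getConf().getAll())
#   if missing:
#       print("Add these to the cluster's Spark config and restart:", missing)
```

If the returned list is non-empty, the keys were probably set only in the session or as write options, which is the situation described above as leading to JedisConnectionException.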

Example code (in Scala)

case class Person(name: String, age: Int)

val personSeq = Seq(Person("John", 30), Person("Peter", 45))
val df = spark.createDataFrame(personSeq)

df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "person-db")
  .save()

// Read the same table afterwards (new name, since df is already defined above)
val loadedDf = spark.read
  .format("org.apache.spark.sql.redis")
  .option("table", "person-db")
  .load()
loadedDf.show()
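
Since the original question used pyspark, here is a rough Python equivalent of the Scala example. It is only a sketch under the assumption that the connection settings already live in the cluster's Spark configuration, so no host, port or auth options are passed. The helper names write_table and read_table are hypothetical; the format string and the "table" and "key.column" options are the ones used earlier in this thread.

```python
REDIS_FORMAT = "org.apache.spark.sql.redis"


def write_table(df, table, key_column=None):
    """Write a DataFrame to Redis using only table-level options.

    Connection settings (spark.redis.host etc.) are assumed to come from
    the cluster's Spark configuration, not from .option(...).
    """
    writer = df.write.format(REDIS_FORMAT).option("table", table)
    if key_column is not None:
        # Optional: use an existing column as the Redis key.
        writer = writer.option("key.column", key_column)
    writer.save()


def read_table(spark, table):
    """Read the given table back from Redis as a DataFrame."""
    return spark.read.format(REDIS_FORMAT).option("table", table).load()


# Usage in a notebook (requires a running Spark session named `spark`):
#   df = spark.createDataFrame([("John", 30), ("Peter", 45)], ["name", "age"])
#   write_table(df, "person-db", key_column="name")
#   read_table(spark, "person-db").show()
```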

adamwrobel-ext-gd commented 1 year ago

@tonofll hey, sorry for asking in an old topic, but I am having issues even adding the JAR to the cluster. How did you do it?

tonofll commented 1 year ago

To install the JAR in the cluster, go to the cluster configuration and open the Libraries tab. Then click Install new and search for the spark-redis library in the Maven Central repository.

Once installed, simply restart the cluster and it should work properly. To avoid JedisConnectionException, follow the steps in my previous comment.

adamwrobel-ext-gd commented 1 year ago

Oh yeah, I just noticed you switched to Maven Central from Spark Packages. In there, the latest is 2.3.0. Today I managed to work around this by just pasting the coordinates and repository and clicking Install, with no browsing. It worked too. Thanks!