RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from Redis cluster
BSD 3-Clause "New" or "Revised" License

Caused by: java.net.SocketTimeoutException: Read timed out #309

Closed DeepikaPrabha closed 1 year ago

DeepikaPrabha commented 3 years ago

I am trying to insert data into Redis (Azure Cache for Redis) through Spark. There are around 700 million rows, and I am using the spark-redis connector to insert the data. It fails after some time with the error below. I am able to insert some rows, but after a while some of the tasks start failing with this error. I am running the job from a Jupyter notebook.

Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
    at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:205)
    at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43)
    at redis.clients.jedis.Protocol.process(Protocol.java:155)
    at redis.clients.jedis.Protocol.read(Protocol.java:220)
    at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:318)
    at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:236)
    at redis.clients.jedis.BinaryJedis.auth(BinaryJedis.java:2259)
    at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:119)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:819)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:429)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
    at redis.clients.jedis.util.Pool.getResource(Pool.java:50)
    ... 27 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.net.SocketInputStream.read(SocketInputStream.java:127)
    at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:199)
    ... 38 more

This is how I am writing the data:

df.write
  .format("org.apache.spark.sql.redis")
  .option("host", REDIS_URL)
  .option("port", 6379)
  .option("auth", <PWD>)
  .option("timeout", 20000)
  .option("table", "testrediskeys")
  .option("key.column", "dummy")
  .mode("overwrite")
  .save()
Spark : 3.0
Scala : 2.12
spark-redis: com.redislabs:spark-redis_2.12:2.6.0
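For reference, the same connection settings can also be supplied once on the SparkSession rather than on every write (a sketch using the `spark.redis.*` property names from the spark-redis README; the host and password values are placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: configure spark-redis connection properties globally on the session,
// so individual writes only need the table/key options.
// The host and auth values below are placeholders, not real credentials.
val spark = SparkSession.builder()
  .appName("redis-bulk-write")
  .config("spark.redis.host", "<your-cache>.redis.cache.windows.net") // placeholder
  .config("spark.redis.port", "6379")
  .config("spark.redis.auth", "<PWD>")    // placeholder for the access key
  .config("spark.redis.timeout", "20000") // socket timeout in milliseconds
  .getOrCreate()
```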

Can someone help me understand what is the root cause of the issue?

Vertig00 commented 3 years ago

I had a similar problem with Amazon's managed Redis (ElastiCache); it turns out that cloud providers require a more secure connection. To connect to Redis hosted on those providers you must enable SSL/TLS, which is disabled by default in spark-redis. I added .config("spark.redis.ssl", true) to my SparkSession configuration and it works.
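A minimal sketch of this fix (assuming a managed Redis service that only accepts TLS; note that Azure Cache for Redis serves TLS on port 6380, with the non-TLS port 6379 disabled by default, and host/password values here are placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: enable SSL/TLS for spark-redis, as managed Redis services
// (Azure Cache for Redis, AWS ElastiCache) typically reject plaintext
// connections. Host, port, and auth values are placeholders.
val spark = SparkSession.builder()
  .appName("redis-ssl-write")
  .config("spark.redis.host", "<your-cache>.redis.cache.windows.net") // placeholder
  .config("spark.redis.port", "6380")   // Azure's TLS port; plain 6379 is off by default
  .config("spark.redis.auth", "<PWD>")  // placeholder for the access key
  .config("spark.redis.ssl", "true")    // have the Jedis connections use TLS
  .getOrCreate()

// `df` is the DataFrame from the original question.
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "testrediskeys")
  .option("key.column", "dummy")
  .mode("overwrite")
  .save()
```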