RedisLabs / spark-redis

A connector for Spark that allows reading and writing to/from Redis cluster
BSD 3-Clause "New" or "Revised" License

Read timeout while reading data from redis in batch #300

Open arpiv opened 3 years ago

arpiv commented 3 years ago

I am trying to read all the fields and their values from a Redis hash key in PySpark using the spark-redis jar. I am able to read 500-600 fields, but reading 5K-6K fields fails with a read timeout (snippet pasted below).

redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
    at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:205)
    at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43)
    at redis.clients.jedis.Protocol.process(Protocol.java:155)
    at redis.clients.jedis.Protocol.read(Protocol.java:220)
    at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:318)
    at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:270)
    at redis.clients.jedis.Jedis.hgetAll(Jedis.java:942)

I tried increasing the timeout via the spark.redis.timeout setting (I set it to 3000 and higher), but then I get an "Unexpected end of stream" error instead (snippet below).

redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.
    at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:202)
    at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43)
    at redis.clients.jedis.Protocol.process(Protocol.java:155)
    at redis.clients.jedis.Protocol.read(Protocol.java:220)
    at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:318)
    at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:270)
    at redis.clients.jedis.Jedis.hgetAll(Jedis.java:942)

I am reading the data using the following command:

    data = spark.read.format("org.apache.spark.sql.redis") \
        .option("keys.pattern", "search_solution") \
        .option("infer.schema", "true") \
        .load()

Are there any suggestions, or possible areas to look at, to avoid these errors? Let me know if you have any insight into this issue.
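For reference, a minimal sketch of the setup being described, with spark.redis.timeout passed at session creation; the host, port, and timeout values here are assumptions (spark.redis.timeout is in milliseconds), not values from the report:

```python
from pyspark.sql import SparkSession

# Assumed connection values; adjust for your cluster.
spark = (
    SparkSession.builder
    .appName("redis-hash-read")
    .config("spark.redis.host", "localhost")   # assumed host
    .config("spark.redis.port", "6379")        # assumed port
    .config("spark.redis.timeout", "30000")    # socket timeout in ms (assumed value)
    .getOrCreate()
)

# Same read as in the report above.
data = (
    spark.read.format("org.apache.spark.sql.redis")
    .option("keys.pattern", "search_solution")
    .option("infer.schema", "true")
    .load()
)
```

This requires the spark-redis jar on the classpath and a reachable Redis instance, so it is a configuration sketch rather than a standalone script.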

ekavakakis commented 3 years ago

Any news on this? Have you managed to solve it?