I am trying to read all the fields and their values in a Redis hash key from PySpark using the spark-redis jar. I can read 500-600 fields, but I get a Read timed out error (snippet below) when reading 5K-6K fields.
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:205)
at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43)
at redis.clients.jedis.Protocol.process(Protocol.java:155)
at redis.clients.jedis.Protocol.read(Protocol.java:220)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:318)
at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:270)
at redis.clients.jedis.Jedis.hgetAll(Jedis.java:942)
I tried increasing the timeout via the spark.redis.timeout setting (up to 3000 and higher), but then I get an Unexpected end of stream error instead (snippet below).
redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.
at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:202)
at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43)
at redis.clients.jedis.Protocol.process(Protocol.java:155)
at redis.clients.jedis.Protocol.read(Protocol.java:220)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:318)
at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:270)
at redis.clients.jedis.Jedis.hgetAll(Jedis.java:942)
I am reading the data with the following command:

data = spark.read.format("org.apache.spark.sql.redis") \
    .option("keys.pattern", "search_solution") \
    .option("infer.schema", "true") \
    .load()
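For context, the Spark session is configured roughly as below (host and port are placeholders for my environment; spark.redis.timeout is the connection/read timeout in milliseconds per the spark-redis configuration docs):

```python
from pyspark.sql import SparkSession

# Placeholder host/port for the Redis instance; the timeout value
# shown here (30000 ms) is one of the larger values I experimented with.
spark = SparkSession.builder \
    .appName("redis-hash-read") \
    .config("spark.redis.host", "localhost") \
    .config("spark.redis.port", "6379") \
    .config("spark.redis.timeout", "30000") \
    .getOrCreate()
```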
Are there any suggestions, or areas I should look into, to avoid these errors? Let me know if you have any experience with this issue.