iyer-r opened this issue 5 years ago (Open)
Hi @iyer-r,
I didn't fully understand the idea of continuing to load while a node is down. Could you please clarify how it should work under the hood in your opinion? Should it wait until the node is available again, with a timeout? There might be a Jedis configuration option for that; I will need to check.
In general, Spark makes several attempts (retries) before failing the job. Do you see retries in your case? As a workaround you may try increasing the number of retries.
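For example, something along these lines (a minimal sketch; `spark.task.maxFailures` is a standard Spark setting, and `spark.redis.timeout` is spark-redis's connection timeout in milliseconds, assuming your spark-redis version supports it):

```scala
import org.apache.spark.sql.SparkSession

// Retry/timeout knobs only; this does not change how failed connections are routed.
val spark = SparkSession.builder()
  .appName("redis-load")
  .config("spark.task.maxFailures", "8")    // default is 4
  .config("spark.redis.timeout", "10000")   // give a restarting node more time to answer
  .getOrCreate()
```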
Hi!
The Kubernetes deployment of the Redis cluster uses persistent volumes and StatefulSets. As a result, when one of the nodes goes down (for any reason), a replacement comes back up with the same data.
Access to the Redis cluster is via a load balancer IP, so individual requests can go to any of the internal nodes.
As you mention, there are retry attempts; however, the retries all go to the internal node that went down, not to the load balancer IP. If the retries went to the load balancer IP instead, the save could continue on the other nodes while the failed node comes back up.
Hope that makes sense.
Hello!
I am using Spark Redis to populate a Redis cluster as follows --
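(A minimal sketch of the write path; the host, table name, and key column below are placeholders, not the exact job.)

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("redis-load")
  .config("spark.redis.host", "10.4.0.100")  // load balancer IP (placeholder)
  .config("spark.redis.port", "6379")
  .getOrCreate()

import spark.implicits._

// Example data; the real job reads from its own source.
val df = Seq(("user:1", "alice"), ("user:2", "bob")).toDF("id", "name")

// Write each row as a Redis hash keyed by the "id" column.
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "users")       // key prefix (placeholder)
  .option("key.column", "id")
  .mode("append")
  .save()
```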
The Redis cluster itself is hosted on the Kubernetes Engine with a load balancer service, and the individual nodes can go down and come back up during the load process.
If one of the nodes goes down, I see an error like --
```
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host 10.4.1.162:6379
```
When this happens, the whole job terminates. Instead, I would like to ignore this exception and try to continue loading.
Is there a way to achieve that?
Regards