Closed basdag closed 9 years ago
On its own it's not necessarily bad to get the ConditionalCheckFailedException
error. Usually it just means that multiple workers tried to acquire the same shard at about the same time. The check is in place to make sure that only one of those attempts works.
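That race can be pictured with a small sketch. This is not the library's code; the in-memory Map stands in for the DynamoDB lease table, and tryAcquireShard is a hypothetical name. The point is only that a conditional write lets exactly one of several concurrent acquirers win:

```javascript
// Hypothetical sketch of lease acquisition, simulated with an
// in-memory "table" instead of DynamoDB. All names are illustrative.
const leases = new Map()

// Put the lease only if no one else holds it, mirroring a DynamoDB
// conditional write that fails with ConditionalCheckFailedException.
function tryAcquireShard (shardId, workerId) {
  if (leases.has(shardId)) {
    // In DynamoDB, this is where ConditionalCheckFailedException
    // would be thrown for the losing worker.
    return false
  }
  leases.set(shardId, workerId)
  return true
}

// Two workers race for the same shard; exactly one wins.
const first = tryAcquireShard('shardId-000000000000', 'worker-a')
const second = tryAcquireShard('shardId-000000000000', 'worker-b')
console.log(first, second) // true false
```

So the losing worker's exception is expected contention, not a failure of the worker itself.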
It seems like at some point that error stops and you start getting "Found available shard" logs. I would expect to see those followed by attempts to spawn a new consumer to process them (you should see a "Spawning consumer" log when that happens).
From a quick look, one possible problem is that isShuttingDownFromError
is set here, which would cause the cluster not to try to spawn a consumer. In that case, I'd expect to see the cluster itself exiting soon, which isn't happening.
Can you try checking that?
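To illustrate the failure mode being described (this is a sketch, not the library's actual code; ClusterSketch, onAvailableShard, and spawned are hypothetical names): once a shutdown flag is set, the spawn path is silently skipped on every scan.

```javascript
// Illustrative sketch of how a shutdown flag can silently block
// consumer spawning while the shard-scan loop keeps running.
function ClusterSketch () {
  this.isShuttingDownFromError = false
  this.spawned = []
}

ClusterSketch.prototype.onAvailableShard = function (shardId) {
  console.log('Found available shard')
  if (this.isShuttingDownFromError) {
    // The guard returns early, so no consumer is ever spawned and
    // "Found available shard" just repeats on every scan.
    return
  }
  console.log('Spawning consumer')
  this.spawned.push(shardId)
}

const cluster = new ClusterSketch()
cluster.isShuttingDownFromError = true
cluster.onAvailableShard('shardId-000000000000')
console.log(cluster.spawned.length) // 0
```

If the cluster never actually exits after the flag is set, this loop would repeat indefinitely, which matches the logs described below.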
Yes, I was checking the ConsumerCluster.
It makes total sense, because the cluster will receive the order to reset in 40 seconds (which is what it's set to), which is why "Found available shard" is shown repeatedly. However, the question is whether it's good behavior for the cluster to reset itself just because of the ConditionalCheckFailedException. I think it would be necessary to filter the errors to see which ones warrant a cluster reset. What do you think?
the question is whether it's good behavior for the cluster to reset itself just because of the ConditionalCheckFailedException
I don't think it will. The lease reservation (where the conditional check comes in) is done in the consumer. So the individual consumer will kill itself but the cluster (which owns the consumer) will keep running. In your logs, all of the "... exited" logs are from consumers exiting, not clusters.
To clarify, this is the terminology that the library uses in the launch-kinesis-cluster command.

@basdag is there anything here that is still a problem?
@evansolomon I tested the code these last couple of days and still get the following log indefinitely (for more than an hour) without the cluster exiting:
{"name":"KinesisCluster","hostname":"xxxx","pid":20,"level":30,"shardId":"shardId-000000000000","leaseCounter":1059,"msg":"Found available shard","time":"2015-02-17T09:37:07.142Z","v":0}
So as we highlighted before, the issue is in the isShuttingDownFromError flag, which blocks the cluster from spawning new consumers, like we discussed. This means that for some reason the cluster is not exiting when logAndEmitError is called. Moreover, and correct me if I am wrong, I do not see anywhere a listener declared for the this.emit('error', err) that is called inside logAndEmitError. If this is true, that could be why the isShuttingDownFromError flag is set but nothing happens afterward, and the cluster gets stuck in an infinite loop of "Found available shard" logs.
Also, analyzing the errors in ConsumerCluster, I found an interesting case here where I was wondering: how is it known or ensured that other consumers do not have correct shardIds either, in order to throw an error that will reset everything?
I think the non-exiting problem will be fixed by 593875796a63c92450bff69eaa46087d80257038.
The error highlighted is thrown when the shardId argument to spawn() is falsy (e.g. an empty string). It's impossible to spawn a consumer for a falsy shard ID. I'm not sure I understand your question about other consumers having correct shard IDs, but it does seem like that's a case where those consumers' shutdown methods won't currently run.
The missing shard ID case should be handled a little better as of 84ff2d24a7f4f29a8ad2cfcdd29eaf7ef6faa449
Hi @evansolomon, I apologize for the delay. I will try it with these new logs and come back to you with feedback. I will close this issue, and if something happens I will create a new one. Thanks for your time.
Hi again @evansolomon. Running tests like I mentioned, I found that when there is more than one shard in Kinesis (in my case 4), a ConditionalCheckFailedException starts being raised constantly:

Any idea what could be causing it?
Thanks again for your time and support.