ksemaev opened this issue 4 years ago.
It's a deep rabbit hole, but to troubleshoot this you'd need to disable log blacklisting in your client configuration YAML file by setting the blacklist to an empty array, e.g.:
logging:
  # Whatever other entries, followed by:
  blacklist: []
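To see why an empty list exposes more output, here is a minimal sketch of name-based logger filtering in Python. It assumes the blacklist works by dropping records from named loggers, and that the default entries are "elasticsearch" and "urllib3"; the NameBlacklist class is hypothetical and is not Curator's own code.

```python
import logging


class NameBlacklist(logging.Filter):
    """Hypothetical filter: drop records from blacklisted loggers and their children."""

    def __init__(self, blacklist):
        super().__init__()
        self.blacklist = list(blacklist)

    def filter(self, record):
        # Reject e.g. "elasticsearch" itself and children such as "elasticsearch.trace".
        for name in self.blacklist:
            if record.name == name or record.name.startswith(name + "."):
                return False
        return True


def make_record(logger_name):
    # Build a bare DEBUG record, like one emitted by a logger called logger_name.
    return logging.LogRecord(logger_name, logging.DEBUG, "example.py", 0, "msg", None, None)


default = NameBlacklist(["elasticsearch", "urllib3"])  # assumed defaults
empty = NameBlacklist([])                               # blacklist: []

print(default.filter(make_record("elasticsearch")))  # False: suppressed
print(empty.filter(make_record("elasticsearch")))    # True: passed through
```

Attached to a handler with handler.addFilter(...), the first filter would hide the Elasticsearch client's own logging, which is the output you want to see when chasing this kind of problem.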
After that, you'd collect the logs and watch what happens. My suspicion is that you'll find the cluster state is not updating quickly enough, or some similar problem. The "KeyError" exception indicates that the response Curator got back from Elasticsearch did not include the expected 'indices' key. This might imply that your action steps (which you did not share, so this is a guess) are "completed," but the cluster state hasn't yet updated to show that the index has been created. Most of the time it has, but in the 5-7% of cases where the action fails, it hasn't.
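The KeyError reasoning above can be reproduced in a few lines of Python. The response dictionaries below are hypothetical stand-ins, not the exact structure Curator reads; the point is only that looking up a missing 'indices' key raises KeyError('indices'), while a defensive .get() lets the caller treat the state as "not ready yet".

```python
def index_names(response):
    # Direct lookup: raises KeyError('indices') if the key is absent.
    return sorted(response["indices"].keys())


def index_names_or_none(response):
    # Defensive variant: return None ("not ready yet") instead of raising.
    indices = response.get("indices")
    if indices is None:
        return None
    return sorted(indices.keys())


complete = {"indices": {"logs-000001": {}, "logs-000002": {}}}
lagging = {}  # hypothetical response whose 'indices' key has not appeared yet

print(index_names(complete))         # ['logs-000001', 'logs-000002']
print(index_names_or_none(lagging))  # None
try:
    index_names(lagging)
except KeyError as exc:
    print("missing key:", exc.args[0])  # missing key: indices
```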
Again, this is just a guess, since I'd need to see the log files to be completely certain. But it is what I suspect, based on what you've shared.
Thanks for the response, @untergeek! I will set up debug logging in the next few days, so please do not close the issue. Overall, though, I think AWS ES indeed isn't updating the cluster state in time, which is why I'm asking whether the same delay option that exists for forcemerge (https://www.elastic.co/guide/en/elasticsearch/client/curator/current/option_delay.html) could be added to reindex and the other actions, or whether Curator could at least catch the error when the target index doesn't exist.
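The "wait until the target index exists" idea can be sketched as a generic polling helper. This is a hypothetical function, not a Curator option: it repeatedly calls a check until it succeeds or a timeout expires.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll check() every `interval` seconds until it returns truthy.

    Returns True once the condition holds, or False after `timeout` seconds.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo: a stand-in check that only succeeds on the third poll.
polls = iter([False, False, True])
print(wait_until(lambda: next(polls), timeout=10, interval=0.01))  # True
```

With the elasticsearch-py client, the check could be something like lambda: es.indices.exists(index="target-index"), where es and the index name are placeholders. Unlike a fixed delay, polling stops as soon as the index shows up.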
To submit a bug or report an issue
When running reindex, roughly 5-7% of runs fail with:
Failed to complete action: reindex. <type 'exceptions.KeyError'>: 'indices'
Expected Behavior
The task is quite simple: I get the names of all read-only indices and then, one by one, for each index:
Actual Behavior
The task usually runs fine, but sometimes (the only correlation I could find is with a large number of running reindex tasks) I get:
Because I run a lot of reindex tasks, I get this error 5-7 times out of 100 (about once an hour). There's no pattern: it can be any index at any time, and the index can be 30 MB or 40 GB. It always reindexes from an index with 3 primaries and 1 replica into one with 1 primary and 1 replica.
Specifications
AWS ES 7.1, Curator 5.8.1
Context (Environment)
Can we somehow catch this error and retry? Maybe it happens because the target index is still being created, in which case a delay option would help. Or is there a way to get verbose output that shows the cause?
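Catching the error and retrying can be sketched as a small wrapper. This is a hypothetical helper, not a Curator feature: it calls an action, and if the action raises one of the listed exceptions (KeyError by default) it waits and tries again, re-raising after the last attempt so persistent failures are still visible.

```python
import time


def run_with_retry(action, retries=3, delay=5.0, retry_on=(KeyError,), sleep=time.sleep):
    """Call action(); if it raises one of `retry_on`, wait `delay` seconds and retry.

    After `retries` failed attempts the last exception is re-raised,
    so a persistent failure is still reported.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return action()
        except retry_on as exc:
            if attempt == retries:
                raise
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            sleep(delay)


# Demo: a hypothetical stand-in for one reindex run that fails once, then succeeds.
outcomes = iter([KeyError("indices"), "reindexed"])


def flaky_reindex():
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


print(run_with_retry(flaky_reindex, delay=0.01))  # reindexed
```

If Curator is driven from its command line, action could wrap something like subprocess.run(["curator", "--config", "config.yml", "action.yml"], check=True) with retry_on=(subprocess.CalledProcessError,), assuming Curator exits with a non-zero status when an action fails; the file names are placeholders. A retry works around the suspected cluster-state lag but does not explain it, so the debug logs are still worth collecting.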