elastic / elasticsearch-py

Official Python client for Elasticsearch
https://ela.st/es-python
Apache License 2.0

Allow retries for statuses other than 429 in streaming_bulk #1004

Open david-a opened 5 years ago

david-a commented 5 years ago

Please allow retries on statuses other than 429 as well, e.g. by accepting an argument that defaults to [429], or a callback that tests the status or error type. Use case: sometimes the Elasticsearch cluster returns 403 - cluster_block_exception (for example while in maintenance), and we want to retry only the failed items.

Currently, with raise_on_error=False the errors are aggregated but without their data (because _process_bulk_chunk only adds the data when raise_on_error=True or in the case of a TransportError), so we don't know which items failed. With raise_on_error=True, the bulk stops as soon as it encounters an error, and you can't tell in which chunk the error occurred or which items should be retried.

https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/helpers/actions.py
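
In the meantime, a workaround is possible because streaming_bulk yields one (ok, item) result per action, in input order, so failed actions can be recovered by zipping the results back to the actions. A rough sketch (bulk_with_retries and RETRYABLE_STATUSES are illustrative names, not part of the library):

```python
from elasticsearch.helpers import streaming_bulk

RETRYABLE_STATUSES = {429, 403}  # illustrative: statuses we treat as transient

def bulk_with_retries(client, actions, max_retries=3):
    """Resend only the actions whose per-item status looks transient."""
    pending = list(actions)  # materialize so items can be resent
    for _ in range(max_retries + 1):
        failed = []
        results = streaming_bulk(
            client, pending, raise_on_error=False, raise_on_exception=False
        )
        # streaming_bulk yields results in the same order as the input,
        # so each result can be matched to the action that produced it.
        for action, (ok, info) in zip(pending, results):
            # info looks like {"index": {"status": 403, "error": {...}}}
            if not ok and next(iter(info.values())).get("status") in RETRYABLE_STATUSES:
                failed.append(action)
        if not failed:
            return []
        pending = failed
    return pending  # actions that still failed after all retries
```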

lucasrcezimbra commented 4 years ago

@david-a

I was having the same problem, but with ConnectionTimeout. Reading the codebase, I found that the client accepts the arguments retry_on_timeout and retry_on_status, which are passed on to Transport.

I set retry_on_timeout to True and that fixed my problem, like this:

client = Elasticsearch(hosts, retry_on_timeout=True)

Maybe passing a retry_on_status will work for you.
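
For example (the statuses and max_retries here are illustrative; note that retry_on_status works at the transport level, i.e. it retries whole requests that fail with one of these statuses, not individual failed items inside a bulk response):

```python
from elasticsearch import Elasticsearch

hosts = ["http://localhost:9200"]  # illustrative

# Retry whole requests that fail with one of these statuses, and on timeouts.
client = Elasticsearch(
    hosts,
    retry_on_status=(429, 502, 503, 504),
    retry_on_timeout=True,
    max_retries=5,
)
```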

huntekah commented 3 years ago

I have to say, adding an option to handle other status codes with exponential backoff in bulk() would save me from duplicating code just to get exponential backoff for 403 throttling exceptions like:

AuthorizationException(403, '403 Request throttled due to too many requests /my-index_write/_bulk')

Exponential backoff works for both of those errors, but elasticsearch-py detects only one of them :(.

retry_on_status is nice, but it retries immediately, with no sleep in between.
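
Until bulk() supports this, one can wrap it with the same kind of backoff it already applies to 429 per-item errors. A rough sketch (bulk_with_backoff and its parameters are illustrative, assuming the throttled request fails as a whole with AuthorizationException):

```python
import time

from elasticsearch.exceptions import AuthorizationException
from elasticsearch.helpers import bulk

def bulk_with_backoff(client, actions, max_retries=5, initial_backoff=2):
    """Retry the whole bulk call with exponential backoff on 403 throttling."""
    actions = list(actions)  # materialize so the actions can be resent
    for attempt in range(max_retries + 1):
        try:
            return bulk(client, actions)
        except AuthorizationException:
            if attempt == max_retries:
                raise
            # Caution: if earlier chunks succeeded before the failure,
            # resending everything may re-index them; using explicit
            # document _ids keeps the retry idempotent.
            time.sleep(min(initial_backoff * 2 ** attempt, 60))
```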