elastic / connectors

Source code for all Elastic connectors, developed by the Search team at Elastic, and home of our Python connector development framework
https://www.elastic.co/guide/en/enterprise-search/master/index.html

ApiError(429, 'TOO_MANY_REQUESTS') while indexing the records in Elasticsearch #856

Closed prashant-elastic closed 1 month ago

prashant-elastic commented 1 year ago

Bug Description

Getting a 429 Too Many Requests error while indexing SharePoint documents (a large data set, 2.5M) into Elasticsearch.

To Reproduce

Steps to reproduce the behavior:

  1. Take the latest code from the main branch of the GitHub connectors-python repository
  2. Make all the necessary configuration changes in the config.yml file
  3. In Kibana, create an index > go to the Configuration tab > make the config changes related to the SharePoint connector

Expected behavior

All records should be properly indexed in Elasticsearch

Actual behavior

The sync fails with a 429 Too Many Requests error, and the sync status is Sync failure:

ApiError(429, "{'_shards': {'total': 2, 'successful': 0, 'failed': 2, 'failures': [{'shard': 0, 'index': '.elastic-connectors-v1', 'status': 'TOO_MANY_REQUESTS', 'reason': {'type': 'circuit_breaking_exception', 'reason': '[parent] Data too large, data for [indices:admin/refresh[s]] would be [1051923900/1003.1mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1051923680/1003.1mb], new bytes reserved: [220/220b], usages [model_inference=0/0b, inflight_requests=32840454/31.3mb, request=0/0b, fielddata=247/247b, eql_sequence=0/0b]', 'bytes_wanted': 1051923900, 'bytes_limit': 1020054732, 'durability': 'TRANSIENT'}}]}}")

Only 138440 docs got indexed out of 250000 records

Screenshots

[Screenshot: Screen Shot 2023-05-03 at 5.22.47 PM]

Note: This test was executed against SharePoint Server. Attaching log files for reference.

prashant-elastic commented 1 year ago

We also checked this on the 8.8 branch on GitHub and faced the same issue.

danajuratoni commented 1 year ago

cc: @vidok

artem-shelkovnikov commented 1 year ago

We don't handle this sort of throttling when uploading to Elasticsearch. The error says that Elasticsearch has used all the memory available for ingesting data and cannot ingest more for now, so we need to wait.

We need to add this error handling to the framework.
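
For illustration only, a minimal sketch of what such handling could look like with the elasticsearch-py 8.x async client; the function name and retry defaults below are hypothetical, not the framework's actual code:

```python
# Sketch: back off and retry when Elasticsearch answers 429 because its
# circuit breaker tripped. Not the connector framework's implementation.
import asyncio

from elasticsearch import ApiError, AsyncElasticsearch


async def bulk_with_backoff(client: AsyncElasticsearch, operations,
                            max_retries: int = 5, initial_delay: float = 2.0):
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await client.bulk(operations=operations)
        except ApiError as e:
            # Anything other than a 429, or running out of attempts, is re-raised.
            if e.meta.status != 429 or attempt == max_retries:
                raise
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff between attempts
```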

artem-shelkovnikov commented 1 year ago

For now, if you need to go on with your testing, just increase the memory available to Elasticsearch to double your current value (I see you're allocating 1GB of RAM to Elasticsearch, which is too little).

artem-shelkovnikov commented 1 year ago

@danajuratoni the problem is not SharePoint-specific either; it's a framework issue.

prashant-elastic commented 1 year ago

Hey @artem-shelkovnikov, please find the attached screenshot, which shows the configuration of the Elasticsearch Cloud deployment we used for testing. Do you recommend an instance with a different configuration?

[Screenshot: Elasticsearch Cloud deployment configuration]

artem-shelkovnikov commented 1 year ago

Hi @prashant-elastic, indeed: you can see that the master node is 1GB. You need to choose a configuration with a bigger master node if you want the error to go away while we're addressing the problem.

prashant-elastic commented 1 year ago

Hi @artem-shelkovnikov, sure, I will try configuring an instance with a bigger master node.

prashant-elastic commented 1 year ago

Hey @artem-shelkovnikov, we tried looking for a way to configure an instance with a bigger master node but had no luck. Can you please let us know where to configure this?

ppf2 commented 1 year ago

Try using 3 zones with 2GB per zone.

I thought we had some form of retry policy in place for backpressure from Elasticsearch on bulk indexing. Based on the attached log file, is the bug here that the general retry mechanism is not working at the framework level?

artem-shelkovnikov commented 1 year ago

> I thought we had some form of retry policy in place for backpressure from Elasticsearch on bulk indexing. Based on the attached log file, is the bug here that the general retry mechanism is not working at the framework level?

I think we don't have one at all, or it's broken.

ppf2 commented 1 year ago

Seems like it retries 3 times with no delay on everything except for conflict errors and gives up?

artem-shelkovnikov commented 1 year ago

It retries 3 times only for conflict errors: only ConflictError is caught in the except block; all other errors are raised immediately.
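
Roughly the pattern being described, as an illustration only (not the actual framework code):

```python
# Illustration of the behaviour described above: only version conflicts are
# retried, so a 429 aborts the sync on its first occurrence.
from elasticsearch import ConflictError

RETRIES = 3


async def index_with_retries(client, index, doc_id, doc):
    for _ in range(RETRIES):
        try:
            return await client.index(index=index, id=doc_id, document=doc)
        except ConflictError:
            # Only version conflicts are retried (up to RETRIES times);
            # any other ApiError, such as a 429, is not caught and
            # propagates immediately, failing the whole sync.
            continue
    raise RuntimeError(f"Still conflicting after {RETRIES} attempts")
```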

ppf2 commented 1 year ago

Here's an example retry policy for Elasticsearch bulk requests to consider:
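
For instance, a sketch of one such policy using the retry options built into elasticsearch-py's bulk helpers, which retry actions rejected with 429 using exponential backoff; the index name, document shape, and numbers below are illustrative only:

```python
# Sketch: let the bulk helper retry 429-rejected actions with exponential backoff.
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_streaming_bulk


async def index_documents(client: AsyncElasticsearch, docs, index_name: str):
    actions = (
        {"_index": index_name, "_id": doc["id"], "_source": doc} for doc in docs
    )
    async for ok, item in async_streaming_bulk(
        client,
        actions,
        max_retries=8,       # retry 429-rejected actions up to 8 times
        initial_backoff=2,   # wait 2s before the first retry ...
        max_backoff=600,     # ... doubling each time, capped at 10 minutes
        raise_on_error=False,
    ):
        if not ok:
            print("Failed to index document:", item)
```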

artem-shelkovnikov commented 1 month ago

Closing, as we've updated the backpressure logic to retry transparently.