airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

:bug: Source Mixpanel: 429 status code is not handled #9410

Closed octavia-squidington-iii closed 2 years ago

octavia-squidington-iii commented 2 years ago

Is this your first time deploying Airbyte: No
OS Version / Instance: EC2 t2.medium
Deployment: Docker
Airbyte Version: 0.35.2-alpha
Source name/version: Mixpanel 0.1.9
Destination name/version: AWS Redshift

Description: Hi everyone, I am setting up a connection from Mixpanel to Redshift. The data appears to be loaded into Redshift properly, but one error is logged all the time when I sync. Does anybody know where it comes from?

When reading stream SourceMixpanel, the connector does not handle the 429 response code properly, which leads to a DefaultBackoffException. The connector should override backoff_time for smarter rate-limit handling.
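To illustrate what a smarter backoff_time could compute, here is a minimal sketch. It is written as a standalone function so that it runs without the Airbyte CDK; in a connector, the same logic would sit inside the stream's backoff_time override, where returning None leaves the CDK's default backoff in place. The function name `backoff_time_for`, the 60-second fallback, and the idea that Mixpanel sends a `Retry-After` header at all are assumptions, not something this issue confirms.

```python
from typing import Mapping, Optional

# Assumption: fallback pause when the server gives no hint. Not a Mixpanel-documented value.
DEFAULT_WAIT_SECONDS = 60.0


def backoff_time_for(status_code: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return the seconds to wait before retrying, or None to defer to default backoff.

    Note: a plain dict is case-sensitive; requests' response.headers is not.
    """
    if status_code != 429:
        return None  # not rate limited, so nothing custom to do

    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Retry-After may be a number of seconds...
            return max(0.0, float(retry_after))
        except ValueError:
            # ...or an HTTP date, which this sketch does not parse.
            pass

    return DEFAULT_WAIT_SECONDS
```

With a real `requests.Response`, the call would look like `backoff_time_for(response.status_code, response.headers)`.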

Error

Stream funnels: 429 Too Many Requests - {"request": "/api/2.0/funnels?from_date=2021-03-22&to_date=2021-04-20&funnel_id=15495008&unit=day", "error": "Query rate limit exceeded for project_id: 2526451. 1/5 queries running concurrently and 61/60 queries running in the last hour. For more information, please consult our documentation at https://help.mixpanel.com/hc/en-us/articles/115004602563-Rate-Limits-for-Export-API-Endpoints"}

Encountered an exception while reading stream SourceMixpanel
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 108, in read
    internal_config=internal_config,
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 141, in _read_stream
    for record in record_iterator:
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 185, in _read_incremental
    for record_counter, record_data in enumerate(records, start=1):
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 352, in read_records
    response = self._send_request(request, request_kwargs)
  File "/airbyte/integration_code/source_mixpanel/source.py", line 78, in _send_request
    raise e
  File "/airbyte/integration_code/source_mixpanel/source.py", line 73, in _send_request
    return super()._send_request(request, request_kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 319, in _send_request
    return backoff_handler(user_backoff_handler)(request, request_kwargs)
  File "/usr/local/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/streams/http/http.py", line 282, in _send
    raise DefaultBackoffException(request=request, response=response)
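
The 429 error body above reports both counters ("1/5 queries running concurrently" and "61/60 queries running in the last hour"), so a handler could tell a concurrency limit apart from an exhausted hourly quota. The following is a rough sketch of that parsing. The names `RateLimitStatus` and `parse_rate_limit_error` are hypothetical, and the regular expressions are based only on the single message quoted in this issue, so the wording may differ in other responses.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns derived from the one error message quoted in this issue.
_CONCURRENT = re.compile(r"(\d+)/(\d+) queries running concurrently")
_HOURLY = re.compile(r"(\d+)/(\d+) queries running in the last hour")


@dataclass
class RateLimitStatus:
    concurrent_used: int
    concurrent_limit: int
    hourly_used: int
    hourly_limit: int

    @property
    def hourly_exhausted(self) -> bool:
        # True when the hourly quota, not concurrency, is the blocker.
        return self.hourly_used >= self.hourly_limit


def parse_rate_limit_error(message: str) -> Optional[RateLimitStatus]:
    """Extract both counters from the error text, or None if the format differs."""
    concurrent = _CONCURRENT.search(message)
    hourly = _HOURLY.search(message)
    if not (concurrent and hourly):
        return None
    return RateLimitStatus(
        concurrent_used=int(concurrent.group(1)),
        concurrent_limit=int(concurrent.group(2)),
        hourly_used=int(hourly.group(1)),
        hourly_limit=int(hourly.group(2)),
    )
```

For the message in this issue, the hourly quota is the exhausted one, which suggests a wait on the order of minutes rather than a few quick retries.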

Logs

logs-34-0 (1).txt

Slack conversation

itaseskii commented 2 years ago

I'll claim this issue

misteryeo commented 2 years ago

@bazarnov FYI, @itaseskii will be working on closing this out so we don't have to address it.

roman-romanov-o commented 2 years ago

I've changed all query limits to 60 requests per hour, as the documentation says:

https://help.mixpanel.com/hc/en-us/articles/115004602563-Rate-Limits-for-API-Endpoints
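
A limit of 60 requests per hour averages out to one request per minute. How the connector change enforces this is not shown in this thread; the sketch below is only one possible approach, spacing requests evenly on the client side. `RequestSpacer` and its parameters are hypothetical names, and even spacing is stricter than an hourly quota requires, since it also rules out short bursts.

```python
import time
from typing import Callable, Optional


class RequestSpacer:
    """Keep requests at least 3600 / max_per_hour seconds apart."""

    def __init__(
        self,
        max_per_hour: int = 60,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `spacer.wait()` immediately before each API request would keep a single process under the limit; it does not coordinate across several syncs that share one Mixpanel project.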