airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-mailchimp] - UnicodeError("label empty or too long") during Mailchimp source connection check in Airbyte #45593

Open EgonStep opened 1 month ago

EgonStep commented 1 month ago

Connector Name

source-mailchimp

Connector Version

2.0.18

At what step did the error happen?

During the sync

Relevant information

I've encountered a persistent issue with the Mailchimp source in Airbyte, where attempts to sync or test the connection fail due to a UnicodeError. This problem started a couple of days ago and has been consistently reproducible since then.

Environment

Airbyte version: 0.63.13
source-mailchimp version: 2.0.18
destination-snowflake version: 3.11.10

This issue prevents any operation with the Mailchimp source, impacting our data integration processes. Any insights or workarounds would be greatly appreciated.

Full error log:

2024-09-16 12:30:55 platform > Docker volume job log path: /tmp/workspace/15813/0/logs.log
2024-09-16 12:30:55 platform > Executing worker wrapper. Airbyte version: 0.63.13
2024-09-16 12:30:55 platform > 
2024-09-16 12:30:55 platform > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-09-16 12:30:55 platform > ----- START CHECK -----
2024-09-16 12:30:55 platform > Using default value for environment variable SOCAT_KUBE_CPU_LIMIT: '2.0'
2024-09-16 12:30:55 platform > 
2024-09-16 12:30:55 platform > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-09-16 12:30:55 platform > Using default value for environment variable SOCAT_KUBE_CPU_REQUEST: '0.1'
2024-09-16 12:30:55 platform > Checking if airbyte/source-mailchimp:2.0.18 exists...
2024-09-16 12:30:55 platform > airbyte/source-mailchimp:2.0.18 was found locally.
2024-09-16 12:30:55 platform > Creating docker container = source-mailchimp-check-15813-0-xqbaw with resources io.airbyte.config.ResourceRequirements@78810b2a[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts io.airbyte.config.AllowedHosts@6bcb24fb[hosts=[*.api.mailchimp.com, login.mailchimp.com, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
2024-09-16 12:30:55 platform > Preparing command: docker run --rm --init -i -w /data/15813/0 --log-driver none --name source-mailchimp-check-15813-0-xqbaw --network host -v airbyte_workspace:/data -v oss_local_root:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-mailchimp:2.0.18 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e AIRBYTE_ROLE=dev -e WORKER_ENVIRONMENT=DOCKER -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.63.13 -e WORKER_JOB_ID=15813 airbyte/source-mailchimp:2.0.18 check --config source_config.json
2024-09-16 12:30:55 platform > Reading messages from protocol version 0.2.0
2024-09-16 12:30:58 platform > Encountered an error trying to connect to stream campaigns. Error: 
 Traceback (most recent call last):
  File "/usr/local/lib/python3.10/encodings/idna.py", line 163, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/declarative/checks/check_stream.py", line 42, in check_connection
    stream_is_available, reason = availability_strategy.check_availability(stream, logger)
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py", line 46, in check_availability
    self.get_first_record_for_slice(stream, stream_slice)
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/streams/availability_strategy.py", line 75, in get_first_record_for_slice
    return next(records_for_slice)
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/declarative/declarative_stream.py", line 136, in read_records
    yield from self.retriever.read_records(self.get_json_schema(), stream_slice)
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 375, in read_records
    for stream_data in self._read_pages(record_generator, self.state, _slice):
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 298, in _read_pages
    response = self._fetch_next_page(stream_state, stream_slice, next_page_token)
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 273, in _fetch_next_page
    return self.requester.send_request(
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/declarative/requesters/http_requester.py", line 305, in send_request
    request, response = self._http_client.send_request(
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http_client.py", line 382, in send_request
    response: requests.Response = self._send_with_retry(
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http_client.py", line 228, in _send_with_retry
    response = backoff_handler(rate_limit_backoff_handler(user_backoff_handler))(request, request_kwargs, log_formatter=log_formatter, exit_on_rate_limit=exit_on_rate_limit)  # type: ignore # mypy can't infer that backoff_handler wraps _send
  File "/usr/local/lib/python3.10/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http_client.py", line 253, in _send
    response = self._session.send(request, **request_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests_cache/session.py", line 212, in send
    self.cache.create_key(request, **kwargs), request, self.settings
  File "/usr/local/lib/python3.10/site-packages/requests_cache/backends/base.py", line 127, in create_key
    return key_fn(
  File "/usr/local/lib/python3.10/site-packages/requests_cache/cache_keys.py", line 71, in create_key
    request = normalize_request(request, ignored_parameters)
  File "/usr/local/lib/python3.10/site-packages/requests_cache/cache_keys.py", line 127, in normalize_request
    norm_request.url = normalize_url(norm_request.url or '', ignored_parameters)
  File "/usr/local/lib/python3.10/site-packages/requests_cache/cache_keys.py", line 152, in normalize_url
    return url_normalize(url)
  File "/usr/local/lib/python3.10/site-packages/url_normalize/url_normalize.py", line 235, in url_normalize
    host=normalize_host(url_elements.host, charset),
  File "/usr/local/lib/python3.10/site-packages/url_normalize/url_normalize.py", line 107, in normalize_host
    host = host.encode("idna").decode(charset)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

2024-09-16 12:30:58 platform > Check failed
2024-09-16 12:30:58 platform > Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@5fd1619c[status=failed,message="Unable to connect to stream campaigns - encoding with 'idna' codec failed (UnicodeError: label empty or too long)",additionalProperties={}]
2024-09-16 12:30:58 platform > 
2024-09-16 12:30:58 platform > ----- END CHECK -----
2024-09-16 12:30:58 platform > 
2024-09-16 12:30:58 platform > Retry State: RetryManager(completeFailureBackoffPolicy=BackoffPolicy(minInterval=PT10S, maxInterval=PT30M, base=3), partialFailureBackoffPolicy=null, successiveCompleteFailureLimit=5, totalCompleteFailureLimit=10, successivePartialFailureLimit=1000, totalPartialFailureLimit=20, successiveCompleteFailures=1, totalCompleteFailures=1, successivePartialFailures=0, totalPartialFailures=0)
 Backoff before next attempt: 10 seconds
2024-09-16 12:30:58 platform > Failing job: 15813, reason: Connection Check Failed ac4aa98f-bd3c-43f9-ab52-4cedeb24b211
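The last frame of the traceback is Python's built-in `idna` codec, reached through `url_normalize.normalize_host` calling `host.encode("idna")`. The failure happens while `requests_cache` builds its cache key, before any request leaves the container. The codec rejects a hostname when a dot-separated label is empty or longer than 63 characters (a single trailing dot is tolerated), which suggests the request URL carried a malformed host. A minimal standard-library reproduction follows; the hostnames are illustrative, and `us6` merely stands in for a Mailchimp data-center prefix.

```python
# Reproduces the final frame of the traceback: url_normalize calls
# host.encode("idna"), and Python's idna codec raises UnicodeError for
# any label that is empty or 64+ characters long.

def idna_encode_ok(host: str) -> bool:
    """Return True if `host` survives the same str.encode("idna") call."""
    try:
        host.encode("idna")
    except UnicodeError:
        return False
    return True


for host in [
    "us6.api.mailchimp.com",      # well-formed: every label is 1-63 chars
    ".api.mailchimp.com",         # leading empty label (e.g. empty prefix)
    "us6..api.mailchimp.com",     # empty label from a doubled dot
    "a" * 64 + ".mailchimp.com",  # label longer than 63 characters
]:
    print(f"{host!r}: {'ok' if idna_encode_ok(host) else 'UnicodeError'}")
```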



marcosmarxm commented 1 month ago

@girarda this looks like a problem in the Airbyte CDK. Could the host/URL the request is sent to be getting built incorrectly during pagination?
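The host/URL hypothesis can be illustrated with a small sketch. Mailchimp's API documentation uses base URLs of the form `https://<dc>.api.mailchimp.com/3.0`, consistent with the `*.api.mailchimp.com` entry in the allowed hosts in the log. Assuming (not confirmed in this thread) that the request URL is assembled from such a data-center prefix, an empty prefix would yield the host `.api.mailchimp.com`, whose leading empty label is exactly what the `idna` codec rejects. `build_base_url` and `bad_host_labels` are hypothetical helpers, not part of the CDK or the connector; the second one mirrors the codec's label rule so a suspect URL can be checked before it reaches `requests_cache`.

```python
from urllib.parse import urlsplit


def build_base_url(data_center: str) -> str:
    # Hypothetical helper: follows the https://<dc>.api.mailchimp.com/3.0
    # pattern from Mailchimp's API docs, not the connector's actual code.
    return f"https://{data_center}.api.mailchimp.com/3.0"


def bad_host_labels(url: str) -> list[str]:
    """Return the host labels that Python's idna codec would reject."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Every label except the last must be 1-63 characters long;
    # the last may be empty (trailing dot) but not 64+ characters.
    problems = [label for label in labels[:-1] if not 0 < len(label) < 64]
    if len(labels[-1]) >= 64:
        problems.append(labels[-1])
    return problems


# Usage: a populated prefix passes, an empty one leaves an empty label.
for dc in ["us6", ""]:
    url = build_base_url(dc)
    print(url, bad_host_labels(url))
```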