airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[connector-builder] OAuth issue when using Microsoft #36598

Open Samuel-Dittmann opened 3 months ago

Samuel-Dittmann commented 3 months ago

Connector Name

source-custom_connector-sharepoint_lists

Connector Version

12

What step the error happened?

Configuring a new connector

Relevant information

Hey,

I'm working on a custom connector using the UI builder to get data from a few of our SharePoint Online lists into Databricks. I have configured each list as its own stream and authenticated using an Entra ID app registration. In the builder everything works perfectly. As soon as I publish it and try to create a connection, I get the following error:

Configuration check failed

Unable to connect to stream cost_centers - HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /my-tenant-id/oauth2/v2.0/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd22c6f54f0>: Failed to establish a new connection: [Errno 101] Network unreachable'))

However, I do not get the error when I obtain the bearer token externally via Postman and switch the connector to Bearer-type auth; then everything works as expected as well.
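For anyone reproducing the Postman step in a script, here is a minimal sketch of the same token request using only the Python standard library. The endpoint path is the one from the error message; the `client_credentials` grant type and the `https://graph.microsoft.com/.default` scope are assumptions about how the app registration is used, and the function names `build_token_request` and `fetch_token` are hypothetical, not part of Airbyte or any Microsoft SDK.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://graph.microsoft.com/.default"):
    """Return (url, form-encoded body) for a client-credentials token request.

    The scope default is an assumption; adjust it to the API being called.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }
    return url, urlencode(form).encode("ascii")


def fetch_token(tenant_id, client_id, client_secret):
    """POST the request and return the access token from the JSON response."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    # urllib sends POST data as application/x-www-form-urlencoded by default.
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]
```

Calling `fetch_token(...)` from the machine that runs Postman and again from the Airbyte host would show whether the token endpoint is reachable from both places with the same credentials.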

Relevant log output

2024-03-28 08:23:26 platform > Docker volume job log path: /tmp/workspace/521cbc5c-37b2-4c4b-8749-e54417e74b34/0/logs.log
2024-03-28 08:23:26 platform > Executing worker wrapper. Airbyte version: 0.53.1
2024-03-28 08:23:26 platform > Attempt 0 to save workflow id for cancellation
2024-03-28 08:23:26 platform > 
2024-03-28 08:23:26 platform > ----- START CHECK -----
2024-03-28 08:23:26 platform > 
2024-03-28 08:23:26 platform > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-28 08:23:26 platform > Using default value for environment variable SOCAT_KUBE_CPU_LIMIT: '2.0'
2024-03-28 08:23:26 platform > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-28 08:23:26 platform > Using default value for environment variable SOCAT_KUBE_CPU_REQUEST: '0.1'
2024-03-28 08:23:26 platform > Checking if airbyte/source-declarative-manifest:0.65.0 exists...
2024-03-28 08:23:26 platform > airbyte/source-declarative-manifest:0.65.0 was found locally.
2024-03-28 08:23:26 platform > Creating docker container = source-declarative-manifest-check-521cbc5c-37b2-4c4b-8749-e54417e74b34-0-gzmxq with resources io.airbyte.config.ResourceRequirements@588ba809[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2024-03-28 08:23:26 platform > Preparing command: docker run --rm --init -i -w /data/521cbc5c-37b2-4c4b-8749-e54417e74b34/0 --log-driver none --name source-declarative-manifest-check-521cbc5c-37b2-4c4b-8749-e54417e74b34-0-gzmxq --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-declarative-manifest:0.65.0 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE=dev -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.53.1 -e WORKER_JOB_ID=521cbc5c-37b2-4c4b-8749-e54417e74b34 airbyte/source-declarative-manifest:0.65.0 check --config source_config.json
2024-03-28 08:23:26 platform > Reading messages from protocol version 0.2.0
2024-03-28 08:23:28 platform > Encountered an error trying to connect to stream cost_centers. Error: 
 Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f2122aa9580>: Failed to establish a new connection: [Errno 101] Network unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /<sanitized tenant ID>/oauth2/v2.0/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2122aa9580>: Failed to establish a new connection: [Errno 101] Network unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/checks/check_stream.py", line 42, in check_connection
    stream_is_available, reason = availability_strategy.check_availability(stream, logger, source)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py", line 50, in check_availability
    get_first_record_for_slice(stream, stream_slice)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/utils/stream_helper.py", line 40, in get_first_record_for_slice
    return next(records_for_slice)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/declarative_stream.py", line 104, in read_records
    yield from self.retriever.read_records(self.get_json_schema(), stream_slice)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 324, in read_records
    for stream_data in self._read_pages(record_generator, self.state, stream_slice):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 288, in _read_pages
    response = self._fetch_next_page(stream_state, stream_slice, next_page_token)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/retrievers/simple_retriever.py", line 263, in _fetch_next_page
    return self.requester.send_request(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/requesters/http_requester.py", line 454, in send_request
    headers=self._request_headers(stream_state, stream_slice, next_page_token, request_headers),
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/requesters/http_requester.py", line 308, in _request_headers
    headers = self._get_request_options(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/declarative/requesters/http_requester.py", line 292, in _get_request_options
    auth_options_method(),
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py", line 56, in get_auth_header
    return {"Authorization": f"Bearer {self.get_access_token()}"}
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py", line 61, in get_access_token
    token, expires_in = self.refresh_access_token()
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py", line 150, in refresh_access_token
    response_json = self._get_refresh_access_token_response()
  File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/requests_native_auth/abstract_oauth.py", line 118, in _get_refresh_access_token_response
    response = requests.request(method="POST", url=self.get_token_refresh_endpoint(), data=self.build_refresh_request_body())
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /<sanitized tenant ID>/oauth2/v2.0/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2122aa9580>: Failed to establish a new connection: [Errno 101] Network unreachable'))

2024-03-28 08:23:28 platform > Check failed
2024-03-28 08:23:28 platform > Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@3df9a14a[status=failed,message="Unable to connect to stream cost_centers - HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /<sanitized tenant ID>/oauth2/v2.0/token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2122aa9580>: Failed to establish a new connection: [Errno 101] Network unreachable'))",additionalProperties={}]
2024-03-28 08:23:28 platform > 
2024-03-28 08:23:28 platform > ----- END CHECK -----
2024-03-28 08:23:28 platform >
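The innermost exception in the log above is `OSError: [Errno 101] Network unreachable`, raised while opening the TCP connection to `login.microsoftonline.com:443`, before any OAuth exchange takes place. A minimal sketch for narrowing this down is a probe that attempts a connection to every address the resolver returns and reports each result separately. The helper name `probe` is hypothetical, and the sketch makes no assumption about the cause. It would be most informative when run in the same network context as the check job (the log shows the container is started with `--network host` and ships Python 3.9).

```python
import socket


def probe(host, port=443, timeout=5.0):
    """Try a TCP connection to every address returned for host.

    Returns a list of (family label, address, outcome) tuples.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Name resolution itself failed; nothing to connect to.
        return [("resolve", host, f"failed: {exc}")]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results.append((label, sockaddr[0], "ok"))
        except OSError as exc:
            # Errno 101 would show up here for the affected address.
            results.append((label, sockaddr[0], f"failed: {exc}"))
    return results


# Example use:
#   for row in probe("login.microsoftonline.com"):
#       print(*row)
```

If some addresses report `ok` and others fail with errno 101, the problem is likely tied to routing for one address family rather than to the authenticator configuration; if every address fails, the container probably has no outbound route at all.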


marcosmarxm commented 2 months ago

@Samuel-Dittmann are you able to run tests and read command when developing the connector? Can you share the YAML file?

Samuel-Dittmann commented 2 months ago

Hey @marcosmarxm, the tests run fine and display the data that I want to see, even when I'm using OAuth2.

It's only when I publish it and try to use it in an active connection that it does not work. Here is the YAML, which I shrank down to show only one stream and from which I removed my tenant ID:

spec:
  type: Spec
  connection_specification:
    type: object
    $schema: http://json-schema.org/draft-07/schema#
    required:
      - client_id
      - client_secret
    properties:
      client_id:
        type: string
        order: 0
        title: Client ID
        airbyte_secret: true
      client_secret:
        type: string
        order: 1
        title: Client secret
        airbyte_secret: true
    additionalProperties: true
type: DeclarativeSource
check:
  type: CheckStream
  stream_names:
    - sharepoint_users
streams:
  - name: sharepoint_users
    type: DeclarativeStream
    retriever:
      type: SimpleRetriever
      paginator:
        type: NoPagination
      requester:
        path: sites/root/lists('Benutzerinformationsliste')/items?expand=fields
        type: HttpRequester
        url_base: https://graph.microsoft.com/v1.0/
        http_method: GET
        authenticator:
          type: OAuthAuthenticator
          scopes: []
          client_id: '{{ config[''client_id''] }}'
          grant_type: client_credentials
          client_secret: '{{ config[''client_secret''] }}'
          refresh_request_body:
            scope: https://graph.microsoft.com/.default
          token_refresh_endpoint: >-
            https://login.microsoftonline.com/<my-tenant-id>/oauth2/v2.0/token
        request_headers: {}
        request_body_json: {}
        request_parameters: {}
      record_selector:
        type: RecordSelector
        extractor:
          type: DpathExtractor
          field_path:
            - value
            - '*'
            - fields
      partition_router: []
    primary_key:
      - id
    schema_loader:
      type: InlineSchemaLoader
      schema:
        type: object
        $schema: http://json-schema.org/schema#
        properties:
          id:
            type: string
          Edit:
            type: string
          Name:
            type: string
          EMail:
            type: string
          Notes:
            type: string
          Title:
            type: string
          Office:
            type: string
          Created:
            type: string
          Deleted:
            type: boolean
          ImnName:
            type: string
          Picture:
            type: object
            properties:
              Url:
                type: string
              Description:
                type: string
          EditUser:
            type: string
          JobTitle:
            type: string
          LastName:
            type: string
          Modified:
            type: string
          UserName:
            type: string
          FirstName:
            type: string
          LinkTitle:
            type: string
          WorkPhone:
            type: string
          Department:
            type: string
          SipAddress:
            type: string
          '@odata.etag':
            type: string
          Attachments:
            type: boolean
          ContentType:
            type: string
          IsSiteAdmin:
            type: boolean
          MobilePhone:
            type: string
          UserSelection:
            type: string
          AuthorLookupId:
            type: string
          EditorLookupId:
            type: string
          ItemChildCount:
            type: string
          UserInfoHidden:
            type: boolean
          _ComplianceTag:
            type: string
          ContentTypeDisp:
            type: string
          LinkTitleNoMenu:
            type: string
          FolderChildCount:
            type: string
          _ComplianceFlags:
            type: string
          _UIVersionString:
            type: string
          AppAuthorLookupId:
            type: string
          AppEditorLookupId:
            type: string
          SPSResponsibility:
            type: string
          SPSPictureTimestamp:
            type: string
          _ComplianceTagUserId:
            type: string
          PictureOnly_Size_36px:
            type: string
          PictureOnly_Size_48px:
            type: string
          PictureOnly_Size_72px:
            type: string
          NameWithPictureAndDetails:
            type: string
          _ComplianceTagWrittenTime:
            type: string
          SPSPicturePlaceholderState:
            type: number
          SPSPictureExchangeSyncState:
            type: number

version: 0.65.0
metadata:
  autoImportSchema:
    countries: true
    locations: true
    delegation: true
    cost_centers: true
    company_brands: true
    legal_entities: true
    country_regions: true
    sharepoint_users: true
    cost_center_types: true
    delegation_data_scopes: true
    legal_entity_categories: true
    division_organisational_elements: true

Please note that I have already tried deleting all streams except one; this did not solve the problem either. That's why I think it could be an authentication issue within Airbyte.