airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-google-search-console] 401 Client Error: Unauthorized for url while having access to the domain queried #34663

Open: MartinPrejean opened this issue 8 months ago

MartinPrejean commented 8 months ago

Connector Name

source-google-search-console

Connector Version

1.3.6

What step the error happened?

During the sync

Relevant information

I am using the connector with a service account key, and the service account has the siteRestrictedUser permission level.
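To double-check which permission level the service account actually holds, one option is to call the Search Console `sites.list` method (`GET https://www.googleapis.com/webmasters/v3/sites`) with the same credentials and read the `permissionLevel` of each `siteEntry` (values such as `siteOwner`, `siteFullUser`, `siteRestrictedUser`). The sketch below only parses such a response; the helper name is hypothetical and the sample body is made up for illustration.

```python
import json


def permission_by_site(sites_list_response: dict) -> dict:
    """Map each siteUrl in a sites.list response to its permissionLevel (hypothetical helper)."""
    # An account with no properties may return a body without "siteEntry".
    entries = sites_list_response.get("siteEntry", [])
    return {entry["siteUrl"]: entry["permissionLevel"] for entry in entries}


if __name__ == "__main__":
    # Made-up response body in the shape returned by sites.list.
    sample = json.loads(
        '{"siteEntry": [{"siteUrl": "sc-domain:example.com",'
        ' "permissionLevel": "siteRestrictedUser"}]}'
    )
    print(permission_by_site(sample))
```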

When syncing the connector's 4 streams, I get an Unauthorized error for the stream "search_analytics_keyword_site_report_by_site".

I don't know why this happens, because the rest of the data syncs just fine. Is there a difference between this stream and the others?

Here are the 4 streams queried

Thank you for your response!

Relevant log output

2024-01-30 15:50:57 source > {
  "error": {
    "code": 401,
    "message": "Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
    "errors": [
      {
        "message": "Login Required.",
        "domain": "global",
        "reason": "required",
        "location": "Authorization",
        "locationType": "header"
      }
    ],
    "status": "UNAUTHENTICATED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "CREDENTIALS_MISSING",
        "domain": "googleapis.com",
        "metadata": {
          "method": "google.searchconsole.v1.searchanalytics.SearchAnalyticsService.Query",
          "service": "searchconsole.googleapis.com"
        }
      }
    ]
  }
}
2024-01-30 15:50:57 source > Encountered an exception while reading stream search_analytics_keyword_site_report_by_site
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 116, in read
    stream_is_available, reason = stream_instance.check_availability(logger, self)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/core.py", line 211, in check_availability
    return self.availability_strategy.check_availability(self, logger, source)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py", line 56, in check_availability
    is_available, reason = self.handle_http_error(stream, logger, source, error)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py", line 85, in handle_http_error
    raise error
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py", line 50, in check_availability
    get_first_record_for_slice(stream, stream_slice)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/utils/stream_helper.py", line 40, in get_first_record_for_slice
    return next(records_for_slice)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 463, in read_records
    yield from self._read_pages(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 479, in _read_pages
    request, response = self._fetch_next_page(stream_slice, stream_state, next_page_token)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 500, in _fetch_next_page
    json=self.request_body_json(stream_state=stream_state, stream_slice=stream_slice, next_page_token=next_page_token),
  File "/airbyte/integration_code/source_google_search_console/streams.py", line 371, in request_body_json
    keywords = {record["searchAppearance"] for record in keywords_records}
  File "/airbyte/integration_code/source_google_search_console/streams.py", line 371, in <setcomp>
    keywords = {record["searchAppearance"] for record in keywords_records}
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 463, in read_records
    yield from self._read_pages(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 479, in _read_pages
    request, response = self._fetch_next_page(stream_slice, stream_state, next_page_token)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 505, in _fetch_next_page
    response = self._send_request(request, request_kwargs)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 403, in _send_request
    return backoff_handler(user_backoff_handler)(request, request_kwargs)
  File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 362, in _send
    raise exc
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 359, in _send
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://www.googleapis.com/webmasters/v3/sites/sc-domain.com/searchAnalytics/query
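On the question of how this stream differs from the others: the traceback shows that, for `search_analytics_keyword_site_report_by_site`, `request_body_json` (streams.py, line 371) first reads records from another stream to collect the distinct `searchAppearance` values, and the 401 is raised inside that nested read, before the stream's own query is sent. The sketch below illustrates that two-step pattern in isolation. It is not the connector's code: the function names, dimensions, dates and appearance values are assumptions, and only the request-body field names (`dimensions`, `dimensionFilterGroups`, `filters`, `expression`) follow the public `searchAnalytics.query` reference.

```python
from typing import Iterable, List


def distinct_appearances(records: Iterable[dict]) -> set:
    # Same shape as the set comprehension in the traceback (streams.py, line 371).
    return {record["searchAppearance"] for record in records}


def build_keyword_bodies(appearances: Iterable[str], start_date: str,
                         end_date: str, dimensions: List[str]) -> List[dict]:
    """Hypothetical helper: one searchAnalytics.query body per searchAppearance value."""
    bodies = []
    for appearance in sorted(set(appearances)):  # sorted for a stable request order
        bodies.append({
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": dimensions,
            "dimensionFilterGroups": [{
                "groupType": "and",
                "filters": [{
                    "dimension": "searchAppearance",
                    "operator": "equals",
                    "expression": appearance,
                }],
            }],
        })
    return bodies


if __name__ == "__main__":
    # Step 1 (placeholder records standing in for the nested stream's output).
    nested = [{"searchAppearance": "VIDEO"}, {"searchAppearance": "AMP_BLUE_LINK"},
              {"searchAppearance": "VIDEO"}]
    # Step 2: the outer stream's query bodies, one per appearance.
    for body in build_keyword_bodies(distinct_appearances(nested),
                                     "2024-01-01", "2024-01-30", ["date", "query"]):
        print(body["dimensionFilterGroups"][0]["filters"][0]["expression"])
```

Because step 1 is its own HTTP request, it can fail independently of the other streams. If the nested stream were created without the same authenticator as the outer one, that would fit the `CREDENTIALS_MISSING` reason in the log; this is a hypothesis worth checking in streams.py, not a confirmed cause.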


bgadrian commented 4 months ago

I also encountered this problem.

The endpoint used by the search reports (documented at https://developers.google.com/webmaster-tools/v1/searchanalytics/query) requires the webmaster role.
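For reference, the method behind these reports is `searchAnalytics.query`, served at the `webmasters/v3` path visible in the original log, with the property's `siteUrl` URL-encoded into the path. Google's reference describes OAuth scopes for it rather than a role; to my knowledge either `webmasters.readonly` or `webmasters` is accepted, but that list should be checked against the current docs. The helper names below are hypothetical.

```python
from urllib.parse import quote
from typing import Iterable

API_ROOT = "https://www.googleapis.com/webmasters/v3"

# Assumed scope list for searchAnalytics.query (verify against Google's reference).
SEARCH_ANALYTICS_SCOPES = (
    "https://www.googleapis.com/auth/webmasters.readonly",
    "https://www.googleapis.com/auth/webmasters",
)


def search_analytics_url(site_url: str) -> str:
    """Build the query endpoint, encoding ':' and '/' in siteUrl (e.g. sc-domain: properties)."""
    return f"{API_ROOT}/sites/{quote(site_url, safe='')}/searchAnalytics/query"


def has_search_analytics_scope(granted_scopes: Iterable[str]) -> bool:
    """True if any granted scope is one of the assumed search analytics scopes."""
    granted = set(granted_scopes)
    return any(scope in granted for scope in SEARCH_ANALYTICS_SCOPES)


if __name__ == "__main__":
    print(search_analytics_url("sc-domain:example.com"))
    print(has_search_analytics_scope(["https://www.googleapis.com/auth/webmasters.readonly"]))
```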

I'm not sure whether it is related, but in my case domain-wide authority has not been delegated to the Google service account, because there is no Workspace admin account (according to this connector's documentation, we have to follow the steps at https://developers.google.com/identity/protocols/oauth2/service-account#delegatingauthority).

In Search Console, upgrading the service account from restricted user to full user (https://support.google.com/webmasters/answer/7687615?hl=en) has no effect; the error persists.
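That the upgrade changes nothing fits the error payload in the original report: its `google.rpc.ErrorInfo` reason is `CREDENTIALS_MISSING`, located in the `Authorization` header. Read literally, that means Google received no usable credential on that request at all, which is a different situation from a credential that is sent but lacks access to the property. A small, hypothetical helper like the one below can pull these fields out of logged error bodies, so a 401 from one stream can be compared with what the other streams return.

```python
import json

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def summarize_google_error(body: str) -> dict:
    """Extract code, status and ErrorInfo reasons from a Google API JSON error body."""
    error = json.loads(body).get("error", {})
    reasons = [
        detail.get("reason")
        for detail in error.get("details", [])
        # Keep only google.rpc.ErrorInfo entries, whose reason is the machine-readable code.
        if detail.get("@type") == ERROR_INFO_TYPE
    ]
    return {"code": error.get("code"), "status": error.get("status"), "reasons": reasons}


if __name__ == "__main__":
    # Trimmed copy of the payload from the original log.
    logged = """{"error": {"code": 401, "status": "UNAUTHENTICATED",
      "details": [{"@type": "type.googleapis.com/google.rpc.ErrorInfo",
                   "reason": "CREDENTIALS_MISSING", "domain": "googleapis.com"}]}}"""
    print(summarize_google_error(logged))
```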

IMO, if there is no other way to give this user the webmaster scope, the connector documentation should state that a Webmaster account (which is a Google paid service) is required and that the connector does not work with free, normal Google Analytics accounts. The current documentation says "Note on delegating domain-wide authority to the service account", but a note is just a note, not a mandatory step, so this phrasing should probably be more explicit.

Affected streams: search_analytics_keyword_page_report, search_analytics_keyword_site_report_by_page, search_analytics_keyword_site_report_by_site, sitemaps.