airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-google-search-console] service account with insufficient permission throws domain validation error instead #34285

Open willi-mueller opened 10 months ago

willi-mueller commented 10 months ago

Connector Name

source-google-search-console

Connector Version

1.3.6

What step the error happened?

Updating the connector

Relevant information

On Airbyte OSS I get the following error:

"InvalidSiteURLValidationError('The following URLs are not permitted: sc-domain:sipgate.io')"

The error is thrown for domains such as:

However, these domains work:

What's puzzling: the domains listed above work on Airbyte SaaS (see screenshot: "Airbyte SaaS working").

But on Airbyte OSS, the same connector version does not accept these domains. The only difference is the authentication method: Airbyte SaaS uses OAuth, whereas Airbyte OSS uses a service account.

See screenshot from Airbyte OSS: "Airbyte OSS not working".

Relevant log output

2024-01-16 09:28:39 platform > Docker volume job log path: /tmp/workspace/121f582f-12ab-4d65-b897-697fe50d4557/0/logs.log
2024-01-16 09:28:39 platform > Executing worker wrapper. Airbyte version: 0.50.41
2024-01-16 09:28:39 platform > Attempt 0 to save workflow id for cancellation
2024-01-16 09:28:39 platform > 
2024-01-16 09:28:39 platform > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-01-16 09:28:39 platform > Using default value for environment variable SOCAT_KUBE_CPU_LIMIT: '2.0'
2024-01-16 09:28:39 platform > ----- START check-orchestrator -----
2024-01-16 09:28:39 platform > 
2024-01-16 09:28:39 platform > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-01-16 09:28:39 platform > Using default value for environment variable SOCAT_KUBE_CPU_REQUEST: '0.1'
2024-01-16 09:28:39 platform > Using default value for environment variable LAUNCHDARKLY_KEY: ''
2024-01-16 09:28:39 platform > Checking if airbyte/source-google-search-console:1.3.6 exists...
2024-01-16 09:28:39 platform > airbyte/source-google-search-console:1.3.6 was found locally.
2024-01-16 09:28:39 platform > Creating docker container = source-google-search-console-check-121f582f-12ab-4d65-b897-697fe50d4557-0-ftokp with resources io.airbyte.config.ResourceRequirements@6186155b[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2024-01-16 09:28:39 platform > Preparing command: docker run --rm --init -i -w /data/121f582f-12ab-4d65-b897-697fe50d4557/0 --log-driver none --name source-google-search-console-check-121f582f-12ab-4d65-b897-697fe50d4557-0-ftokp --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-google-search-console:1.3.6 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.50.41 -e WORKER_JOB_ID=121f582f-12ab-4d65-b897-697fe50d4557 airbyte/source-google-search-console:1.3.6 check --config source_config.json
2024-01-16 09:28:39 platform > Reading messages from protocol version 0.2.0
2024-01-16 09:28:40 platform > Check failed
2024-01-16 09:28:41 platform > Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@4e200677[status=failed,message="InvalidSiteURLValidationError('The following URLs are not permitted: test.de')",additionalProperties={}]
2024-01-16 09:28:41 platform > 
2024-01-16 09:28:41 platform > ----- END check-orchestrator -----
2024-01-16 09:28:41 platform >


marcosmarxm commented 10 months ago

@willi-mueller any chance you're copy-pasting the parameter? Did you try typing it in OSS? The code is the same, and this step basically makes a call to the urlparse function.

Did you try to create the source without the domain? (Maybe bad credentials plus a misleading error are disguising another problem.)
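The comment above notes that this step boils down to a `urlparse` call. A minimal sketch, assuming a format-only check along those lines (the function `looks_like_site_url` is hypothetical, not the connector's code), shows that both Search Console property styles parse cleanly, which is consistent with the URL format not being the real problem:

```python
from urllib.parse import urlparse


def looks_like_site_url(site_url: str) -> bool:
    """Hypothetical format-only check for a Search Console property string.

    Accepts domain properties ("sc-domain:example.com") and URL-prefix
    properties ("https://example.com/"). It says nothing about whether the
    current credentials are allowed to access the property.
    """
    parsed = urlparse(site_url)
    if parsed.scheme == "sc-domain":
        # urlparse puts everything after "sc-domain:" into .path
        return bool(parsed.path) and "/" not in parsed.path
    if parsed.scheme in ("http", "https"):
        return bool(parsed.netloc)
    return False


for url in ["sc-domain:sipgate.io", "https://test.de/", "sc-domain:"]:
    print(url, looks_like_site_url(url))
```

Because a check like this passes for `sc-domain:sipgate.io`, a rejection of that value has to come from somewhere other than parsing.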

willi-mueller commented 10 months ago

Thank you very much @marcosmarxm for looking at it so quickly and providing helpful pointers!

It was a rights problem.

The cause was that the service account (used for OSS) had insufficient rights to fetch data from Google about some of the domains I entered, whereas the OAuth account on Airbyte SaaS has the privileges to access that data.
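One way to confirm an access problem like this, sketched under assumptions: list the properties the credential can see (the Search Console API `sites.list` call returns them as `siteEntry[].siteUrl`) and diff that against the configured properties. The helper `inaccessible_sites` and the sample response are hypothetical; feeding it the service account's response and the OAuth user's response would show which properties only one of them can access.

```python
from __future__ import annotations

import json

# Illustrative shape of a Search Console API `sites.list` response;
# the property values below are made up for this example.
SAMPLE_RESPONSE = json.loads("""
{
  "siteEntry": [
    {"siteUrl": "https://test.de/", "permissionLevel": "siteFullUser"}
  ]
}
""")


def inaccessible_sites(configured: list[str], sites_list_response: dict) -> list[str]:
    """Return the configured properties that the current credential cannot see."""
    # An account with no properties may get a response without "siteEntry".
    visible = {entry["siteUrl"] for entry in sites_list_response.get("siteEntry", [])}
    return [url for url in configured if url not in visible]


print(inaccessible_sites(["sc-domain:sipgate.io", "https://test.de/"], SAMPLE_RESPONSE))
# → ['sc-domain:sipgate.io']
```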

Thus, I wonder if we can improve the error message here to clarify that the URL format is fine and that the requesting credentials are simply not authorized to get data about that URL.

What do you think?
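A minimal sketch of the proposed improvement, assuming the check has both the configured properties and the set of properties visible to the credential: it raises a format error only for malformed strings and a separate permission error otherwise, so the message no longer implies the URL is wrong. The exception classes, function names, and the `auth_type` parameter are hypothetical and not part of the connector.

```python
from __future__ import annotations

from urllib.parse import urlparse


class SiteURLFormatError(ValueError):
    """Hypothetical: the property string itself is malformed."""


class SiteURLPermissionError(PermissionError):
    """Hypothetical: the property is well-formed but not visible to these credentials."""


def _is_well_formed(site_url: str) -> bool:
    # Domain properties ("sc-domain:...") or URL-prefix properties ("https://...").
    parsed = urlparse(site_url)
    if parsed.scheme == "sc-domain":
        return bool(parsed.path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_site_urls(configured: list[str], visible: set[str], auth_type: str) -> None:
    """Raise a format error first, then a permission error, naming the offending URLs."""
    malformed = [u for u in configured if not _is_well_formed(u)]
    if malformed:
        raise SiteURLFormatError(f"Malformed site URLs: {', '.join(malformed)}")
    denied = [u for u in configured if u not in visible]
    if denied:
        raise SiteURLPermissionError(
            f"The site URLs are valid, but the {auth_type} credentials are not "
            f"authorized to access: {', '.join(denied)}. Grant this account "
            "access to these properties in Google Search Console."
        )


try:
    validate_site_urls(["sc-domain:sipgate.io"], visible=set(), auth_type="service account")
except SiteURLPermissionError as err:
    print(err)
```

Checking format before access keeps the two failure modes distinct, which is exactly the ambiguity this issue ran into.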

marcosmarxm commented 10 months ago

Totally agree! I'll update the title to match the real problem

willi-mueller commented 10 months ago

Wonderful, thank you!

On 16-Jan-2024, at 23:26, Marcos Marx wrote:

> Totally agree! I'll update the title to match the real problem

willi-mueller commented 10 months ago

> Totally agree! I'll update the title to match the real problem

I updated the title because I think this makes it clearer that, when the service account has insufficient permissions, the connector throws InvalidSiteURLValidationError instead of an insufficient-permissions error.