airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com
Other
16.13k stars 4.12k forks source link

[source-mssql] No longer able to create connection between source and destination #30287

Closed dipeshkd closed 6 months ago

dipeshkd commented 1 year ago

Connector Name

source-mssql

Connector Version

2.0.1

What step the error happened?

Configuring a new connector

Revelant information

Hi,

I am on Airbyte version 0.50.28 (Also tested this on 0.50.24) I am no longer able to create a connection with MSSQL as a source. I have tested this with S3 and local Postgres destination. This was working earlier in the week.

The connection takes a long time when trying to connect to my production MSSQL image

I get the following error image

looking at the logs from worker it seems to stop at discoverPrimaryKeys I have removed the list of tables with ""

Relevant log output

2023-09-08 21:50:06 INFO i.a.w.t.TemporalAttemptExecution(get):120 - Docker volume job log path: /tmp/workspace/be422ca2-28a9-45ea-8a55-b47ae877c6bf/0/logs.log
2023-09-08 21:50:06 INFO i.a.w.t.TemporalAttemptExecution(get):125 - Executing worker wrapper. Airbyte version: 0.50.24
2023-09-08 21:50:06 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):290 - Attempt 0 to save workflow id for cancellation
2023-09-08 21:50:06 INFO i.a.c.EnvConfigs(getEnvOrDefault):1228 - Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2023-09-08 21:50:06 INFO i.a.c.EnvConfigs(getEnvOrDefault):1228 - Using default value for environment variable SOCAT_KUBE_CPU_LIMIT: '2.0'
2023-09-08 21:50:06 INFO i.a.c.EnvConfigs(getEnvOrDefault):1228 - Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2023-09-08 21:50:06 INFO i.a.c.EnvConfigs(getEnvOrDefault):1228 - Using default value for environment variable SOCAT_KUBE_CPU_REQUEST: '0.1'
2023-09-08 21:50:06 INFO i.a.c.EnvConfigs(getEnvOrDefault):1228 - Using default value for environment variable LAUNCHDARKLY_KEY: ''
2023-09-08 21:50:06 INFO i.a.c.i.LineGobbler(voidCall):149 - Checking if airbyte/source-mssql:2.0.1 exists...
2023-09-08 21:50:06 INFO i.a.c.i.LineGobbler(voidCall):149 - airbyte/source-mssql:2.0.1 was found locally.
2023-09-08 21:50:06 INFO i.a.w.p.DockerProcessFactory(create):143 - Creating docker container = source-mssql-discover-be422ca2-28a9-45ea-8a55-b47ae877c6bf-0-zswpw with resources io.airbyte.config.ResourceRequirements@569c8b43[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2023-09-08 21:50:06 INFO i.a.w.p.DockerProcessFactory(create):196 - Preparing command: docker run --rm --init -i -w /data/be422ca2-28a9-45ea-8a55-b47ae877c6bf/0 --log-driver none --name source-mssql-discover-be422ca2-28a9-45ea-8a55-b47ae877c6bf-0-zswpw --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-mssql:2.0.1 -e AUTO_DETECT_SCHEMA=false -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e USE_STREAM_CAPABLE_STATE=true -e FIELD_SELECTION_WORKSPACES= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.50.24 -e WORKER_JOB_ID=be422ca2-28a9-45ea-8a55-b47ae877c6bf airbyte/source-mssql:2.0.1 discover --config source_config.json
2023-09-08 21:50:06 INFO i.a.w.i.VersionedAirbyteStreamFactory(create):177 - Reading messages from protocol version 0.2.0
2023-09-08 21:50:08 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.s.m.MssqlSource(main):569 starting source: class io.airbyte.integrations.source.mssql.MssqlSource
2023-09-08 21:50:08 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.b.IntegrationCliParser(parseOptions):126 integration args: {discover=null, config=source_config.json}
2023-09-08 21:50:08 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.b.IntegrationRunner(runInternal):130 Running integration: io.airbyte.integrations.base.ssh.SshWrappedSource
2023-09-08 21:50:08 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.b.IntegrationRunner(runInternal):131 Command: DISCOVER
2023-09-08 21:50:08 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.b.IntegrationRunner(runInternal):132 Integration config: IntegrationConfig{command=DISCOVER, configPath='source_config.json', catalogPath='null', statePath='null'}
2023-09-08 21:50:08 WARN i.a.w.i.VersionedAirbyteStreamFactory(internalLog):309 - WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2023-09-08 21:50:08 WARN i.a.w.i.VersionedAirbyteStreamFactory(internalLog):309 - WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2023-09-08 21:50:08 WARN i.a.w.i.VersionedAirbyteStreamFactory(internalLog):309 - WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword display_type - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2023-09-08 21:50:09 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.b.s.SshTunnel(getInstance):204 Starting connection with method: NO_TUNNEL
2023-09-08 21:50:09 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO c.z.h.HikariDataSource(<init>):80 HikariPool-1 - Starting...
2023-09-08 21:50:09 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO c.z.h.HikariDataSource(<init>):82 HikariPool-1 - Start completed.
2023-09-08 21:50:10 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.s.j.AbstractJdbcSource(discoverInternal):170 Internal schemas to exclude: [spt_fallback_db, spt_monitor, cdc, spt_values, INFORMATION_SCHEMA, spt_fallback_usg, MSreplication_options, sys, spt_fallback_dev]
2023-09-08 21:50:13 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.s.j.AbstractJdbcSource(discoverPrimaryKeys):277 Discover primary keys for tables: [<incomplete list of tables>
2023-09-08 21:53:49 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO c.z.h.HikariDataSource(close):350 HikariPool-1 - Shutdown initiated...
2023-09-08 21:53:49 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO c.z.h.HikariDataSource(close):352 HikariPool-1 - Shutdown completed.
2023-09-08 21:53:49 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.b.IntegrationRunner(runInternal):226 Completed integration: io.airbyte.integrations.base.ssh.SshWrappedSource
2023-09-08 21:53:49 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):312 - INFO i.a.i.s.m.MssqlSource(main):571 completed source: class io.airbyte.integrations.source.mssql.MssqlSource
2023-09-08 21:53:49 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):290 - Attempt 0 to call to write discover schema result
2023-09-08 21:53:49 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):295 - Attempt 0 to call to write discover schema result error: io.airbyte.api.client.invoker.generated.ApiException: java.io.IOException: HTTP/1.1 header parser received no bytes
2023-09-08 21:53:58 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):290 - Attempt 1 to call to write discover schema result
2023-09-08 21:53:58 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):295 - Attempt 1 to call to write discover schema result error: io.airbyte.api.client.invoker.generated.ApiException: java.io.IOException: HTTP/1.1 header parser received no bytes
2023-09-08 21:54:03 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):290 - Attempt 2 to call to write discover schema result
2023-09-08 21:54:03 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):295 - Attempt 2 to call to write discover schema result error: io.airbyte.api.client.invoker.generated.ApiException: java.io.IOException: HTTP/1.1 header parser received no bytes
2023-09-08 21:55:06 WARN i.t.i.s.WorkflowExecuteRunnable(throwAndFailWorkflowExecution):134 - Workflow execution failure WorkflowId='44ecb67c-a1db-41b0-9c1c-acee7acb9bec', RunId=0c4d99a9-414a-4ea3-bd5d-189c24528799, WorkflowType='DiscoverCatalogWorkflow'
io.temporal.failure.ActivityFailure: scheduledEventId=5, startedEventId=6, activityType='Run', activityId='3d85096f-6ae0-3774-af86-3aec6304a0e1', identity='', retryState=RETRY_STATE_NON_RETRYABLE_FAILURE
    at java.lang.Thread.getStackTrace(Thread.java:2550) ~[?:?]
    at io.temporal.internal.sync.ActivityStubBase.execute(ActivityStubBase.java:49) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.sync.ActivityInvocationHandler.lambda$getActivityFunc$0(ActivityInvocationHandler.java:78) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.sync.ActivityInvocationHandlerBase.invoke(ActivityInvocationHandlerBase.java:60) ~[temporal-sdk-1.17.0.jar:?]
    at jdk.proxy2.$Proxy84.run(Unknown Source) ~[?:?]
    at io.airbyte.workers.temporal.discover.catalog.DiscoverCatalogWorkflowImpl.run(DiscoverCatalogWorkflowImpl.java:37) ~[io.airbyte-airbyte-workers-0.50.24.jar:?]
    at DiscoverCatalogWorkflowImplProxy.run$accessor$fUevEZLV(Unknown Source) ~[?:?]
    at DiscoverCatalogWorkflowImplProxy$auxiliary$b9Q37NQ9.call(Unknown Source) ~[?:?]
    at io.airbyte.workers.temporal.support.TemporalActivityStubInterceptor.execute(TemporalActivityStubInterceptor.java:79) ~[io.airbyte-airbyte-workers-0.50.24.jar:?]
    at DiscoverCatalogWorkflowImplProxy.run(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
    at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:302) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:277) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.sync.WorkflowExecuteRunnable.run(WorkflowExecuteRunnable.java:71) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:116) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:102) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:106) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.worker.ActiveThreadReportingExecutor.lambda$submit$0(ActiveThreadReportingExecutor.java:53) ~[temporal-sdk-1.17.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: io.temporal.failure.TimeoutFailure: message='activity ScheduleToClose timeout', timeoutType=TIMEOUT_TYPE_SCHEDULE_TO_CLOSE
2023-09-08 21:55:26 WARN i.a.c.t.CancellationHandler$TemporalCancellationHandler(checkAndHandleCancellation):79 - The job timeout and was not a sync, we will destroy the pods related to it
io.temporal.client.ActivityNotExistsException: WorkflowId=44ecb67c-a1db-41b0-9c1c-acee7acb9bec, RunId=0c4d99a9-414a-4ea3-bd5d-189c24528799, ActivityType=Run, ActivityId=3d85096f-6ae0-3774-af86-3aec6304a0e1
    at io.temporal.internal.activity.HeartbeatContextImpl.sendHeartbeatRequest(HeartbeatContextImpl.java:200) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.activity.HeartbeatContextImpl.doHeartBeatLocked(HeartbeatContextImpl.java:144) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.activity.HeartbeatContextImpl.lambda$scheduleNextHeartbeatLocked$0(HeartbeatContextImpl.java:167) ~[temporal-sdk-1.17.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1589) ~[?:?]
Caused by: io.grpc.StatusRuntimeException: NOT_FOUND: workflow execution already completed
    at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271) ~[grpc-stub-1.55.1.jar:1.55.1]
    at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252) ~[grpc-stub-1.55.1.jar:1.55.1]
    at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165) ~[grpc-stub-1.55.1.jar:1.55.1]
    at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.recordActivityTaskHeartbeat(WorkflowServiceGrpc.java:3813) ~[temporal-serviceclient-1.17.0.jar:?]
    at io.temporal.internal.client.ActivityClientHelper.sendHeartbeatRequest(ActivityClientHelper.java:67) ~[temporal-sdk-1.17.0.jar:?]
    at io.temporal.internal.activity.HeartbeatContextImpl.sendHeartbeatRequest(HeartbeatContextImpl.java:185) ~[temporal-sdk-1.17.0.jar:?]
    ... 8 more

Contribute

dipeshkd commented 1 year ago

I ran the mssql discover docker with below

docker run --init -i -w /data/932356e1-1114-49ba-997d-0d105b08c306/0 --log-driver json-file --name source-mssql-discover-test --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-mssql:2.0.1 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e USE_STREAM_CAPABLE_STATE=true -e FIELD_SELECTION_WORKSPACES= -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.50.28 -e WORKER_JOB_ID=932356e1-1114-49ba-997d-0d105b08c306 airbyte/source-mssql:2.0.1 discover --config source_config.json

here is the output.

It seems to be taking a long time to get the table columns. It took 6 mins to get the columns. Not sure why its taking so long.

2023-09-09 12:22:45 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.s.m.MssqlSource(main):569 starting source: class io.airbyte.integrations.source.mssql.MssqlSource"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.b.IntegrationCliParser(parseOptions):126 integration args: {discover=null, config=source_config.json}"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.b.IntegrationRunner(runInternal):130 Running integration: io.airbyte.integrations.base.ssh.SshWrappedSource"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.b.IntegrationRunner(runInternal):131 Command: DISCOVER"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.b.IntegrationRunner(runInternal):132 Integration config: IntegrationConfig{command=DISCOVER, configPath='source_config.json', catalogPath='null', statePath='null'}"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"WARN","message":"WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"WARN","message":"WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"WARN","message":"WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword display_type - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.b.s.SshTunnel(getInstance):204 Starting connection with method: NO_TUNNEL"}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"INFO","message":"INFO c.z.h.HikariDataSource(<init>):80 HikariPool-1 - Starting..."}}
2023-09-09 12:22:46 {"type":"LOG","log":{"level":"INFO","message":"INFO c.z.h.HikariDataSource(<init>):82 HikariPool-1 - Start completed."}}
2023-09-09 12:22:47 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.s.j.AbstractJdbcSource(discoverInternal):170 Internal schemas to exclude: [spt_fallback_db, spt_monitor, cdc, spt_values, INFORMATION_SCHEMA, spt_fallback_usg, MSreplication_options, sys, spt_fallback_dev]"}}
2023-09-09 12:22:50 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.s.j.AbstractJdbcSource(discoverPrimaryKeys):277 Discover primary keys for tables: <LONG LIST OF TABLES>}}
2023-09-09 12:28:34 {"type":"LOG","log":{"level":"INFO","message":"INFO c.z.h.HikariDataSource(close):350 HikariPool-1 - Shutdown initiated..."}}
2023-09-09 12:28:34 {"type":"LOG","log":{"level":"INFO","message":"INFO c.z.h.HikariDataSource(close):352 HikariPool-1 - Shutdown completed."}}
2023-09-09 12:28:34 <LONG LIST OF COLUMNS>
2023-09-09 12:28:34 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.b.IntegrationRunner(runInternal):226 Completed integration: io.airbyte.integrations.base.ssh.SshWrappedSource"}}
2023-09-09 12:28:34 {"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.s.m.MssqlSource(main):571 completed source: class io.airbyte.integrations.source.mssql.MssqlSource"}}
dipeshkd commented 1 year ago

Hi, I downgraded airbyte to 0.50.20 and I am able to create a connection. MSSQL Connector version is 2.0.1 seems something happened from 0.50.20 to 0.50.24 where the discover command times out.

Is the there a flag to manually set the timeout duration?

tjsoliveira commented 1 year ago

I am experiencing the same issue with MSSQL and Salesforce. @dipeshkd, how did you downgrade Airbyte? I recently updated my Airbyte to version 0.50.29, and I'm not sure if I can revert to a previous version.

dipeshkd commented 1 year ago

Hi @tjsoliveira,

Under Airbyte deployment, you have to execute this command to run Airbyte. ./run-ab-platform.sh

This is where you change the version by editing the VERSION at the top of the page.

Once you have done this you can follow the upgrade steps

Here are the complete steps that I did:

  1. Change version on ./run-ab-platform.sh to VERSION=0.50.20 (or any other release versions that you need)
  2. If your instance is running, bring it down docker compose down
  3. Remove your docker-compose and .env files ./run-ab-platform.sh -r
  4. Then bring you instance back up for the required version ./run-ab-platform.sh -b
    • This will download the docker-compose.yml and .env files for your required version
    • if you have customized these files you can run ./run-ab-platform.sh -d and it will download them without running docker compose up

Warning You may have to reset your configuration if the Airbyte instance does not start properly. Which means redoing all sources, destination and connections

evantahler commented 6 months ago

Since this issue was written, we've dramatically revamped the connector. Please try your sync again with the latest connectors and platform!