airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[destination-clickhouse] is not syncing data to the main tables but only creating the internal tables #36199

Open bhaskar-pv opened 6 months ago

bhaskar-pv commented 6 months ago

Connector Name

destination-clickhouse

Connector Version

v1.0.0

What step the error happened?

During the sync

Relevant information

I am trying to sync data from Jira to ClickHouse. The connector created the airbyte_internal database, but after that it didn't create any tables in the database I provided in the configuration. There are also no errors in the logs.

Relevant log output

2024-03-15 17:24:11 platform > Cloud storage job log path: /workspace/9373761/0/logs.log
2024-03-15 17:24:14 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: CLAIM — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:39 INFO i.m.r.Micronaut(start):100 - Startup completed in 11409ms. Server Running: http://orchestrator-repl-job-9373761-attempt-0:9000
2024-03-15 17:24:46 replication-orchestrator > Writing async status INITIALIZING for KubePodInfo[namespace=jobs, name=orchestrator-repl-job-9373761-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:dev-7c09730061, pullPolicy=IfNotPresent]]...
2024-03-15 17:24:46 replication-orchestrator > sourceLauncherConfig is: io.airbyte.persistence.job.models.IntegrationLauncherConfig@5246453e[jobId=9373761,attemptId=0,connectionId=e4f2c611-28c5-411a-a8e2-f3007c434837,workspaceId=73b9a1d6-99e6-40c1-bbdd-60a479d677dd,dockerImage=airbyte/source-jira:1.1.0,normalizationDockerImage=<null>,supportsDbt=false,normalizationIntegrationType=<null>,protocolVersion=Version{version='0.2.0', major='0', minor='2', patch='0'},isCustomConnector=false,allowedHosts=io.airbyte.config.AllowedHosts@299f9a81[hosts=[team-odr5bnlmfc62.atlassian.net, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}],additionalEnvironmentVariables=<null>,additionalLabels={connection_id=e4f2c611-28c5-411a-a8e2-f3007c434837, job_id=9373761, attempt_id=0, workspace_id=73b9a1d6-99e6-40c1-bbdd-60a479d677dd, airbyte=job-pod, mutex_key=e4f2c611-28c5-411a-a8e2-f3007c434837, workload_id=e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync, auto_id=98d3b3f7-b650-4668-948c-155989ca7cff},priority=<null>,additionalProperties={}]
2024-03-15 17:24:46 replication-orchestrator > Attempt 0 to get the source definition for feature flag checks
2024-03-15 17:24:47 replication-orchestrator > Attempt 0 to get the source definition
2024-03-15 17:24:47 replication-orchestrator > Concurrent stream read enabled? false
2024-03-15 17:24:47 replication-orchestrator > Setting up source...
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-03-15 17:24:47 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-03-15 17:24:47 replication-orchestrator > Setting up destination...
2024-03-15 17:24:47 replication-orchestrator > Setting up replication worker...
2024-03-15 17:24:48 replication-orchestrator > Running replication worker...
2024-03-15 17:24:48 replication-orchestrator > start sync worker. job id: 9373761 attempt id: 0
2024-03-15 17:24:48 replication-orchestrator > 
2024-03-15 17:24:48 replication-orchestrator > configured sync modes: {null.application_roles=full_refresh - overwrite}
2024-03-15 17:24:48 replication-orchestrator > ----- START REPLICATION -----
2024-03-15 17:24:48 replication-orchestrator > 
2024-03-15 17:24:48 replication-orchestrator > Running destination...
2024-03-15 17:24:48 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:48 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:48 replication-orchestrator > Attempting to start pod = destination-clickhouse-strict-encrypt-write-9373761-0-ksddk for airbyte/destination-clickhouse-strict-encrypt:1.0.0 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@440309c5[cpuRequest=0.2,cpuLimit=1,memoryRequest=1Gi,memoryLimit=2Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@50c442a5[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@4eb313ed[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=io.airbyte.config.ResourceRequirements@3fc92211[cpuRequest=0.1,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdOut=io.airbyte.config.ResourceRequirements@63d8590c[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts null
2024-03-15 17:24:48 replication-orchestrator > destination-clickhouse-strict-encrypt-write-9373761-0-ksddk stdoutLocalPort = 9877
2024-03-15 17:24:48 replication-orchestrator > destination-clickhouse-strict-encrypt-write-9373761-0-ksddk stderrLocalPort = 9878
2024-03-15 17:24:48 replication-orchestrator > Creating stdout socket server...
2024-03-15 17:24:48 replication-orchestrator > Creating stderr socket server...
2024-03-15 17:24:48 replication-orchestrator > Creating pod destination-clickhouse-strict-encrypt-write-9373761-0-ksddk...
2024-03-15 17:24:49 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-03-15 17:24:50 replication-orchestrator > Init container ready..
2024-03-15 17:24:50 replication-orchestrator > Copying files...
2024-03-15 17:24:50 replication-orchestrator > Uploading file: destination_config.json
2024-03-15 17:24:50 replication-orchestrator > kubectl cp /tmp/662791f6-c92a-400a-99e1-caabc7d8702b/destination_config.json jobs/destination-clickhouse-strict-encrypt-write-9373761-0-ksddk:/config/destination_config.json -c init --retries=3
2024-03-15 17:24:50 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:51 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:51 replication-orchestrator > Uploading file: destination_catalog.json
2024-03-15 17:24:51 replication-orchestrator > kubectl cp /tmp/d6ccc111-c15f-4406-9d62-e948df633cb8/destination_catalog.json jobs/destination-clickhouse-strict-encrypt-write-9373761-0-ksddk:/config/destination_catalog.json -c init --retries=3
2024-03-15 17:24:51 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:51 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:51 replication-orchestrator > Uploading file: FINISHED_UPLOADING
2024-03-15 17:24:51 replication-orchestrator > kubectl cp /tmp/d9998e01-fd71-4e9d-9242-04ff91d01085/FINISHED_UPLOADING jobs/destination-clickhouse-strict-encrypt-write-9373761-0-ksddk:/config/FINISHED_UPLOADING -c init --retries=3
2024-03-15 17:24:51 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:51 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:51 replication-orchestrator > Waiting until pod is ready...
2024-03-15 17:24:52 replication-orchestrator > Setting stdout...
2024-03-15 17:24:52 replication-orchestrator > Setting stderr...
2024-03-15 17:24:53 replication-orchestrator > Reading pod IP...
2024-03-15 17:24:53 replication-orchestrator > Pod IP: 172.25.12.67
2024-03-15 17:24:53 replication-orchestrator > Creating stdin socket...
2024-03-15 17:24:53 replication-orchestrator > Writing messages to protocol version 0.2.0
2024-03-15 17:24:53 replication-orchestrator > Reading messages from protocol version 0.2.0
2024-03-15 17:24:53 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-03-15 17:24:53 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-03-15 17:24:53 replication-orchestrator > Attempting to start pod = source-jira-read-9373761-0-gtjhb for airbyte/source-jira:1.1.0 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@6d9f624[cpuRequest=0.2,cpuLimit=1,memoryRequest=1Gi,memoryLimit=2Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@50c442a5[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@6ce7d6ab[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=null, stdOut=io.airbyte.config.ResourceRequirements@49d658bf[cpuRequest=0.2,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts io.airbyte.config.AllowedHosts@299f9a81[hosts=[team-odr5bnlmfc62.atlassian.net, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
2024-03-15 17:24:53 replication-orchestrator > source-jira-read-9373761-0-gtjhb stdoutLocalPort = 9879
2024-03-15 17:24:53 replication-orchestrator > source-jira-read-9373761-0-gtjhb stderrLocalPort = 9880
2024-03-15 17:24:53 replication-orchestrator > Creating stdout socket server...
2024-03-15 17:24:53 replication-orchestrator > Creating stderr socket server...
2024-03-15 17:24:53 replication-orchestrator > Creating pod source-jira-read-9373761-0-gtjhb...
2024-03-15 17:24:53 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-03-15 17:24:54 replication-orchestrator > Init container ready..
2024-03-15 17:24:54 replication-orchestrator > Copying files...
2024-03-15 17:24:54 replication-orchestrator > Uploading file: input_state.json
2024-03-15 17:24:54 replication-orchestrator > kubectl cp /tmp/36967abf-2356-471f-8d36-4c4157ff323d/input_state.json jobs/source-jira-read-9373761-0-gtjhb:/config/input_state.json -c init --retries=3
2024-03-15 17:24:54 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:54 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:54 replication-orchestrator > Uploading file: source_config.json
2024-03-15 17:24:54 replication-orchestrator > kubectl cp /tmp/a2572251-05d6-4d3f-af65-822748ef696b/source_config.json jobs/source-jira-read-9373761-0-gtjhb:/config/source_config.json -c init --retries=3
2024-03-15 17:24:54 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:54 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:54 replication-orchestrator > Uploading file: source_catalog.json
2024-03-15 17:24:54 replication-orchestrator > kubectl cp /tmp/22b6db7a-b858-455d-b85b-086db2092657/source_catalog.json jobs/source-jira-read-9373761-0-gtjhb:/config/source_catalog.json -c init --retries=3
2024-03-15 17:24:54 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:55 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:55 replication-orchestrator > Uploading file: FINISHED_UPLOADING
2024-03-15 17:24:55 replication-orchestrator > kubectl cp /tmp/b007711c-84b4-498d-bccc-ab53065aeb49/FINISHED_UPLOADING jobs/source-jira-read-9373761-0-gtjhb:/config/FINISHED_UPLOADING -c init --retries=3
2024-03-15 17:24:55 replication-orchestrator > Waiting for kubectl cp to complete
2024-03-15 17:24:55 replication-orchestrator > kubectl cp complete, closing process
2024-03-15 17:24:55 replication-orchestrator > Waiting until pod is ready...
2024-03-15 17:24:55 replication-orchestrator > Setting stdout...
2024-03-15 17:24:55 replication-orchestrator > Setting stderr...
2024-03-15 17:24:56 replication-orchestrator > Reading pod IP...
2024-03-15 17:24:56 replication-orchestrator > Pod IP: 172.25.6.117
2024-03-15 17:24:56 replication-orchestrator > Using null stdin output stream...
2024-03-15 17:24:56 replication-orchestrator > Reading messages from protocol version 0.2.0
2024-03-15 17:24:56 replication-orchestrator > Writing async status RUNNING for KubePodInfo[namespace=jobs, name=orchestrator-repl-job-9373761-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:dev-7c09730061, pullPolicy=IfNotPresent]]...
2024-03-15 17:24:56 replication-orchestrator > Destination output thread started.
2024-03-15 17:24:56 replication-orchestrator > Replication thread started.
2024-03-15 17:24:56 replication-orchestrator > Starting source heartbeat check. Will check every 1 minutes.
2024-03-15 17:24:56 replication-orchestrator > Waiting for source and destination threads to complete.
2024-03-15 17:24:56 replication-orchestrator > Starting workload heartbeat
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,002`main`1`INFO`i.a.i.d.c.ClickhouseDestinationStrictEncrypt(main):34 - starting destination: class io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,377`main`1`INFO`i.a.c.i.b.IntegrationCliParser(parseOptions):126 - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,378`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):132 - Running integration: io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,379`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):133 - Command: WRITE
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,379`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):134 - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,779`main`1`WARN`c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,783`main`1`WARN`c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-03-15 17:24:56 destination > 2024-03-15T17:24:55,808`main`1`INFO`i.a.c.i.b.s.SshWrappedDestination(getSerializedMessageConsumer):113 - No SSH connection options found, using defaults
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,491`main`1`INFO`i.a.c.i.b.s.SshTunnel(getInstance):252 - Starting connection with method: NO_TUNNEL
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,686`main`1`INFO`c.z.h.HikariDataSource(<init>):79 - HikariPool-1 - Starting...
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,702`main`1`INFO`c.z.h.HikariDataSource(<init>):81 - HikariPool-1 - Start completed.
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,872`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$toWriteConfig$0):122 - Write config: WriteConfig{streamName=application_roles, namespace=airbyte_data, outputSchemaName=airbyte_internal, tmpTableName=_airbyte_tmp_qqw_airbyte_data_raw__stream_application_roles, outputTableName=airbyte_data_raw__stream_application_roles, syncMode=overwrite}
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,885`main`1`INFO`i.a.c.i.d.b.BufferManager(<init>):53 - Max 'memory' available for buffer allocation 296 MB
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,895`pool-3-thread-1`17`INFO`i.a.c.i.d.b.BufferManager(printQueueInfo):118 - [ASYNC QUEUE INFO] Global: max: 296.96 MB, allocated: 10 MB (10.0 MB), % used: 0.03367428551701215 | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.000000
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,897`main`1`INFO`i.a.c.i.d.FlushWorkers(start):95 - Start async buffer supervisor
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,899`main`1`INFO`i.a.c.i.d.AsyncStreamConsumer(start):138 - class io.airbyte.cdk.integrations.destination_async.AsyncStreamConsumer started.
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,899`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(prepareTables):59 - Ensuring schemas exist for prepareTables with V1V2 migrations
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,899`pool-5-thread-1`19`INFO`i.a.c.i.d.FlushWorkers(printWorkerInfo):143 - [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 0
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,971`main`1`WARN`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(prepareTables):79 - Could not prepare schemas or tables because this is not implemented for this destination, this should not be required for this destination to succeed
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,971`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):165 - Preparing raw tables in destination started for 1 streams
2024-03-15 17:24:56 destination > 2024-03-15T17:24:56,971`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):170 - Preparing raw table in destination started for stream application_roles. schema: airbyte_internal, table name: airbyte_data_raw__stream_application_roles
2024-03-15 17:24:57 source > Starting syncing SourceJira
2024-03-15 17:24:58 source > Marking stream application_roles as STARTED
2024-03-15 17:24:58 replication-orchestrator > Attempt 0 to stream status started null:application_roles
2024-03-15 17:24:58 source > Syncing stream: application_roles 
2024-03-15 17:24:58 source > Marking stream application_roles as RUNNING
2024-03-15 17:24:58 replication-orchestrator > Attempt 0 to update stream status running null:application_roles
2024-03-15 17:24:59 source > Read 2 records from application_roles stream
2024-03-15 17:24:59 source > Marking stream application_roles as STOPPED
2024-03-15 17:24:59 source > Finished syncing application_roles
2024-03-15 17:24:59 source > SourceJira runtimes:
Syncing stream application_roles 0:00:00.946772
2024-03-15 17:24:59 source > Finished syncing SourceJira
2024-03-15 17:24:59 replication-orchestrator > Source has no more messages, closing connection.
2024-03-15 17:24:59 replication-orchestrator > (pod: jobs / source-jira-read-9373761-0-gtjhb) - Closed all resources for pod
2024-03-15 17:24:59 replication-orchestrator > Total records read: 5 (2 KB)
2024-03-15 17:24:59 replication-orchestrator > Schema validation was performed to a max of 10 records with errors per stream.
2024-03-15 17:24:59 replication-orchestrator > One of source or destination thread complete. Waiting on the other.
2024-03-15 17:24:59 replication-orchestrator > thread status... heartbeat thread: false , replication thread: true
2024-03-15 17:24:59 replication-orchestrator > thread status... timeout thread: false , replication thread: true
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,072`main`1`INFO`i.a.c.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):183 - Preparing raw tables in destination completed.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,098`main`1`INFO`i.a.c.i.d.FlushWorkers(close):188 - Closing flush workers -- waiting for all buffers to flush
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,101`main`1`INFO`i.a.c.i.d.FlushWorkers(close):213 - REMAINING_BUFFERS_INFO
2024-03-15 17:25:02 destination >   Namespace: airbyte_data Stream: application_roles -- remaining records: 2
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,101`main`1`INFO`i.a.c.i.d.FlushWorkers(close):214 - Waiting for all streams to flush.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,903`pool-6-thread-1`18`INFO`i.a.c.i.d.DetectStreamToFlush(getNextStreamToFlush):122 - flushing: trigger info: airbyte_data - application_roles, time trigger: false , size trigger: true current threshold b: 0 bytes, queue size b: 2.71 KB, penalty b: 0 bytes, after penalty b: 2.71 KB
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,905`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):149 - Flush Worker (a177a) -- Worker picked up work.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,906`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):151 - Flush Worker (a177a) -- Attempting to read from queue namespace: airbyte_data, stream: application_roles.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,907`pool-4-thread-1`28`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 10482981 bytes..
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,910`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):164 - Flush Worker (a177a) -- Batch contains: 2 records, 2.71 KB bytes.
2024-03-15 17:25:02 destination > 2024-03-15T17:25:02,911`pool-4-thread-1`28`INFO`i.a.i.d.c.ClickhouseSqlOperations(insertRecordsInternal):71 - actual size of batch: 2
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,102`main`1`INFO`i.a.c.i.d.FlushWorkers(close):217 - Closing flush workers -- all buffers flushed
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,102`main`1`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 0 bytes..
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,103`main`1`INFO`i.a.c.i.d.FlushWorkers(close):225 - Closing flush workers -- supervisor shut down
2024-03-15 17:25:03 destination > 2024-03-15T17:25:03,103`main`1`INFO`i.a.c.i.d.FlushWorkers(close):227 - Closing flush workers -- Starting worker pool shutdown..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,457`pool-4-thread-1`28`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 0 bytes..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,457`pool-4-thread-1`28`INFO`i.a.c.i.d.GlobalMemoryManager(free):88 - Freeing 2779 bytes..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,458`pool-4-thread-1`28`INFO`i.a.c.i.d.FlushWorkers(lambda$flush$1):173 - Flush Worker (a177a) -- Worker finished flushing. Current queue size: 0
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,458`main`1`INFO`i.a.c.i.d.FlushWorkers(close):232 - Closing flush workers  -- workers shut down
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,459`main`1`INFO`i.a.c.i.d.b.BufferManager(close):92 - Buffers cleared..
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,460`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(typeAndDedupe):96 - Skipping TypeAndDedupe final
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,461`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(commitFinalTables):101 - Skipping commitFinalTables final
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,461`main`1`INFO`i.a.i.b.d.t.NoOpTyperDeduperWithV1V2Migrations(cleanup):106 - Cleaning Up type-and-dedupe thread pool
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,461`main`1`INFO`i.a.c.i.d.AsyncStreamConsumer(close):219 - class io.airbyte.cdk.integrations.destination_async.AsyncStreamConsumer closed
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,464`main`1`INFO`i.a.c.i.b.IntegrationRunner(runInternal):231 - Completed integration: io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:25:04 destination > 2024-03-15T17:25:04,464`main`1`INFO`i.a.i.d.c.ClickhouseDestinationStrictEncrypt(main):36 - completed destination: class io.airbyte.integrations.destination.clickhouse.ClickhouseDestinationStrictEncrypt
2024-03-15 17:25:05 replication-orchestrator > (pod: jobs / destination-clickhouse-strict-encrypt-write-9373761-0-ksddk) - Closed all resources for pod
2024-03-15 17:25:05 replication-orchestrator > Source and destination threads complete.
2024-03-15 17:25:05 replication-orchestrator > Attempt 0 to update stream status complete null:application_roles
2024-03-15 17:25:05 replication-orchestrator > thread status... timeout thread: false , replication thread: true
2024-03-15 17:25:05 replication-orchestrator > sync summary: {
  "status" : "completed",
  "recordsSynced" : 0,
  "bytesSynced" : 0,
  "startTime" : 1710523488095,
  "endTime" : 1710523505694,
  "totalStats" : {
    "bytesCommitted" : 2435,
    "bytesEmitted" : 2435,
    "destinationStateMessagesEmitted" : 0,
    "destinationWriteEndTime" : 1710523505587,
    "destinationWriteStartTime" : 1710523488107,
    "meanSecondsBeforeSourceStateMessageEmitted" : 0,
    "maxSecondsBeforeSourceStateMessageEmitted" : 0,
    "maxSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "meanSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "recordsEmitted" : 2,
    "recordsCommitted" : 2,
    "replicationEndTime" : 1710523505684,
    "replicationStartTime" : 1710523488095,
    "sourceReadEndTime" : 1710523499622,
    "sourceReadStartTime" : 1710523493377,
    "sourceStateMessagesEmitted" : 0
  },
  "streamStats" : [ {
    "streamName" : "application_roles",
    "stats" : {
      "bytesCommitted" : 2435,
      "bytesEmitted" : 2435,
      "recordsEmitted" : 2,
      "recordsCommitted" : 2
    }
  } ]
}
2024-03-15 17:25:05 replication-orchestrator > failures: [ ]
2024-03-15 17:25:05 replication-orchestrator > 
2024-03-15 17:25:05 replication-orchestrator > ----- END REPLICATION -----
2024-03-15 17:25:05 replication-orchestrator > 
2024-03-15 17:25:07 replication-orchestrator > Returning output...
2024-03-15 17:25:07 replication-orchestrator > Writing async status SUCCEEDED for KubePodInfo[namespace=jobs, name=orchestrator-repl-job-9373761-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:dev-7c09730061, pullPolicy=IfNotPresent]]...
2024-03-15 17:24:42 INFO c.l.l.LDSLF4J$ChannelImpl(log):73 - Enabling streaming API
2024-03-15 17:24:42 INFO c.l.l.LDSLF4J$ChannelImpl(log):94 - Waiting up to 5000 milliseconds for LaunchDarkly client to start...
2024-03-15 17:24:45 INFO i.a.m.l.MetricClientFactory(initializeDatadogMetricClient):124 - Initializing DatadogMetricClient
2024-03-15 17:24:45 INFO i.a.m.l.DogStatsDMetricClient(initialize):52 - Starting DogStatsD client..
2024-03-15 17:25:07 INFO i.a.a.SegmentAnalyticsClient(close):223 - Closing Segment analytics client...
2024-03-15 17:25:07 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):278 - Waiting for Segment analytic client to flush enqueued messages...
2024-03-15 17:25:07 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):290 - Segment analytic client flush complete.
2024-03-15 17:25:07 INFO i.a.a.SegmentAnalyticsClient(close):227 - Segment analytics client closed.  No new events will be accepted.
2024-03-15 17:24:11 platform > Executing worker wrapper. Airbyte version: dev-7c09730061-cloud
2024-03-15 17:24:11 platform > Attempt 0 to save workflow id for cancellation
2024-03-15 17:24:11 platform > Creating workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync
2024-03-15 17:24:14 platform > Unknown feature flag "workload.polling.interval"; returning default value
2024-03-15 17:24:14 platform > Workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync is pending
2024-03-15 17:24:14 INFO i.a.w.l.c.WorkloadApiClient(claim):69 - Claimed: true for e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync via API for prod-dataplane-gcp-us-west3-0
2024-03-15 17:24:14 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: CHECK_STATUS — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:14 INFO i.a.w.l.p.s.CheckStatusStage(applyStage):61 - No pod found running for workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync
2024-03-15 17:24:14 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: BUILD — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:14 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):297 - Attempt 0 to retrieve the connection
2024-03-15 17:24:14 INFO i.a.a.c.AirbyteApiClient(retryWithJitterThrows):297 - Attempt 0 to retrieve the state
2024-03-15 17:24:15 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: MUTEX — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:15 INFO i.a.w.l.p.s.EnforceMutexStage(applyStage):55 - Mutex key: e4f2c611-28c5-411a-a8e2-f3007c434837 specified for workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync. Attempting to delete existing pods...
2024-03-15 17:24:15 INFO i.a.w.l.p.s.EnforceMutexStage(applyStage):67 - Mutex key: e4f2c611-28c5-411a-a8e2-f3007c434837 specified for workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync found no existing pods. Continuing...
2024-03-15 17:24:15 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: LAUNCH — (workloadId = e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync) — (dataplaneId = prod-dataplane-gcp-us-west3-0)
2024-03-15 17:24:56 INFO i.a.w.l.c.WorkloadApiClient(updateStatusToLaunched):54 - Attempting to update workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync to LAUNCHED.
2024-03-15 17:24:56 INFO i.a.w.l.p.h.SuccessHandler(accept):61 - Pipeline completed for workload: e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync.
2024-03-15 17:25:14 platform > Workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync has returned a terminal status of success.  Fetching output...
2024-03-15 17:25:14 platform > Replication output for workload e4f2c611-28c5-411a-a8e2-f3007c434837_9373761_0_sync : io.airbyte.config.ReplicationOutput@574b4a81[replicationAttemptSummary=io.airbyte.config.ReplicationAttemptSummary@3607ee90[status=completed,recordsSynced=0,bytesSynced=0,startTime=1710523488095,endTime=1710523505694,totalStats=io.airbyte.config.SyncStats@6438fd37[bytesCommitted=2435,bytesEmitted=2435,destinationStateMessagesEmitted=0,destinationWriteEndTime=1710523505587,destinationWriteStartTime=1710523488107,estimatedBytes=<null>,estimatedRecords=<null>,meanSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBetweenStateMessageEmittedandCommitted=0,meanSecondsBetweenStateMessageEmittedandCommitted=0,recordsEmitted=2,recordsCommitted=2,replicationEndTime=1710523505684,replicationStartTime=1710523488095,sourceReadEndTime=1710523499622,sourceReadStartTime=1710523493377,sourceStateMessagesEmitted=0,additionalProperties={}],streamStats=[io.airbyte.config.StreamSyncStats@2dfa2ffe[streamName=application_roles,streamNamespace=<null>,stats=io.airbyte.config.SyncStats@20e87782[bytesCommitted=2435,bytesEmitted=2435,destinationStateMessagesEmitted=<null>,destinationWriteEndTime=<null>,destinationWriteStartTime=<null>,estimatedBytes=<null>,estimatedRecords=<null>,meanSecondsBeforeSourceStateMessageEmitted=<null>,maxSecondsBeforeSourceStateMessageEmitted=<null>,maxSecondsBetweenStateMessageEmittedandCommitted=<null>,meanSecondsBetweenStateMessageEmittedandCommitted=<null>,recordsEmitted=2,recordsCommitted=2,replicationEndTime=<null>,replicationStartTime=<null>,sourceReadEndTime=<null>,sourceReadStartTime=<null>,sourceStateMessagesEmitted=<null>,additionalProperties={}],wasBackfilled=<null>,additionalProperties={}]],performanceMetrics=<null>,additionalProperties={}],state=<null>,outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@29926e61[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@ae0ff21[stream=io.airbyte.protocol.models.AirbyteStream@7699b45c[name=application_roles,jsonSchema={"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"key":{"type":"string","description":"The key of the application role."},"name":{"type":"string","description":"The display name of the application role."},"groups":{"type":"array","items":{"type":"string"},"description":"The groups associated with the application role.","uniqueItems":true},"defined":{"type":"boolean","description":"Deprecated."},"platform":{"type":"boolean","description":"Indicates if the application role belongs to Jira platform (`jira-core`)."},"userCount":{"type":"integer","description":"The number of users counting against your license."},"groupDetails":{"type":["null","array"],"items":{"type":["null","object"]},"description":"Group Details"},"defaultGroups":{"type":"array","items":{"type":"string"},"description":"The groups that are granted default access for this application role.","uniqueItems":true},"numberOfSeats":{"type":"integer","description":"The maximum count of users on your license."},"remainingSeats":{"type":"integer","description":"The count of users remaining on your license."},"hasUnlimitedSeats":{"type":"boolean"},"selectedByDefault":{"type":"boolean","description":"Determines whether this application role should be selected by default on user 
creation."},"defaultGroupsDetails":{"type":["null","array"],"items":{"type":["null","object"],"properties":{"name":{"type":["null","string"]},"self":{"type":["null","string"]},"groupId":{"type":["null","string"]}}}},"userCountDescription":{"type":"string","description":"The [type of users](https://confluence.atlassian.com/x/lRW3Ng) being counted against your license."}},"description":"Details of an application role.","additionalProperties":true},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[[key]],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[],destinationSyncMode=overwrite,primaryKey=[[key]],additionalProperties={}]],additionalProperties={}],failures=[],additionalProperties={}]

AlexisSerneels commented 6 months ago

Same.

abhishekgahlot2 commented 5 months ago

This is happening to us as well; the data is not being copied to the main database.

Harshit-Zenskar commented 5 months ago

I have faced the same issue. Airbyte isn't even showing the sync as failed. Is it a connector bug?

anthonator commented 5 months ago

I believe this is due to their rollout of Destinations V2. They seem to be pushing people to external orchestration systems. So I don't think this is a bug.

Here are some discussions I dug up that seem relevant.

https://github.com/airbytehq/airbyte/discussions/35339 https://github.com/airbytehq/airbyte/discussions/34860

From what I can see, they seem to be focusing on the E and L (extract and load) and pushing people to other platforms for the T (transform).

anthonator commented 5 months ago

Maybe @jbfbell, @rileybrook or @cgardens could shed some light on this?

abhishekgahlot2 commented 5 months ago

@anthonator However, when I tested with the Postgres destination, the tables were created correctly in both the airbyte_internal database and the main database where the sync was supposed to happen. In the case of ClickHouse, only the airbyte_internal tables were filled with data; no tables or data were present in the main database specified in the ClickHouse destination.

anthonator commented 5 months ago

@abhishekgahlot2 from my understanding each destination needs to implement normalization and the ClickHouse destination currently does not.

See https://github.com/airbytehq/airbyte/discussions/35339

jbfbell commented 5 months ago

@anthonator Sorry for the delayed reply here, but yes, as of 1.0.0 we removed what we referred to as "normalization", i.e. the creation of typed tables, from ClickHouse. As you pointed out, this was a result of the DV2 work. Normalization in its previous state was unmaintainable for us as a team, and we are removing that previous implementation from the platform completely. While rolling out DV2 to various destinations, this proved to be a time-consuming process, and we made the decision to pivot toward improving the underlying shared libraries. To put it another way, we would love to enable ourselves or the community to easily add a new V2 destination, but we are not there yet; we are actively working on getting there. Unfortunately, ClickHouse fell on the other side of the cut line here.

Our hope was that by still moving the raw data, rather than removing the ClickHouse connector completely, you could still build dbt models or other solutions on top of these tables.
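
As a rough illustration only, not an official example: a view like the one below flattens the raw JSON into typed columns and is the kind of thing a dbt model could also materialize. The raw table name is taken from the log output above, the fields come from the application_roles schema, and the target database and column aliases are assumptions to adapt to your own streams.

-- Hypothetical sketch: expose typed columns on top of the Airbyte raw table.
-- airbyte_data is assumed to be the database configured in the destination.
CREATE VIEW airbyte_data.application_roles AS
SELECT
    JSONExtractString(_airbyte_data, 'key')      AS role_key,
    JSONExtractString(_airbyte_data, 'name')     AS role_name,
    JSONExtractUInt(_airbyte_data, 'userCount')  AS user_count,
    JSONExtractBool(_airbyte_data, 'platform')   AS platform
FROM airbyte_internal.airbyte_data_raw__stream_application_roles;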

While I understand this is likely not the response you're hoping for, thank you for bringing this up and contributing to that linked GitHub discussion. It definitely helps with the prioritization of this work.

abhishekgahlot2 commented 5 months ago

Are there any tools I can use to convert the raw data into final tables in the meantime, while support for ClickHouse is coming in the future?

Perhaps there is a way to use the models generated by ClickHouse and transform the raw data into the final tables.

anthonator commented 5 months ago

@abhishekgahlot2 they mention Airflow, Prefect and Dagster in https://github.com/airbytehq/airbyte/discussions/34860.

Also see https://airbyte.com/blog/integrating-airbyte-with-data-orchestrators-airflow-dagster-and-prefect

abhishekgahlot2 commented 5 months ago

Thanks @anthonator, going to give it a try.

jesperbagge commented 5 months ago

Are there any tools I can use to convert the raw data into final tables in the meantime, while support for ClickHouse is coming in the future?

@abhishekgahlot2 ClickHouse comes with excellent JSONExtract functions to parse the data in the _airbyte_data column. You can use these functions when you query the data, or use them in dbt transformations.
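
For example, a minimal query-time sketch, assuming the raw table name from the log output above (the field names come from the application_roles schema; adjust them for your own streams):

-- Hypothetical ad-hoc query over the raw table; no extra tables needed.
SELECT
    JSONExtractString(_airbyte_data, 'key')                AS role_key,
    JSONExtract(_airbyte_data, 'groups', 'Array(String)')  AS groups,
    JSONExtractUInt(_airbyte_data, 'numberOfSeats')        AS number_of_seats
FROM airbyte_internal.airbyte_data_raw__stream_application_roles
WHERE JSONExtractBool(_airbyte_data, 'platform');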

@jbfbell Is there some kind of timeline for when we can expect the ClickHouse connector to work as expected again?

abhishekgahlot2 commented 5 months ago

@jesperbagge JSONExtract sounds like a good idea, though I believe it would require copying the whole dataset again, because I don't believe it supports incremental append or deduplication.
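
For what it's worth, one hedged sketch of handling the deduplication part at query time is to group by the stream's primary key (key, per the schema in the logs) and keep only the newest raw record with argMax. This assumes the raw table has an extraction-timestamp column named _airbyte_extracted_at; check the actual column names in your table before relying on it.

-- Hypothetical dedup at query time: keep the latest raw record per primary key.
-- _airbyte_extracted_at is an assumed column name; verify it in your raw table.
SELECT
    JSONExtractString(_airbyte_data, 'key')       AS role_key,
    argMax(_airbyte_data, _airbyte_extracted_at)  AS latest_data
FROM airbyte_internal.airbyte_data_raw__stream_application_roles
GROUP BY role_key;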

MeisterLone commented 3 months ago

@jbfbell Considering ClickHouse is virtually your only supported modern on-prem DB, I am surprised to see this connector isn't getting more attention. ClickHouse has seen broad adoption everywhere we look in the last couple of months.

o1lab commented 2 months ago

Oh, same issue. ClickHouse is such a beast; it's disappointing to learn that normalization is not possible.

Our normalization was straightforward and works so seamlessly with other DBs. JSONExtract, as mentioned, defeats the purpose, as we have quite a lot of tables and various sources too.

cc: Airbyte team @jbfbell

jesperbagge commented 2 months ago

JSONExtract, as mentioned, defeats the purpose, as we have quite a lot of tables and various sources too.

@o1lab Yeah, I came to that conclusion myself in the end for the same reasons. I downgraded to version 0.2.5 to at least have structured data.

Also, I'm a big fan of NocoDB!

MeisterLone commented 2 months ago

They should at least update the AB Cloud documentation for ClickHouse. It is in a broken state as it stands.

MeisterLone commented 2 months ago

I am in no position to contribute currently, but I'll share another insight here for when ClickHouse gets some attention: currently, even the internal tables do not append properly. I have had a test connection between Stripe and ClickHouse set up for several days now, as well as the same connection with the same schema set up between Stripe and Redshift. After syncing every 5 minutes for 5 days, the ClickHouse internal raw tables are plainly missing some updates, whereas the Redshift ones match the Stripe dashboard records perfectly. So just using JSONExtract functions on the internal tables Airbyte generates in ClickHouse is not going to be accurate.