I synchronized the stream sponsored_product_ad_groups from source-amazon-ads to destination-clickhouse:0.2.5. In the raw table in the destination, the raw data JSON only contains the following 5 keys:
"adGroupId",
"name",
"campaignId",
"defaultBid",
"state"
and the JSON does not contain the following fields (see the raw-table check after this list):
extendedData.creationDateTime
extendedData.lastUpdateDateTime
extendedData.servingStatus
extendedData.servingStatusDetails
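For reference, this is roughly how I inspect the top-level keys of the raw table (a minimal sketch against the ClickHouse HTTP interface; the host, port, and credentials are placeholders, while the schema and table names are taken from the log below):

```python
import requests

# Placeholder connection details -- adjust to your ClickHouse deployment.
CLICKHOUSE_URL = "http://localhost:8123"
CLICKHOUSE_USER = "default"
CLICKHOUSE_PASSWORD = ""

# List the distinct top-level JSON keys stored in the raw _airbyte_data column.
query = """
SELECT DISTINCT arrayJoin(JSONExtractKeys(_airbyte_data)) AS key
FROM airbyte_ads._airbyte_raw_amz_ads_sponsored_product_ad_groups
"""

resp = requests.post(
    CLICKHOUSE_URL,
    data=query,
    auth=(CLICKHOUSE_USER, CLICKHOUSE_PASSWORD),
)
resp.raise_for_status()
print(resp.text)  # prints adGroupId, name, campaignId, defaultBid, state -- no extendedData keys
```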
I can directly obtain these fields through the ads API V3.
Or is there anything wrong in my source config (see the image below)?
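If it helps, this is roughly how I fetch the same ad groups from the Ads API v3 directly (a sketch based on my reading of the v3 docs; the endpoint host, versioned content type, and the includeExtendedDataFields flag are assumptions that may need adjusting for your region and account, and the credentials are placeholders):

```python
import requests

# Placeholder credentials -- fill in from your Amazon Ads developer setup.
ACCESS_TOKEN = "Atza|..."                          # LWA access token
CLIENT_ID = "amzn1.application-oa2-client.xxxx"    # LWA client id
PROFILE_ID = "1234567890"                          # advertising profile id (scope)

resp = requests.post(
    # NA endpoint; EU/FE accounts use the -eu / -fe hosts.
    "https://advertising-api.amazon.com/sp/adGroups/list",
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Amazon-Advertising-API-ClientId": CLIENT_ID,
        "Amazon-Advertising-API-Scope": PROFILE_ID,
        "Content-Type": "application/vnd.spAdGroup.v3+json",
        "Accept": "application/vnd.spAdGroup.v3+json",
    },
    # Per my reading of the v3 docs, extendedData is only returned when this flag is set.
    json={"includeExtendedDataFields": True, "maxResults": 10},
)
resp.raise_for_status()
print(resp.json())  # ad groups including extendedData.creationDateTime, servingStatus, ...
```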
Relevant log output
Logs: s_api_amazon_ads__to__d_db_clickhouse__daily
Attempt 1 of 1 | 3:14PM 01/05/2024 | 13.23 MB | 92,701 records extracted | 92,701 records loaded | Job id: 164 | 5m 24s
2024-01-05 07:14:57 platform > Reading messages from protocol version 0.2.0
2024-01-05 07:14:57 platform > airbyte/destination-clickhouse:0.2.5 was found locally.
2024-01-05 07:14:57 platform > Creating docker container = destination-clickhouse-write-164-0-gpion with resources io.airbyte.config.ResourceRequirements@5a5e85a3[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2024-01-05 07:14:57 platform > Preparing command: docker run --rm --init -i -w /data/164/0 --log-driver none --name destination-clickhouse-write-164-0-gpion --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/destination-clickhouse:0.2.5 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.50.35 -e WORKER_JOB_ID=164 airbyte/destination-clickhouse:0.2.5 write --config destination_config.json --catalog destination_catalog.json
2024-01-05 07:14:57 platform > Writing messages to protocol version 0.2.0
2024-01-05 07:14:57 platform > Reading messages from protocol version 0.2.0
2024-01-05 07:14:57 platform > readFromSource: start
2024-01-05 07:14:57 platform > Starting source heartbeat check. Will check every 1 minutes.
2024-01-05 07:14:57 platform > processMessage: start
2024-01-05 07:14:57 platform > readFromDestination: start
2024-01-05 07:14:57 platform > writeToDestination: start
2024-01-05 07:15:00 source > Starting syncing SourceAmazonAds
2024-01-05 07:15:02 source > Marking stream sponsored_product_ad_groups as STARTED
2024-01-05 07:15:02 source > Syncing stream: sponsored_product_ad_groups
2024-01-05 07:15:02 source > Marking stream sponsored_product_ad_groups as RUNNING
2024-01-05 07:15:05 destination > INFO i.a.i.d.c.ClickhouseDestination(main):113 starting destination: class io.airbyte.integrations.destination.clickhouse.ClickhouseDestination
2024-01-05 07:15:06 destination > INFO i.a.i.b.IntegrationCliParser(parseOptions):126 integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2024-01-05 07:15:06 destination > INFO i.a.i.b.IntegrationRunner(runInternal):106 Running integration: io.airbyte.integrations.base.ssh.SshWrappedDestination
2024-01-05 07:15:06 destination > INFO i.a.i.b.IntegrationRunner(runInternal):107 Command: WRITE
2024-01-05 07:15:06 destination > INFO i.a.i.b.IntegrationRunner(runInternal):108 Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2024-01-05 07:15:07 destination > WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-01-05 07:15:07 destination > WARN c.n.s.JsonMetaSchema(newValidator):278 Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2024-01-05 07:15:08 destination > INFO i.a.i.b.s.SshTunnel(getInstance):204 Starting connection with method: NO_TUNNEL
2024-01-05 07:15:08 destination > INFO c.z.h.HikariDataSource(<init>):80 HikariPool-1 - Starting...
2024-01-05 07:15:08 destination > INFO c.z.h.HikariDataSource(<init>):82 HikariPool-1 - Start completed.
2024-01-05 07:15:09 destination > INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$toWriteConfig$0):103 Write config: WriteConfig{streamName=amz_ads_sponsored_product_ad_groups, namespace=airbyte_ads, outputSchemaName=airbyte_ads, tmpTableName=_airbyte_tmp_boj_amz_ads_sponsored_product_ad_groups, outputTableName=_airbyte_raw_amz_ads_sponsored_product_ad_groups, syncMode=overwrite}
2024-01-05 07:15:09 destination > INFO i.a.i.d.b.BufferedStreamConsumer(startTracked):144 class io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer started.
2024-01-05 07:15:09 destination > INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):142 Preparing raw tables in destination started for 1 streams
2024-01-05 07:15:09 destination > INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):147 Preparing raw table in destination started for stream amz_ads_sponsored_product_ad_groups. schema: airbyte_ads, table name: _airbyte_raw_amz_ads_sponsored_product_ad_groups
2024-01-05 07:15:11 destination > INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):160 Preparing raw tables in destination completed.
2024-01-05 07:15:16 platform > Records read: 5000 (663 KB)
2024-01-05 07:15:30 platform > Records read: 10000 (1 MB)
2024-01-05 07:15:46 platform > Records read: 15000 (2 MB)
2024-01-05 07:16:00 platform > Records read: 20000 (3 MB)
2024-01-05 07:16:14 platform > Records read: 25000 (3 MB)
2024-01-05 07:16:28 platform > Records read: 30000 (4 MB)
2024-01-05 07:16:43 platform > Records read: 35000 (5 MB)
2024-01-05 07:16:58 platform > Records read: 40000 (5 MB)
2024-01-05 07:17:06 destination > INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAllBuffers):85 Flushing amz_ads_sponsored_product_ad_groups: 42700 records (24 MB)
2024-01-05 07:17:06 destination > INFO i.a.i.d.c.ClickhouseSqlOperations(insertRecordsInternal):67 actual size of batch: 42700
2024-01-05 07:17:11 destination > INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAllBuffers):91 Flushing completed for amz_ads_sponsored_product_ad_groups
2024-01-05 07:17:13 platform > Records read: 45000 (6 MB)
2024-01-05 07:17:30 platform > Records read: 50000 (7 MB)
2024-01-05 07:17:46 platform > Records read: 55000 (7 MB)
2024-01-05 07:18:02 platform > Records read: 60000 (8 MB)
2024-01-05 07:18:18 platform > Records read: 65000 (9 MB)
2024-01-05 07:18:35 platform > Records read: 70000 (10 MB)
2024-01-05 07:18:53 platform > Records read: 75000 (10 MB)
2024-01-05 07:19:10 platform > Records read: 80000 (11 MB)
2024-01-05 07:19:30 platform > Records read: 85000 (12 MB)
2024-01-05 07:19:40 destination > INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAllBuffers):85 Flushing amz_ads_sponsored_product_ad_groups: 44912 records (24 MB)
2024-01-05 07:19:40 destination > INFO i.a.i.d.c.ClickhouseSqlOperations(insertRecordsInternal):67 actual size of batch: 44912
2024-01-05 07:19:42 destination > INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAllBuffers):91 Flushing completed for amz_ads_sponsored_product_ad_groups
2024-01-05 07:19:49 platform > Records read: 90000 (12 MB)
2024-01-05 07:20:00 source > Read 92701 records from sponsored_product_ad_groups stream
2024-01-05 07:20:00 source > Marking stream sponsored_product_ad_groups as STOPPED
2024-01-05 07:20:00 source > Finished syncing sponsored_product_ad_groups
2024-01-05 07:20:00 source > SourceAmazonAds runtimes:
Syncing stream sponsored_product_ad_groups 0:04:58.244143
2024-01-05 07:20:00 source > Finished syncing SourceAmazonAds
2024-01-05 07:20:01 platform > Total records read: 92704 (13 MB)
2024-01-05 07:20:01 platform > Schema validation was performed to a max of 10 records with errors per stream.
2024-01-05 07:20:01 platform > readFromSource: done. (source.isFinished:true, fromSource.isClosed:false)
2024-01-05 07:20:01 platform > thread status... heartbeat thread: false , replication thread: true
2024-01-05 07:20:01 platform > processMessage: done. (fromSource.isDone:true, forDest.isClosed:false)
2024-01-05 07:20:01 platform > writeToDestination: done. (forDest.isDone:true, isDestRunning:true)
Connector Name
source-amazon-ads
Connector Version
3.4.2
What step the error happened?
During the sync