airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-GCS] Full Refresh - Overwrite sync resulted in empty destination #44497

Open · m-ronchi opened this issue 1 month ago

m-ronchi commented 1 month ago

Connector Name

source-gcs

Connector Version

0.4.15

What step the error happened?

None

Relevant information

I set up a GCS -> AWS Datalake connection with Full Refresh | Overwrite on all streams.

The first sync (and each sync after a reset) replicates the data correctly, but subsequent syncs only read new data from the source, as if the streams were configured as Incremental, and therefore wipe all previously synced data from the destination.
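
To illustrate the suspected mechanism, here is a minimal sketch (hypothetical names, not the connector's actual code) of how a file-based cursor that retains a `history` of already-synced files skips everything it has seen before. The state message in the log below shows exactly such a `history` dict being restored, so if a Full Refresh | Overwrite sync is fed the previous sync's state, only new files get read and the overwrite then discards everything else:

```python
# Sketch only: the real connector uses the Airbyte file-based CDK cursor
# (e.g. DefaultFileBasedCursor); names below are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class RemoteFile:
    uri: str
    last_modified: datetime


@dataclass
class FileHistoryCursor:
    # Maps file URI -> last_modified from a previous sync, mirroring the
    # 'history' dict visible in the state message in the job log.
    history: Dict[str, datetime] = field(default_factory=dict)

    def should_sync(self, f: RemoteFile) -> bool:
        seen = self.history.get(f.uri)
        # Only files that are new, or modified since the recorded timestamp, pass.
        return seen is None or f.last_modified > seen


files = [
    RemoteFile("gs://bucket/reviews_2024-07-30.csv", datetime(2024, 7, 30)),
    RemoteFile("gs://bucket/reviews_2024-08-20.csv", datetime(2024, 8, 20)),
]

# A full refresh should start from an empty cursor and emit every file:
assert all(FileHistoryCursor().should_sync(f) for f in files)

# But if the prior state is (incorrectly) carried over, only the new file is
# read, and Overwrite mode then replaces the destination with just that slice:
stale = FileHistoryCursor(
    history={"gs://bucket/reviews_2024-07-30.csv": datetime(2024, 7, 30)}
)
print([f.uri for f in files if stale.should_sync(f)])  # only the 2024-08-20 file
```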


Relevant log output

2024-08-21 00:01:30 platform > Cloud storage job log path: /workspace/148941/0/logs.log
2024-08-21 00:01:42 INFO i.m.r.Micronaut(lambda$start$2):98 - Startup completed in 1206ms. Server Running: http://orchestrator-repl-job-148941-attempt-0:9000
2024-08-21 00:01:43 replication-orchestrator > Writing async status INITIALIZING for KubePodInfo[namespace=airbyte, name=orchestrator-repl-job-148941-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:0.50.50, pullPolicy=IfNotPresent]]...
2024-08-21 00:01:42 INFO i.a.f.ConfigFileClient(<init>):105 - path /flags does not exist, will return default flag values
2024-08-21 00:01:43 INFO i.a.c.EnvConfigs(getEnvOrDefault):694 - Using default value for environment variable STATE_STORAGE_S3_ACCESS_KEY: ''
2024-08-21 00:01:43 INFO i.a.c.EnvConfigs(getEnvOrDefault):694 - Using default value for environment variable STATE_STORAGE_S3_SECRET_ACCESS_KEY: ''
2024-08-21 00:01:43 INFO i.a.m.l.MetricClientFactory(initializeOpenTelemetryMetricClient):133 - Initializing OpenTelemetryMetricClient
2024-08-21 00:02:22 INFO i.a.a.SegmentAnalyticsClient(close):226 - Closing Segment analytics client...
2024-08-21 00:02:22 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):281 - Waiting for Segment analytic client to flush enqueued messages...
2024-08-21 00:02:22 INFO i.a.a.BlockingShutdownAnalyticsPlugin(waitForFlush):293 - Segment analytic client flush complete.
2024-08-21 00:02:22 INFO i.a.a.SegmentAnalyticsClient(close):230 - Segment analytics client closed.  No new events will be accepted.
2024-08-21 00:01:43 replication-orchestrator > sourceLauncherConfig is: io.airbyte.persistence.job.models.IntegrationLauncherConfig@1e9d7366[jobId=148941,attemptId=0,connectionId=ee323115-263c-4210-be0a-b6f083cfa30e,workspaceId=9cbed2b3-bcdd-4542-8bae-95ecef8edc7b,dockerImage=airbyte/source-gcs:0.4.15,normalizationDockerImage=<null>,supportsDbt=false,normalizationIntegrationType=<null>,protocolVersion=Version{version='0.2.0', major='0', minor='2', patch='0'},isCustomConnector=false,allowedHosts=<null>,additionalEnvironmentVariables=<null>,additionalLabels=<null>,additionalProperties={}]
2024-08-21 00:01:43 replication-orchestrator > Attempt 0 to get the source definition for feature flag checks
2024-08-21 00:01:44 replication-orchestrator > Attempt 0 to get the source definition
2024-08-21 00:01:44 replication-orchestrator > Concurrent stream read enabled? false
2024-08-21 00:01:44 replication-orchestrator > Setting up source...
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_SECRET_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_MEMORY_LIMIT: '50Mi'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_MEMORY_REQUEST: '25Mi'
2024-08-21 00:01:44 replication-orchestrator > Setting up destination...
2024-08-21 00:01:44 replication-orchestrator > Setting up replication worker...
2024-08-21 00:01:44 replication-orchestrator > Running replication worker...
2024-08-21 00:01:44 replication-orchestrator > start sync worker. job id: 148941 attempt id: 0
2024-08-21 00:01:44 replication-orchestrator > 
2024-08-21 00:01:44 replication-orchestrator > ----- START REPLICATION -----
2024-08-21 00:01:44 replication-orchestrator > 
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_SECRET_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable LAUNCHDARKLY_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable FEATURE_FLAG_CLIENT: ''
2024-08-21 00:01:44 replication-orchestrator > Running destination...
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_SECRET_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable LAUNCHDARKLY_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable FEATURE_FLAG_CLIENT: ''
2024-08-21 00:01:44 replication-orchestrator > Attempting to start pod = source-gcs-read-148941-0-alrjt for airbyte/source-gcs:0.4.15 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@56c5014[cpuRequest=2000m,cpuLimit=5000m,memoryRequest=512Mi,memoryLimit=1Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@42782ee3[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@1869ecde[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=null, stdOut=io.airbyte.config.ResourceRequirements@7cd2c0e[cpuRequest=0.5,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts null
2024-08-21 00:01:44 replication-orchestrator > Attempting to start pod = destination-aws-datalake-write-148941-0-ykqgs for airbyte/destination-aws-datalake:0.1.23 with resources ConnectorResourceRequirements[main=io.airbyte.config.ResourceRequirements@2fbbf1cb[cpuRequest=2000m,cpuLimit=5000m,memoryRequest=512Mi,memoryLimit=1Gi,additionalProperties={}], heartbeat=io.airbyte.config.ResourceRequirements@42782ee3[cpuRequest=0.05,cpuLimit=0.2,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdErr=io.airbyte.config.ResourceRequirements@76c93626[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdIn=io.airbyte.config.ResourceRequirements@2c1071b4[cpuRequest=0.5,cpuLimit=1,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}], stdOut=io.airbyte.config.ResourceRequirements@42ca35a2[cpuRequest=0.01,cpuLimit=0.5,memoryRequest=25Mi,memoryLimit=50Mi,additionalProperties={}]] and allowedHosts null
2024-08-21 00:01:44 replication-orchestrator > destination-aws-datalake-write-148941-0-ykqgs stdoutLocalPort = 9879
2024-08-21 00:01:44 replication-orchestrator > destination-aws-datalake-write-148941-0-ykqgs stderrLocalPort = 9880
2024-08-21 00:01:44 replication-orchestrator > source-gcs-read-148941-0-alrjt stdoutLocalPort = 9877
2024-08-21 00:01:44 replication-orchestrator > source-gcs-read-148941-0-alrjt stderrLocalPort = 9878
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable STATE_STORAGE_S3_SECRET_ACCESS_KEY: ''
2024-08-21 00:01:44 replication-orchestrator > Using default value for environment variable SYNC_JOB_INIT_RETRY_TIMEOUT_MINUTES: '5'
2024-08-21 00:01:44 replication-orchestrator > Creating stdout socket server...
2024-08-21 00:01:44 replication-orchestrator > Creating stderr socket server...
2024-08-21 00:01:44 replication-orchestrator > Creating stdout socket server...
2024-08-21 00:01:44 replication-orchestrator > Creating stderr socket server...
2024-08-21 00:01:44 replication-orchestrator > Creating pod source-gcs-read-148941-0-alrjt...
2024-08-21 00:01:44 replication-orchestrator > Creating pod destination-aws-datalake-write-148941-0-ykqgs...
2024-08-21 00:01:44 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-08-21 00:01:44 replication-orchestrator > Waiting for init container to be ready before copying files...
2024-08-21 00:01:51 replication-orchestrator > Init container ready..
2024-08-21 00:01:51 replication-orchestrator > Copying files...
2024-08-21 00:01:51 replication-orchestrator > Uploading file: destination_config.json
2024-08-21 00:01:51 replication-orchestrator > kubectl cp /tmp/23259feb-cc9d-4f1a-924e-d63b0c1a09ab/destination_config.json airbyte/destination-aws-datalake-write-148941-0-ykqgs:/config/destination_config.json -c init --retries=3
2024-08-21 00:01:51 replication-orchestrator > Waiting for kubectl cp to complete
2024-08-21 00:01:51 replication-orchestrator > kubectl cp complete, closing process
2024-08-21 00:01:51 replication-orchestrator > Uploading file: destination_catalog.json
2024-08-21 00:01:51 replication-orchestrator > kubectl cp /tmp/c9617797-60b0-4c23-9954-aca93c58e408/destination_catalog.json airbyte/destination-aws-datalake-write-148941-0-ykqgs:/config/destination_catalog.json -c init --retries=3
2024-08-21 00:01:51 replication-orchestrator > Waiting for kubectl cp to complete
2024-08-21 00:01:51 replication-orchestrator > kubectl cp complete, closing process
2024-08-21 00:01:51 replication-orchestrator > Uploading file: FINISHED_UPLOADING
2024-08-21 00:01:51 replication-orchestrator > kubectl cp /tmp/4533c0e2-2333-4d37-8eb9-45009b01df5b/FINISHED_UPLOADING airbyte/destination-aws-datalake-write-148941-0-ykqgs:/config/FINISHED_UPLOADING -c init --retries=3
2024-08-21 00:01:51 replication-orchestrator > Waiting for kubectl cp to complete
2024-08-21 00:01:51 replication-orchestrator > kubectl cp complete, closing process
2024-08-21 00:01:51 replication-orchestrator > Waiting until pod is ready...
2024-08-21 00:01:55 replication-orchestrator > Init container ready..
2024-08-21 00:01:55 replication-orchestrator > Copying files...
2024-08-21 00:01:55 replication-orchestrator > Uploading file: input_state.json
2024-08-21 00:01:55 replication-orchestrator > kubectl cp /tmp/450ad0f1-2235-4440-b4c1-856abdbf4a6d/input_state.json airbyte/source-gcs-read-148941-0-alrjt:/config/input_state.json -c init --retries=3
2024-08-21 00:01:55 replication-orchestrator > Waiting for kubectl cp to complete
2024-08-21 00:01:55 replication-orchestrator > kubectl cp complete, closing process
2024-08-21 00:01:55 replication-orchestrator > Uploading file: source_config.json
2024-08-21 00:01:55 replication-orchestrator > kubectl cp /tmp/98264415-b74e-45ac-b4f3-22e02d218fed/source_config.json airbyte/source-gcs-read-148941-0-alrjt:/config/source_config.json -c init --retries=3
2024-08-21 00:01:55 replication-orchestrator > Waiting for kubectl cp to complete
2024-08-21 00:01:55 replication-orchestrator > kubectl cp complete, closing process
2024-08-21 00:01:55 replication-orchestrator > Uploading file: source_catalog.json
2024-08-21 00:01:55 replication-orchestrator > kubectl cp /tmp/2155ca37-a8b2-46a2-a648-dafc3b33b434/source_catalog.json airbyte/source-gcs-read-148941-0-alrjt:/config/source_catalog.json -c init --retries=3
2024-08-21 00:01:55 replication-orchestrator > Waiting for kubectl cp to complete
2024-08-21 00:01:56 replication-orchestrator > kubectl cp complete, closing process
2024-08-21 00:01:56 replication-orchestrator > Uploading file: FINISHED_UPLOADING
2024-08-21 00:01:56 replication-orchestrator > kubectl cp /tmp/1791dd06-2d45-4ff1-95b6-7836323fd3f3/FINISHED_UPLOADING airbyte/source-gcs-read-148941-0-alrjt:/config/FINISHED_UPLOADING -c init --retries=3
2024-08-21 00:01:56 replication-orchestrator > Waiting for kubectl cp to complete
2024-08-21 00:01:56 replication-orchestrator > kubectl cp complete, closing process
2024-08-21 00:01:56 replication-orchestrator > Waiting until pod is ready...
2024-08-21 00:02:07 replication-orchestrator > Setting stdout...
2024-08-21 00:02:07 replication-orchestrator > Setting stderr...
2024-08-21 00:02:08 replication-orchestrator > Reading pod IP...
2024-08-21 00:02:08 replication-orchestrator > Pod IP: 172.31.17.190
2024-08-21 00:02:08 replication-orchestrator > Creating stdin socket...
2024-08-21 00:02:08 replication-orchestrator > Writing messages to protocol version 0.2.0
2024-08-21 00:02:08 replication-orchestrator > Reading messages from protocol version 0.2.0
2024-08-21 00:02:10 replication-orchestrator > Setting stdout...
2024-08-21 00:02:10 replication-orchestrator > Setting stderr...
2024-08-21 00:02:11 replication-orchestrator > Reading pod IP...
2024-08-21 00:02:11 replication-orchestrator > Pod IP: 172.31.31.61
2024-08-21 00:02:11 replication-orchestrator > Using null stdin output stream...
2024-08-21 00:02:11 replication-orchestrator > Reading messages from protocol version 0.2.0
2024-08-21 00:02:11 replication-orchestrator > Writing async status RUNNING for KubePodInfo[namespace=airbyte, name=orchestrator-repl-job-148941-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:0.50.50, pullPolicy=IfNotPresent]]...
2024-08-21 00:02:11 replication-orchestrator > readFromSource: start
2024-08-21 00:02:11 replication-orchestrator > Starting source heartbeat check. Will check every 1 minutes.
2024-08-21 00:02:11 replication-orchestrator > processMessage: start
2024-08-21 00:02:11 replication-orchestrator > readFromDestination: start
2024-08-21 00:02:11 replication-orchestrator > writeToDestination: start
2024-08-21 00:02:11 destination > Begin writing to the destination...
2024-08-21 00:02:11 destination > Creating StreamWriter for google_play_store_raw:reviews
2024-08-21 00:02:11 destination > Creating StreamWriter for google_play_store_raw:ratings
2024-08-21 00:02:13 source > Starting syncing SourceGCS
2024-08-21 00:02:13 source > Marking stream reviews as STARTED
2024-08-21 00:02:13 source > Setting state of SourceGCS stream to {'history': {'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:07:35.266000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:06:17.239000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:05:42.248000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:07:22.749000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:06:23.242000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:06:41.237000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:06:26.316000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:07:00.245000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:05:42.805000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:06:36.246000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-07-30T00:07:29.751000Z', ..., 'https://storage.googleapis.com/<REDACTED>.csv': '2024-08-20T00:09:40.001000Z'}, '_ab_source_file_last_modified': '2024-08-20T00:10:15.110000Z_https://storage.googleapis.com/<REDACTED>.csv'}
2024-08-21 00:02:13 source > Syncing stream: reviews 
2024-08-21 00:02:13 replication-orchestrator > Attempt 0 to stream status started null:reviews
2024-08-21 00:02:15 source > Read 0 records from reviews stream
2024-08-21 00:02:15 source > Marking stream reviews as STOPPED
2024-08-21 00:02:15 source > Finished syncing reviews
2024-08-21 00:02:15 source > SourceGCS runtimes:
Syncing stream reviews 0:00:02.068286
2024-08-21 00:02:15 replication-orchestrator > Source state message checksum is valid for stream _reviews.
2024-08-21 00:02:15 source > Marking stream ratings as STARTED
2024-08-21 00:02:15 replication-orchestrator > Attempt 0 to stream status started null:ratings
2024-08-21 00:02:15 source > Setting state of SourceGCS stream to {'history': {'https://storage.googleapis.com/<REDACTED>.csv': '2015-06-03T10:08:54.129000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2015-06-02T12:04:50.346000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2015-06-02T01:05:52.961000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2015-06-01T09:10:10.879000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2015-05-29T17:03:12.792000Z', ..., 'https://storage.googleapis.com/<REDACTED>.csv': '2024-08-18T07:24:09.283000Z', 'https://storage.googleapis.com/<REDACTED>.csv': '2024-08-19T19:23:46.305000Z'}, '_ab_source_file_last_modified': '2024-08-19T19:24:06.219000Z_https://storage.googleapis.com/<REDACTED>.csv'}
2024-08-21 00:02:15 source > Syncing stream: ratings 
2024-08-21 00:02:17 source > Marking stream ratings as RUNNING
2024-08-21 00:02:17 replication-orchestrator > Attempt 0 to update stream status running null:ratings
2024-08-21 00:02:17 destination > Got state message from source: flushing records for reviews
2024-08-21 00:02:17 destination > No messages to write to google_play_store_raw:reviews
2024-08-21 00:02:17 replication-orchestrator > Source state message checksum is valid for stream _ratings.
2024-08-21 00:02:17 replication-orchestrator > Could not find the state message with hash 1446639533 in the stagedStatsList
2024-08-21 00:02:17 replication-orchestrator > Unexpected state from destination for stream null:reviews, 1446639533 not found in the stored stateHashes
2024-08-21 00:02:17 replication-orchestrator > starting state flush thread for connectionId ee323115-263c-4210-be0a-b6f083cfa30e
2024-08-21 00:02:17 destination > Got state message from source: flushing records for ratings
2024-08-21 00:02:17 replication-orchestrator > Source state message checksum is valid for stream _ratings.
2024-08-21 00:02:18 replication-orchestrator > Source state message checksum is valid for stream _ratings.
2024-08-21 00:02:19 source > Read 5109 records from ratings stream
2024-08-21 00:02:19 source > Marking stream ratings as STOPPED
2024-08-21 00:02:19 source > Finished syncing ratings
2024-08-21 00:02:19 source > SourceGCS runtimes:
Syncing stream ratings 0:00:03.951741
Syncing stream reviews 0:00:02.068286
2024-08-21 00:02:19 source > Finished syncing SourceGCS
2024-08-21 00:02:19 replication-orchestrator > Could not find the state message with hash -186950687 in the stagedStatsList
2024-08-21 00:02:19 replication-orchestrator > Unexpected state from destination for stream null:ratings, -186950687 not found in the stored stateHashes
2024-08-21 00:02:19 destination > Got state message from source: flushing records for ratings
2024-08-21 00:02:19 replication-orchestrator > Records read: 5000 (5 MB)
2024-08-21 00:02:19 replication-orchestrator > Source state message checksum is valid for stream _ratings.
2024-08-21 00:02:19 replication-orchestrator > Attempt 0 to update stream status complete null:ratings
2024-08-21 00:02:19 replication-orchestrator > SOURCE analytics [airbyte/source-gcs:0.4.15] | Type: file-cdk-csv-stream-count | Value: 2
2024-08-21 00:02:20 replication-orchestrator > Could not find the state message with hash 1418956406 in the stagedStatsList
2024-08-21 00:02:20 replication-orchestrator > Unexpected state from destination for stream null:ratings, 1418956406 not found in the stored stateHashes
2024-08-21 00:02:20 destination > Got state message from source: flushing records for ratings
2024-08-21 00:02:20 replication-orchestrator > (pod: airbyte / source-gcs-read-148941-0-alrjt) - Closed all resources for pod
2024-08-21 00:02:20 replication-orchestrator > Total records read: 5120 (5 MB)
2024-08-21 00:02:20 replication-orchestrator > Schema validation was performed to a max of 10 records with errors per stream.
2024-08-21 00:02:20 replication-orchestrator > readFromSource: done. (source.isFinished:true, fromSource.isClosed:false)
2024-08-21 00:02:20 replication-orchestrator > thread status... heartbeat thread: false , replication thread: true
2024-08-21 00:02:20 replication-orchestrator > processMessage: done. (fromSource.isDone:true, forDest.isClosed:false)
2024-08-21 00:02:20 replication-orchestrator > writeToDestination: done. (forDest.isDone:true, isDestRunning:true)
2024-08-21 00:02:20 replication-orchestrator > thread status... timeout thread: false , replication thread: true
2024-08-21 00:02:20 replication-orchestrator > Could not find the state message with hash -1718877320 in the stagedStatsList
2024-08-21 00:02:20 replication-orchestrator > Unexpected state from destination for stream null:ratings, -1718877320 not found in the stored stateHashes
2024-08-21 00:02:20 destination > Got state message from source: flushing records for ratings
2024-08-21 00:02:21 replication-orchestrator > Could not find the state message with hash 544505776 in the stagedStatsList
2024-08-21 00:02:21 replication-orchestrator > Unexpected state from destination for stream null:ratings, 544505776 not found in the stored stateHashes
2024-08-21 00:02:21 destination > No messages to write to google_play_store_raw:reviews
2024-08-21 00:02:21 destination > No messages to write to google_play_store_raw:ratings
2024-08-21 00:02:21 destination > Writing complete.
2024-08-21 00:02:22 replication-orchestrator > (pod: airbyte / destination-aws-datalake-write-148941-0-ykqgs) - Closed all resources for pod
2024-08-21 00:02:22 replication-orchestrator > Attempt 0 to update stream status complete null:ratings
2024-08-21 00:02:22 replication-orchestrator > readFromDestination: done. (writeToDestFailed:false, dest.isFinished:true)
2024-08-21 00:02:22 replication-orchestrator > thread status... timeout thread: false , replication thread: true
2024-08-21 00:02:22 replication-orchestrator > Attempt 0 to Flush States from SyncPersistenceImpl
2024-08-21 00:02:22 replication-orchestrator > Attempt 0 to Flush Stats from SyncPersistenceImpl
2024-08-21 00:02:22 replication-orchestrator > Attempt 0 to update stream status complete null:reviews
2024-08-21 00:02:22 replication-orchestrator > sync summary: {
  "status" : "completed",
  "recordsSynced" : 0,
  "bytesSynced" : 0,
  "startTime" : 1724198504259,
  "endTime" : 1724198542378,
  "totalStats" : {
    "bytesCommitted" : 5739490,
    "bytesEmitted" : 5739490,
    "destinationStateMessagesEmitted" : 5,
    "destinationWriteEndTime" : 1724198542133,
    "destinationWriteStartTime" : 1724198504275,
    "meanSecondsBeforeSourceStateMessageEmitted" : 1,
    "maxSecondsBeforeSourceStateMessageEmitted" : 4,
    "maxSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "meanSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "recordsEmitted" : 5109,
    "recordsCommitted" : 5109,
    "replicationEndTime" : 1724198542337,
    "replicationStartTime" : 1724198504259,
    "sourceReadEndTime" : 1724198540457,
    "sourceReadStartTime" : 1724198504277,
    "sourceStateMessagesEmitted" : 5
  },
  "streamStats" : [ {
    "streamName" : "ratings",
    "stats" : {
      "bytesCommitted" : 5739490,
      "bytesEmitted" : 5739490,
      "recordsEmitted" : 5109,
      "recordsCommitted" : 5109
    }
  }, {
    "streamName" : "reviews",
    "stats" : {
      "bytesCommitted" : 0,
      "bytesEmitted" : 0,
      "recordsEmitted" : 0,
      "recordsCommitted" : 0
    }
  } ],
  "performanceMetrics" : {
    "processFromSource" : {
      "elapsedTimeInNanos" : 414331646,
      "executionCount" : 5120,
      "avgExecTimeInNanos" : 80924.149609375
    },
    "readFromSource" : {
      "elapsedTimeInNanos" : 8177765410,
      "executionCount" : 201383,
      "avgExecTimeInNanos" : 40608.02257390147
    },
    "processFromDest" : {
      "elapsedTimeInNanos" : 93394805,
      "executionCount" : 5,
      "avgExecTimeInNanos" : 1.8678961E7
    },
    "writeToDest" : {
      "elapsedTimeInNanos" : 1769427666,
      "executionCount" : 5114,
      "avgExecTimeInNanos" : 345996.8060226828
    },
    "readFromDest" : {
      "elapsedTimeInNanos" : 10492884390,
      "executionCount" : 842797,
      "avgExecTimeInNanos" : 12450.073256074713
    }
  }
}
2024-08-21 00:02:22 replication-orchestrator > failures: [ ]
2024-08-21 00:02:22 replication-orchestrator > Returning output...
2024-08-21 00:02:22 replication-orchestrator > 
2024-08-21 00:02:22 replication-orchestrator > ----- END REPLICATION -----
2024-08-21 00:02:22 replication-orchestrator > 
2024-08-21 00:02:22 replication-orchestrator > Writing async status SUCCEEDED for KubePodInfo[namespace=airbyte, name=orchestrator-repl-job-148941-attempt-0, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:0.50.50, pullPolicy=IfNotPresent]]...
2024-08-21 00:01:30 platform > Executing worker wrapper. Airbyte version: 0.50.50
2024-08-21 00:01:30 platform > Attempt 0 to save workflow id for cancellation
2024-08-21 00:01:30 platform > Creating orchestrator-repl-job-148941-attempt-0 for attempt number: 0
2024-08-21 00:01:30 platform > Successfully deleted all running pods for the connection!
2024-08-21 00:01:30 platform > Waiting for pod to be running...
2024-08-21 00:01:37 platform > Pod airbyte/orchestrator-repl-job-148941-attempt-0 is running on 172.31.18.15
2024-08-21 00:01:37 platform > Uploading file: envMap.json
2024-08-21 00:01:37 platform > kubectl cp /tmp/f39b69c6-994c-4762-8248-4d10260caf74/envMap.json airbyte/orchestrator-repl-job-148941-attempt-0:/config/envMap.json -c init --retries=3
2024-08-21 00:01:37 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:37 platform > kubectl cp complete, closing process
2024-08-21 00:01:37 platform > Uploading file: application.txt
2024-08-21 00:01:37 platform > kubectl cp /tmp/792754e1-1bf3-477f-97a2-34fc863f2732/application.txt airbyte/orchestrator-repl-job-148941-attempt-0:/config/application.txt -c init --retries=3
2024-08-21 00:01:37 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:37 platform > kubectl cp complete, closing process
2024-08-21 00:01:37 platform > Uploading file: jobRunConfig.json
2024-08-21 00:01:37 platform > kubectl cp /tmp/055c3ac2-69bc-444c-a2d7-9f4552c46407/jobRunConfig.json airbyte/orchestrator-repl-job-148941-attempt-0:/config/jobRunConfig.json -c init --retries=3
2024-08-21 00:01:37 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:38 platform > kubectl cp complete, closing process
2024-08-21 00:01:38 platform > Uploading file: destinationLauncherConfig.json
2024-08-21 00:01:38 platform > kubectl cp /tmp/a7391eb7-04e4-4605-9044-69f679ae4591/destinationLauncherConfig.json airbyte/orchestrator-repl-job-148941-attempt-0:/config/destinationLauncherConfig.json -c init --retries=3
2024-08-21 00:01:38 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:38 platform > kubectl cp complete, closing process
2024-08-21 00:01:38 platform > Uploading file: sourceLauncherConfig.json
2024-08-21 00:01:38 platform > kubectl cp /tmp/e7a7bf0c-514a-46d7-93b7-5c9f0dfdc2eb/sourceLauncherConfig.json airbyte/orchestrator-repl-job-148941-attempt-0:/config/sourceLauncherConfig.json -c init --retries=3
2024-08-21 00:01:38 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:38 platform > kubectl cp complete, closing process
2024-08-21 00:01:38 platform > Uploading file: input.json
2024-08-21 00:01:38 platform > kubectl cp /tmp/f88aa59e-5f6c-48a3-bba1-fb99d2e5edb3/input.json airbyte/orchestrator-repl-job-148941-attempt-0:/config/input.json -c init --retries=3
2024-08-21 00:01:38 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:38 platform > kubectl cp complete, closing process
2024-08-21 00:01:38 platform > Uploading file: KUBE_POD_INFO
2024-08-21 00:01:38 platform > kubectl cp /tmp/c695a736-7251-4742-a197-a917efbb1025/KUBE_POD_INFO airbyte/orchestrator-repl-job-148941-attempt-0:/config/KUBE_POD_INFO -c init --retries=3
2024-08-21 00:01:38 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:38 platform > kubectl cp complete, closing process
2024-08-21 00:01:38 platform > Uploading file: FINISHED_UPLOADING
2024-08-21 00:01:38 platform > kubectl cp /tmp/570a2114-a642-4668-bbeb-52135299b8c3/FINISHED_UPLOADING airbyte/orchestrator-repl-job-148941-attempt-0:/config/FINISHED_UPLOADING -c init --retries=3
2024-08-21 00:01:38 platform > Waiting for kubectl cp to complete
2024-08-21 00:01:39 platform > kubectl cp complete, closing process
2024-08-21 00:02:24 platform > State Store reports orchestrator pod orchestrator-repl-job-148941-attempt-0 succeeded


m-ronchi commented 1 month ago

Downgrading to v0.3.7 made full refresh work again. I suspect it's related to https://github.com/airbytehq/airbyte/pull/35622
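
For anyone triaging: one way to check whether state is actually being persisted for these full-refresh streams is to dump the connection's stored state. A minimal sketch, assuming the internal Configuration API's /api/v1/state/get endpoint is reachable; the host and credentials below are placeholders for your deployment:

```python
import requests

# Assumption: airbyte-server's internal Configuration API, adjust for your install.
AIRBYTE_API = "http://localhost:8001/api/v1"
CONNECTION_ID = "ee323115-263c-4210-be0a-b6f083cfa30e"  # from the job log above

resp = requests.post(
    f"{AIRBYTE_API}/state/get",
    json={"connectionId": CONNECTION_ID},
    auth=("airbyte", "password"),  # placeholder credentials, if auth is enabled
)
resp.raise_for_status()
# For a Full Refresh | Overwrite stream one would expect no persisted 'history';
# if the response shows one, the source will skip previously seen files.
print(resp.json())
```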

marcosmarxm commented 1 month ago

Thanks @m-ronchi! Are you also running the latest version of the platform?

m-ronchi commented 4 weeks ago

Hi, no, the platform is on 0.50.50.

We are in the process of upgrading the Helm chart, but it is more work than expected (e.g., changes to the values.yaml entries).