alberttwong closed 1 year ago
Total records read: 0 (0 bytes)
2023-06-06 18:30:32 INFO i.a.w.i.FieldSelector(reportMetrics):122 - Schema validation was performed to a max of 10 records with errors per stream.
...
2023-06-06 18:30:34 INFO i.a.w.g.DefaultReplicationWorker(getReplicationOutput):450 - failures: [ {
"failureOrigin" : "source",
"internalMessage" : "Source process exited with non-zero exit code 137",
"externalMessage" : "Something went wrong within the source connector",
"metadata" : {
"attemptNumber" : 0,
"jobId" : 2,
"connector_command" : "read"
},
"stacktrace" : "io.airbyte.workers.internal.exception.SourceException: Source process exited with non-zero exit code 137\n\tat io.airbyte.workers.general.DefaultReplicationWorker.lambda$readFromSrcAndWriteToDstRunnable$5(DefaultReplicationWorker.java:379)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\n",
"timestamp" : 1686076232935
} ]
Could you check your source config or source URL? It seems the source connector process was killed, most likely because of OOM.
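For context on why exit code 137 points at a killed process: by shell and container convention, an exit code above 128 means the process was terminated by signal number (code - 128), and 137 - 128 = 9, which is SIGKILL on Linux. SIGKILL is what the kernel OOM killer sends, although it can also come from other sources, so this is a strong hint rather than proof. A minimal sketch of that decoding follows; `describe_exit_code` is a hypothetical helper for illustration, not part of Airbyte.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a process exit code using the 128 + N signal convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # The number is not a known signal on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with error status {code}"


print(describe_exit_code(137))
```

If the result is SIGKILL and the container has a memory limit, checking the container's memory usage during the sync is a reasonable next step.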
The source URL seems to be okay. I can access https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet just fine. Also I documented my steps at https://github.com/StarRocks/starrocks/discussions/23713
"metadata" : {
"attemptNumber" : 1,
"jobId" : 5,
"connector_command" : "read"
},
"stacktrace" : "io.airbyte.workers.internal.exception.SourceException: Source process exited with non-zero exit code 137\n\tat io.airbyte.workers.general.DefaultReplicationWorker.lambda$readFromSrcAndWriteToDstRunnable$5(DefaultReplicationWorker.java:379)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\n",
"timestamp" : 1686246768331
} ]
2023-06-08 17:52:49 INFO i.a.c.i.LineGobbler(voidCall):149 -
2023-06-08 17:52:49 INFO i.a.c.i.LineGobbler(voidCall):149 - ----- END REPLICATION -----
2023-06-08 17:52:49 INFO i.a.c.i.LineGobbler(voidCall):149 -
2023-06-08 17:52:49 INFO i.a.w.t.TemporalAttemptExecution(get):163 - Stopping cancellation check scheduling...
2023-06-08 17:52:49 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):159 - sync summary: io.airbyte.config.StandardSyncOutput@2e2b0110[standardSyncSummary=io.airbyte.config.StandardSyncSummary@796f1f60[status=failed,recordsSynced=0,bytesSynced=0,startTime=1686246749880,endTime=1686246769528,totalStats=io.airbyte.config.SyncStats@68efefa0[bytesCommitted=0,bytesEmitted=0,destinationStateMessagesEmitted=0,destinationWriteEndTime=1686246769527,destinationWriteStartTime=1686246749961,estimatedBytes=<null>,estimatedRecords=<null>,meanSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBeforeSourceStateMessageEmitted=0,maxSecondsBetweenStateMessageEmittedandCommitted=0,meanSecondsBetweenStateMessageEmittedandCommitted=0,recordsEmitted=0,recordsCommitted=0,replicationEndTime=1686246769528,replicationStartTime=1686246749880,sourceReadEndTime=1686246768315,sourceReadStartTime=1686246749918,sourceStateMessagesEmitted=0,additionalProperties={}],streamStats=[],additionalProperties={}],normalizationSummary=<null>,webhookOperationSummary=<null>,state=<null>,outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@4663685c[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@30b6d201[stream=io.airbyte.protocol.models.AirbyteStream@31235f68[name=nyc,jsonSchema={"$schema":"http://json-schema.org/draft-07/schema#","type":"object","properties":{"DOLocationID":{"type":["number","null"]},"RatecodeID":{"type":["number","null"]},"fare_amount":{"type":["number","null"]},"congestion_surcharge":{"type":["number","null"]},"tpep_dropoff_datetime":{"format":"date-time","type":["string","null"]},"VendorID":{"type":["number","null"]},"passenger_count":{"type":["number","null"]},"tolls_amount":{"type":["number","null"]},"improvement_surcharge":{"type":["number","null"]},"trip_distance":{"type":["number","null"]},"payment_type":{"type":["number","null"]},"store_and_fwd_flag":{"type":["string","null"]},"total_amount":{"type":["number","null"]},"extra":{"type":["number","null"
]},"tip_amount":{"type":["number","null"]},"mta_tax":{"type":["number","null"]},"airport_fee":{"type":["number","null"]},"PULocationID":{"type":["number","null"]},"tpep_pickup_datetime":{"format":"date-time","type":["string","null"]}}},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[],destinationSyncMode=overwrite,primaryKey=[],additionalProperties={}]],additionalProperties={}],failures=[io.airbyte.config.FailureReason@26262403[failureOrigin=source,failureType=<null>,internalMessage=Source process exited with non-zero exit code 137,externalMessage=Something went wrong within the source connector,metadata=io.airbyte.config.Metadata@7bda9913[additionalProperties={attemptNumber=1, jobId=5, connector_command=read}],stacktrace=io.airbyte.workers.internal.exception.SourceException: Source process exited with non-zero exit code 137
at io.airbyte.workers.general.DefaultReplicationWorker.lambda$readFromSrcAndWriteToDstRunnable$5(DefaultReplicationWorker.java:379)
at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1589)
,retryable=<null>,timestamp=1686246768331,additionalProperties={}]],commitStateAsap=true,additionalProperties={}]
2023-06-08 17:52:49 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):164 - Sync summary length: 3459
Ahh, the parquet file is too big! I tried an 11 MB parquet file and it worked: https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet
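One way to catch this before starting a sync is to ask the server for the file size first. The sketch below sends an HTTP HEAD request and compares the reported `Content-Length` with a limit. The helper names and the `limit_bytes` threshold are assumptions for illustration; the thread does not establish an actual size limit for the connector, and the compressed size on disk is only a rough proxy for the memory needed to read the file.

```python
import urllib.request
from typing import Optional


def remote_size_bytes(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the Content-Length reported for url, or None if the server omits it."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def likely_too_big(size_bytes: Optional[int], limit_bytes: int) -> bool:
    """True when a known size exceeds the limit (limit_bytes is a hypothetical threshold)."""
    return size_bytes is not None and size_bytes > limit_bytes
```

For example, `likely_too_big(remote_size_bytes(url), limit_bytes)` could be checked for each trip-data URL, with `limit_bytes` chosen from the memory actually available to the source container.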
Using StarRocks allin1 docker container with Airbyte.