airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-shopify] Exception raised on product_images stream #40700

Closed SebastienCY closed 3 months ago

SebastienCY commented 3 months ago

Connector Name

source-shopify

Connector Version

2.4.7

At what step did the error happen?

During the sync

Relevant information

We are seeing new, repeated failures when syncing the product_images stream. This could be a consequence of the fix for the linked issue, which was released with v2.4.7.

Relevant log output

2024-07-02 21:31:25 source > Marking stream product_images as STARTED
2024-07-02 21:31:25 replication-orchestrator > Stream status TRACE received of status: STARTED for stream product_images
2024-07-02 21:31:25 source > Syncing stream: product_images 
2024-07-02 21:31:25 replication-orchestrator > Sending update for product_images - null -> RUNNING
2024-07-02 21:31:25 source > Stream: `product_images` requesting BULK Job for period: 2024-07-02T01:02:56+00:00 -- 2024-07-02T03:26:56+00:00. Slice size: `P0.0D`
2024-07-02 21:31:25 replication-orchestrator > Stream Status Update Received: product_images - RUNNING
2024-07-02 21:31:25 replication-orchestrator > Creating status: product_images - RUNNING
2024-07-02 21:31:25 source > Stream: `product_images`, the BULK Job: `gid://shopify/BulkOperation/4424728609089` is CREATED
2024-07-02 21:31:25 source > API Load: `REGULAR`
2024-07-02 21:31:30 source > Stream: `product_images`, the BULK Job: `gid://shopify/BulkOperation/4424728609089` is COMPLETED
2024-07-02 21:31:30 source > Stream: `product_images`, the BULK Job: `gid://shopify/BulkOperation/4424728609089` time elapsed: 4.1 sec.
2024-07-02 21:31:30 source > Marking stream product_images as RUNNING
2024-07-02 21:31:30 replication-orchestrator > Stream status TRACE received of status: RUNNING for stream product_images
2024-07-02 21:31:30 replication-orchestrator > readFromSource: source exception
io.airbyte.workers.internal.exception.SourceException: All the defined primary keys are null, the primary keys are: id
        at io.airbyte.workers.internal.BasicAirbyteMessageValidator.validate(BasicAirbyteMessageValidator.java:78) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at io.airbyte.workers.internal.VersionedAirbyteStreamFactory.toAirbyteMessage(VersionedAirbyteStreamFactory.java:327) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273) ~[?:?]
        at java.base/java.util.stream.ReferencePipeline$15$1.accept(ReferencePipeline.java:541) ~[?:?]
        at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1950) ~[?:?]
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292) ~[?:?]
        at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206) ~[?:?]
        at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169) ~[?:?]
        at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298) ~[?:?]
        at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) ~[?:?]
        at io.airbyte.workers.internal.DefaultAirbyteSource.isFinished(DefaultAirbyteSource.java:130) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at io.airbyte.workers.general.BufferedReplicationWorker.sourceIsFinished(BufferedReplicationWorker.java:345) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at io.airbyte.workers.general.BufferedReplicationWorker.readFromSource(BufferedReplicationWorker.java:356) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at io.airbyte.workers.general.BufferedReplicationWorker.lambda$runAsyncWithHeartbeatCheck$3(BufferedReplicationWorker.java:242) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-07-02 21:31:31 replication-orchestrator > readFromSource: done. (source.isFinished:false, fromSource.isClosed:false)
2024-07-02 21:31:35 replication-orchestrator > processMessage: done. (fromSource.isDone:true, forDest.isClosed:false)
2024-07-02 21:31:40 replication-orchestrator > writeToDestination: exception caught
java.lang.IllegalStateException: Source process is still alive, cannot retrieve exit value.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:515) ~[guava-33.2.0-jre.jar:?]
        at io.airbyte.workers.internal.DefaultAirbyteSource.getExitValue(DefaultAirbyteSource.java:136) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at io.airbyte.workers.general.BufferedReplicationWorker.writeToDestination(BufferedReplicationWorker.java:454) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at io.airbyte.workers.general.BufferedReplicationWorker.lambda$runAsyncWithTimeout$5(BufferedReplicationWorker.java:263) ~[io.airbyte-airbyte-commons-worker-dev.jar:?]
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
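
For readers unfamiliar with the first exception above: the message says that a record arrived for a stream whose declared primary key is `id`, but the record had no non-null value for it. The check that raises it lives in the platform's Java code (`BasicAirbyteMessageValidator` in the stack trace); the Python below is not that code, only a minimal sketch of the same kind of check, assuming records are plain dictionaries. The function names `all_primary_keys_null` and `validate_records` are hypothetical and exist only for illustration.

```python
from typing import Any, Iterable, Iterator, Mapping, Sequence


def all_primary_keys_null(record: Mapping[str, Any], primary_keys: Sequence[str]) -> bool:
    """Return True when every declared primary key is missing or None in the record."""
    # A stream with no declared primary key has nothing to validate.
    if not primary_keys:
        return False
    return all(record.get(key) is None for key in primary_keys)


def validate_records(
    records: Iterable[Mapping[str, Any]], primary_keys: Sequence[str]
) -> Iterator[Mapping[str, Any]]:
    """Yield records unchanged, failing on the first one whose primary keys are all null."""
    for record in records:
        if all_primary_keys_null(record, primary_keys):
            # Hypothetical error type; the message wording follows the log above.
            raise ValueError(
                "All the defined primary keys are null, the primary keys are: "
                + ", ".join(primary_keys)
            )
        yield record


if __name__ == "__main__":
    # Made-up sample data: the second record lacks an `id`, so validation fails on it.
    sample = [{"id": 1, "src": "a.png"}, {"product_id": 5, "src": "b.png"}]
    try:
        for rec in validate_records(sample, ["id"]):
            print("ok:", rec)
    except ValueError as err:
        print("failed:", err)
```

With a check like this, a single emitted record without an `id` is enough to fail the whole sync, which matches the failure appearing immediately after the BULK job completed and records started flowing.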


marcosmarxm commented 3 months ago

@SebastienCY, could you please check the URL you linked in the issue? It isn't working.

bazarnov commented 3 months ago

This was first discussed here: https://github.com/airbytehq/airbyte/issues/39478#issuecomment-2206770514

bazarnov commented 3 months ago

Working on the fix.

bazarnov commented 3 months ago

The fix is here: https://github.com/airbytehq/airbyte/pull/40707

SebastienCY commented 2 months ago

Hi @bazarnov, the issue has not recurred since we upgraded to the fixed version. Thank you for the fix!