awslabs / spark-sql-kinesis-connector

A Spark Structured Streaming connector for Kinesis Data Streams that supports both GetRecords and SubscribeToShard (Enhanced Fan-Out, EFO)
Apache License 2.0

Job restart fails with "assertion failed" #34

Open shenavaa opened 1 month ago

shenavaa commented 1 month ago

No records were written to the stream after the previous job run. The job then fails to restart with an "assertion failed" exception: at restart, the current batch ID is -1, which is less than the previous batch ID.

24/10/18 03:43:38 INFO KinesisV2MicrobatchStream: prevBatchId 36, currBatchId -1
24/10/18 03:43:38 ERROR MicroBatchExecution: Query [id = b06a3525-bb35-4e02-b34b-35652c12cbf6, runId = 343c0aca-9104-4e3d-813a-a754adfaac4e] terminated with error
org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
    at org.apache.spark.SparkException$.internalError(SparkException.scala:107) ~[spark-common-utils_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:701) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:713) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.getSparkPlan(QueryExecution.scala:195) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:187) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:187) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$getExecutedPlan$1(QueryExecution.scala:211) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219) ~[spark-catalyst_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:711) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.getExecutedPlan(QueryExecution.scala:208) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:203) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:203) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:720) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:708) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:286) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:427) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:425) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:67) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:249) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:239) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:311) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:289) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$1(StreamExecution.scala:211) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [scala-library-2.12.18.jar:?]
    at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94) [spark-core_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:211) [spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.sql.connector.kinesis.KinesisV2MicrobatchStream.planInputPartitions(KinesisV2MicrobatchStream.scala:304) ~[spark-streaming-sql-kinesis-connector_2.12-1.4.0.jar:?]
    at org.apache.spark.sql.execution.datasources.v2.MicroBatchScanExec.inputPartitions$lzycompute(MicroBatchScanExec.scala:46) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.datasources.v2.MicroBatchScanExec.inputPartitions(MicroBatchScanExec.scala:46) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:179) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:175) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.datasources.v2.MicroBatchScanExec.supportsColumnar(MicroBatchScanExec.scala:29) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:175) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63) ~[spark-catalyst_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.18.jar:?]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.18.jar:?]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) ~[spark-catalyst_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) ~[spark-catalyst_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) ~[scala-library-2.12.18.jar:?]
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) ~[scala-library-2.12.18.jar:?]
    at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.18.jar:?]
    at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.18.jar:?]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.18.jar:?]
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) ~[scala-library-2.12.18.jar:?]
    at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) ~[scala-library-2.12.18.jar:?]
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) ~[spark-catalyst_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.18.jar:?]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.18.jar:?]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) ~[spark-catalyst_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:655) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$getSparkPlan$1(QueryExecution.scala:195) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219) ~[spark-catalyst_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:711) ~[spark-sql_2.12-3.5.1-amzn-1.jar:3.5.1-amzn-1]
    ... 37 more
24/10/18 03:43:38 INFO KinesisV2MicrobatchStream: Stopping KinesisV2MicrobatchStream. deregisterStreamConsumer kinesis - 
24/10/18 03:43:38 INFO KinesisClientConsumerImpl: KinesisClientConsumerImpl.close for kinesis 
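The stack trace shows the assertion firing in `KinesisV2MicrobatchStream.planInputPartitions` (KinesisV2MicrobatchStream.scala:304 in connector 1.4.0). A hypothetical Python sketch of the kind of batch-ID sanity check implied by the log line `prevBatchId 36, currBatchId -1` (the real implementation is Scala and may differ in detail; the function name and check below are illustrative only):

```python
def plan_input_partitions(prev_batch_id: int, curr_batch_id: int) -> None:
    """Hypothetical stand-in for the batch-ID sanity check that appears
    to fail; the actual check lives in KinesisV2MicrobatchStream.scala."""
    # On this restart the connector logs prevBatchId 36 (the last committed
    # batch recovered from the checkpoint) but currBatchId -1 (no new batch
    # was planned because the stream received no records), so a check of
    # this shape fails with "java.lang.AssertionError: assertion failed".
    assert curr_batch_id >= prev_batch_id, "assertion failed"

plan_input_partitions(36, 36)  # normal progression: passes
# plan_input_partitions(36, -1)  # restart with no new data: AssertionError
```

This is only meant to make the failure mode concrete: the checkpoint carries the last committed batch ID forward, while the restarted micro-batch stream starts from a sentinel of -1 when no new data arrives, tripping the assertion.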
penniman26 commented 1 month ago

What version of spark-sql-kinesis-connector are you running?

edit: version 1.4.0 based on the logs

shenavaa commented 1 month ago

It's 1.4; the stack trace includes the jar versions.



penniman26 commented 1 month ago

Yep, I was able to get it from the logs. I asked because there was a recent release and I wondered whether this was a regression. Generally, the more context you provide upfront in a bug report, the faster it gets resolved without a lot of back and forth, but I'll defer to others on whether more context would be useful here. Anyway, thanks for submitting the bug report!

hwanghw commented 1 month ago

Hi shenavaa, can you share your driver log? I tried several times to stop and restart the job from a checkpoint while no new records were arriving on the stream, but I can't reproduce the problem.