allwefantasy / spark-binlog

A library for querying binlogs with Apache Spark Structured Streaming, for Spark SQL, DataFrames and [MLSQL](https://www.mlsql.tech).

What caused the stream program to shut down? #24

Open · zhengqiangtan opened this issue 4 years ago

zhengqiangtan commented 4 years ago

Hi, sorry to bother you. I hit an issue and cannot figure out why the stream program shut down. Here is the detailed error log:

```
2020-01-14 16:55:00 ERROR MicroBatchExecution:91 - Query [id = 1e58880f-be83-47c3-b837-9d7c64a90982, runId = 9eecfa85-29c5-4246-963e-90fd09f6731c] terminated with error
java.io.EOFException
  at java.io.DataInputStream.readInt(DataInputStream.java:392)
  at org.apache.spark.sql.mlsql.sources.mysql.binlog.BinLogSocketServerSerDer$class.readResponse(servers.scala:163)
  at org.apache.spark.sql.mlsql.sources.MLSQLBinLogSource.readResponse(MLSQLBinLogDataSource.scala:243)
  at org.apache.spark.sql.mlsql.sources.MLSQLBinLogSource.getLatestOffset(MLSQLBinLogDataSource.scala:324)
  at org.apache.spark.sql.mlsql.sources.MLSQLBinLogSource.getOffset(MLSQLBinLogDataSource.scala:334)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5$$anonfun$apply$9.apply(MicroBatchExecution.scala:345)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5$$anonfun$apply$9.apply(MicroBatchExecution.scala:345)
  at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
  at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5.apply(MicroBatchExecution.scala:344)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5.apply(MicroBatchExecution.scala:341)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:341)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:554)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.scala:337)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:183)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
  at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
2020-01-14 16:55:00 INFO DAGScheduler:54 - Asked to cancel job group 8913a720-df1b-427b-abfc-18cae73935dc
2020-01-14 16:55:00 INFO YarnScheduler:54 - Cancelling stage 3
2020-01-14 16:55:00 INFO YarnScheduler:54 - Killing all running tasks in stage 3: Stage cancelled
2020-01-14 16:55:00 INFO YarnScheduler:54 - Stage 3 was cancelled
2020-01-14 16:55:00 INFO DAGScheduler:54 - ResultStage 3 (start at NativeMethodAccessorImpl.java:0) failed in 86450.992 s due to Job 1 cancelled part of cancelled job group 8913a720-df1b-427b-abfc-18cae73935dc
2020-01-14 16:55:00 INFO DAGScheduler:54 - Job 1 failed: start at NativeMethodAccessorImpl.java:0, took 86450.997796 s
Exception in thread "launch-binlog-socket-server-in-spark-job" org.apache.spark.SparkException: Job 1 cancelled part of cancelled job group 8913a720-df1b-427b-abfc-18cae73935dc
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
  at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1822)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:906)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:906)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:906)
  at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
  at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:906)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2077)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
  at org.apache.spark.sql.mlsql.sources.MLSQLBinLogDataSource.org$apache$spark$sql$mlsql$sources$MLSQLBinLogDataSource$$launchBinlogServer$1(MLSQLBinLogDataSource.scala:207)
  at org.apache.spark.sql.mlsql.sources.MLSQLBinLogDataSource$$anon$4.run(MLSQLBinLogDataSource.scala:214)
Traceback (most recent call last):
  File "/data/sync/ods_users_df/stream2hdfs.py", line 36, in <module>
    query.awaitTermination()
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/streaming.py", line 103, in awaitTermination
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 75, in deco
```
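For context, the Python traceback ends in `query.awaitTermination()` inside `/data/sync/ods_users_df/stream2hdfs.py`, which is not shown in the issue. The sketch below is only an illustration of the kind of PySpark job that exercises this code path: the source format class is taken from the stack trace, the option names follow the project README examples, and every host, credential, table pattern and path is a placeholder.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream2hdfs").getOrCreate()

# Read the MySQL binlog stream. The format class comes from the stack trace;
# option names follow the spark-binlog README; all values are placeholders.
df = (spark.readStream
      .format("org.apache.spark.sql.mlsql.sources.MLSQLBinLogDataSource")
      .option("host", "127.0.0.1")
      .option("port", "3306")
      .option("userName", "root")
      .option("password", "xxxx")
      .option("databaseNamePattern", "ods_users_df")
      .option("tableNamePattern", "users")
      .load())

# Write the raw change events to HDFS; a plain parquet sink is used here only
# for illustration.
query = (df.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/sync/ods_users_df/out")
         .option("checkpointLocation", "hdfs:///data/sync/ods_users_df/ckpt")
         .start())

# The driver polls MLSQLBinLogSource.getOffset every micro-batch; when the
# socket to the binlog server breaks, the EOFException above terminates the
# query and surfaces here.
query.awaitTermination()
```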

allwefantasy commented 4 years ago

I guess the executor that runs the binlog server may have been killed by YARN. Try increasing the Spark executor memory overhead.
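On Spark 2.4 this is the `spark.executor.memoryOverhead` setting (`spark.yarn.executor.memoryOverhead` is the older, deprecated YARN-specific name). A minimal sketch with illustrative values only; the same keys can equally be passed as `--conf` flags to `spark-submit`:

```python
from pyspark.sql import SparkSession

# Illustrative only: give each executor more off-heap headroom so YARN is less
# likely to kill the container hosting the binlog server. Tune to the cluster.
spark = (SparkSession.builder
         .appName("stream2hdfs")
         .config("spark.executor.memory", "4g")            # illustrative value
         .config("spark.executor.memoryOverhead", "1024")  # MiB, Spark 2.3+ key
         .getOrCreate())
```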

zhengqiangtan commented 4 years ago

(screenshot attached)

zhengqiangtan commented 4 years ago

```
2020-01-15 12:09:26 ERROR MLSQLBinLogDataSource:91 - next on empty iterator
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 2 on zmbd-pm07: Executor heartbeat timed out after 155288 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 5 on zmbd-pm13: Executor heartbeat timed out after 152209 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 8 on zmbd-pm11: Executor heartbeat timed out after 152101 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 7 on zmbd-pm07: Executor heartbeat timed out after 150332 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 1 on zmbd-pm07: Executor heartbeat timed out after 156505 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 4 on zmbd-pm13: Executor heartbeat timed out after 147583 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 6 on zmbd-pm16: Executor heartbeat timed out after 150414 ms
2020-01-16 01:31:49 ERROR YarnScheduler:70 - Lost executor 3 on zmbd-pm07: Executor heartbeat timed out after 150221 ms
2020-01-16 01:33:49 ERROR TransportChannelHandler:144 - Connection to /172.20.1.10:36940 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
2020-01-16 01:33:50 ERROR TransportResponseHandler:144 - Still have 1 requests outstanding when connection from /172.20.1.10:36940 is closed
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(2)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 8516244618252785373 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(5)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 5604667397559748996 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(8)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 4795019310054221062 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(7)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 5234086875584424262 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(1)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 5740243668631548947 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(4)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 7509468406493835515 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(6)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:33:50 ERROR TransportClient:331 - Failed to send RPC RPC 4790295982241523280 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:33:50 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending KillExecutors(List(3)) to AM was unsuccessful
2020-01-16 01:33:50 ERROR Utils:91 - Uncaught exception in thread kill-executor-thread
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:06 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:38 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:38 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:41 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:42 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:39:44 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR MicroBatchExecution:91 - Query [id = c4492e8a-a982-4774-a19b-6aa5a2ac9c9b, runId = 5aa1e3bd-2f26-4e96-bd26-c88d0efa9176] terminated with error
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
2020-01-16 01:45:34 ERROR TransportClient:331 - Failed to send RPC RPC 6764822998199901153 to /172.20.1.10:36940: java.nio.channels.ClosedChannelException
2020-01-16 01:45:34 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:91 - Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
2020-01-16 01:45:34 ERROR Utils:91 - Uncaught exception in thread Thread-1
2020-01-16 01:45:34 ERROR Utils:91 - Uncaught exception in thread Thread-1
```
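This second log also names the relevant knobs itself ("please adjust spark.network.timeout if this is wrong"). A hedged sketch of raising those timeouts follows, with illustrative values only; raising them just makes the driver more patient, it does not fix whatever made the executors stop heartbeating (for example the memory pressure discussed above).

```python
from pyspark.sql import SparkSession

# Illustrative values; these standard Spark settings lengthen the window before
# the driver declares executors dead, they do not address the root cause.
spark = (SparkSession.builder
         .appName("stream2hdfs")
         .config("spark.network.timeout", "300s")            # default 120s, the 120000 ms in the log
         .config("spark.executor.heartbeatInterval", "30s")  # must stay well below spark.network.timeout
         .getOrCreate())
```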