apache / linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
https://linkis.apache.org/
Apache License 2.0

cannot assign instance of scala.collection.immutable.List$SerializationProxy to field #3303

Closed 2018yinjian closed 2 years ago

2018yinjian commented 2 years ago

Before asking

Your environment

Describe your questions

Q1. Reading a Kudu table with scala or pyspark fails with an error, while csv, parquet, and Hive tables can all be read normally. (screenshot omitted) Note: spark-shell is able to read the Kudu table data. (screenshot omitted)
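As captured in the SparkPythonExecutor log below ("Prepare to deal python code"), the pyspark code that triggers the error is essentially the following; `spark` here is the session the Linkis pyspark engine injects:

```python
# Reconstructed from the engine log below; the table name and Kudu master
# address are the ones reported in this issue. df.show() is implied by the
# showString job that appears in the stack trace.
df = spark.read.format("org.apache.kudu.spark.kudu") \
    .option("kudu.table", "impala::dim.dim_src_tab_info") \
    .option("kudu.master", "192.168.217.248") \
    .load()
df.show()
```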

Eureka service list

(screenshot omitted)

Some logs info or attached file

```
2022-09-09 10:10:32.219 INFO [Linkis-Default-Scheduler-Thread-3] com.netflix.loadbalancer.DynamicServerListLoadBalancer 150 restOfInit - DynamicServerListLoadBalancer for client linkis-cg-entrance initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=linkis-cg-entrance,current list of Servers=[emr-header-3.cluster-254539:9104],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;] },Server stats: [[Server:emr-header-3.cluster-254539:9104; Zone:defaultZone; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 08:00:00 CST 1970; First connection made: Thu Jan 01 08:00:00 CST 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0] ]}ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList@6036102a
2022-09-09 10:10:32.231 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineconn.computation.executor.upstream.service.ECTaskEntranceMonitorService 41 info - ignored EngineConnSyncEvent org.apache.linkis.engineconn.acessible.executor.listener.event.TaskLogUpdateEvent
2022-09-09 10:10:32.234 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 46 info - Your code will be submitted in overall mode.
java.util.NoSuchElementException: next on empty iterator
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:39) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:37) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.IterableLike$class.head(IterableLike.scala:107) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.TraversableLike$class.last(TraversableLike.scala:431) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$last(ArrayBuffer.scala:48) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.IndexedSeqOptimized$class.last(IndexedSeqOptimized.scala:132) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.mutable.ArrayBuffer.last(ArrayBuffer.scala:48) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at org.apache.linkis.governance.common.paser.PythonCodeParser$$anonfun$parse$3.apply(CodeParser.scala:136) ~[linkis-computation-governance-common-1.1.1.jar:1.1.1]
    at org.apache.linkis.governance.common.paser.PythonCodeParser$$anonfun$parse$3.apply(CodeParser.scala:127) ~[linkis-computation-governance-common-1.1.1.jar:1.1.1]
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at org.apache.linkis.governance.common.paser.PythonCodeParser.parse(CodeParser.scala:127) ~[linkis-computation-governance-common-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$1$$anonfun$apply$7.apply(ComputationExecutor.scala:170) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$1$$anonfun$apply$7.apply(ComputationExecutor.scala:170) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at scala.Option.map(Option.scala:146) ~[ess-shuffle-manager-1.0.0-shaded.jar:?]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$1.apply(ComputationExecutor.scala:170) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$1.apply(ComputationExecutor.scala:170) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40) ~[linkis-common-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:170) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:151) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.common.utils.Utils$.tryFinally(Utils.scala:61) ~[linkis-common-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor.toExecuteTask(ComputationExecutor.scala:228) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$3.apply(ComputationExecutor.scala:243) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$3.apply(ComputationExecutor.scala:243) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.common.utils.Utils$.tryFinally(Utils.scala:61) ~[linkis-common-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:55) ~[linkis-accessible-executor-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:49) ~[linkis-accessible-executor-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor.ensureOp(ComputationExecutor.scala:135) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor.execute(ComputationExecutor.scala:242) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl.org$apache$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask(TaskExecutionServiceImpl.scala:288) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2$$anonfun$run$2.apply$mcV$sp(TaskExecutionServiceImpl.scala:221) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2$$anonfun$run$2.apply(TaskExecutionServiceImpl.scala:219) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2$$anonfun$run$2.apply(TaskExecutionServiceImpl.scala:219) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40) ~[linkis-common-1.1.1.jar:1.1.1]
    at org.apache.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:69) ~[linkis-common-1.1.1.jar:1.1.1]
    at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2.run(TaskExecutionServiceImpl.scala:219) ~[linkis-computation-engineconn-1.1.1.jar:1.1.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_252]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_252]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
```

```
2022-09-09 10:10:32.253 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineconn.computation.executor.upstream.service.ECTaskEntranceMonitorService 41 info - ignored EngineConnSyncEvent org.apache.linkis.engineconn.acessible.executor.listener.event.TaskLogUpdateEvent
2022-09-09 10:10:32.259 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.cs.CSSparkPreExecutionHook 41 info - Start to call CSSparkPreExecutionHook,contextID is null, nodeNameStr is null
2022-09-09 10:10:32.261 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.cs.CSSparkPreExecutionHook 41 info - Finished to call CSSparkPreExecutionHook,contextID is null, nodeNameStr is null
2022-09-09 10:10:32.263 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - Ready to run code with kind pyspark.
2022-09-09 10:10:32.264 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - Set jobGroup to linkis-spark-mix-code-1
2022-09-09 10:10:32.269 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 136 org$apache$linkis$engineplugin$spark$executor$SparkPythonExecutor$$initGateway - spark.python.version => python2
2022-09-09 10:10:32.295 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - Pyspark process file path is: jar:file:/appcom/tmp/engineConnPublickDir/7d0abc88-59c5-415e-9f03-0dd3df63f869/v000017/lib/linkis-engineplugin-spark-1.1.1.jar!/python/mix_pyspark.py
2022-09-09 10:10:32.297 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 156 org$apache$linkis$engineplugin$spark$executor$SparkPythonExecutor$$initGateway - output spark files
2022-09-09 10:10:32.298 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 162 org$apache$linkis$engineplugin$spark$executor$SparkPythonExecutor$$initGateway - spark.submit.pyFiles =>
2022-09-09 10:10:33.077 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.utils.EngineUtils$ 41 info - spark version is 2.4.7
2022-09-09 10:10:33.078 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - spark.pyspark.python is null
2022-09-09 10:10:33.079 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - pyspark builder command:/usr/bin/python /appcom/tmp/root/20220909/spark/4e336dd3-6464-48c0-8644-3364aaae5660/tmp/2138452030582260578.py 15949 247 jlRk61RWSph8XyUUmmWKvejWZgPqzzYhNPsFoUBS6FMIXlK2ODlO2cSdfN2cimq2noXlZFBXFFURjqidw3UHNVZTxZUdPU5GEhhiQUD5grBhtmxM3VzSMtcBTmPK2I3SSQlpp4UTg8z9WykLhBpAy5gCxRj6uFLPym0ljkY735mxtrOlRkBL7RgzwAS6Zoi4OEfX5F6CSqfBzQ3gTUWitRYbEp6MGgBZYcFTCqlylboZtO4tSSkWjT5T1G8Nq8SB /usr/lib/spark-current/python:/usr/lib/spark-current/python/lib/pyspark.zip:/usr/lib/spark-current/python/lib/py4j-0.10.7-src.zip:/opt/apps/ecm/service/spark/2.4.7-hadoop3.2-1.1.1/package/spark-2.4.7-hadoop3.2-1.1.1/jars/spark-core_2.11-2.4.7.jar
2022-09-09 10:10:33.145 INFO [Linkis-Default-Scheduler-Thread-2] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - Begin to get actual used resources!
2022-09-09 10:10:33.146 INFO [Linkis-Default-Scheduler-Thread-2] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - Current actual used resources is driverMem:2147483648,driverCores:2,executorMem:4294967296,executorCores:2,queue:default
2022-09-09 10:10:33.219 INFO [PollingServerListUpdater-0] com.netflix.config.ChainedDynamicProperty 115 checkAndFlip - Flipping property: linkis-cg-entrance.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2022-09-09 10:10:33.666 INFO [Thread-86] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - Pyspark process has been initialized.pid is 26115
2022-09-09 10:10:33.711 INFO [Linkis-Default-Scheduler-Thread-3] org.apache.linkis.engineconn.computation.executor.upstream.service.ECTaskEntranceMonitorService 41 info - ignored EngineConnSyncEvent org.apache.linkis.engineconn.acessible.executor.listener.event.TaskLogUpdateEvent
2022-09-09 10:10:33.712 INFO [Thread-86] org.apache.linkis.engineplugin.spark.executor.SparkPythonExecutor 41 info - Prepare to deal python code, code: df = spark.read.format("org.apache.kudu.spark.kudu").option("kudu.table", "impala::dim.dim_src_tab_info").option("kudu.master", "192.168.217.248").load()
2022-09-09 10:10:33.725 INFO [Thread-86] org.apache.spark.sql.internal.SharedState 54 logInfo - loading hive config file: file:/etc/ecm/spark-conf-2.4.7-hadoop3.2-1.1.1/hive-site.xml
2022-09-09 10:10:33.739 INFO [Thread-86] org.apache.spark.sql.internal.SharedState 54 logInfo - spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
2022-09-09 10:10:33.740 INFO [Thread-86] org.apache.spark.sql.internal.SharedState 54 logInfo - Warehouse path is '/user/hive/warehouse'.
2022-09-09 10:10:33.763 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL.
2022-09-09 10:10:33.764 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/json.
2022-09-09 10:10:33.765 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution.
2022-09-09 10:10:33.766 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
2022-09-09 10:10:33.767 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
2022-09-09 10:10:33.774 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql.
2022-09-09 10:10:33.775 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/json.
2022-09-09 10:10:33.775 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/statistics.
2022-09-09 10:10:33.776 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/statistics/json.
2022-09-09 10:10:33.778 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
2022-09-09 10:10:33.779 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/query/kill.
2022-09-09 10:10:33.781 INFO [Thread-86] org.apache.spark.ui.JettyUtils 54 logInfo - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/query/restart.
2022-09-09 10:10:34.385 INFO [Thread-86] org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef 54 logInfo - Registered StateStoreCoordinator endpoint
2022-09-09 10:10:34.479 INFO [Thread-86] org.elasticsearch.hadoop.util.Version 133 logVersion - Elasticsearch Hadoop v7.10.2 [f53f4b7b2b]
2022-09-09 10:10:34.867 INFO [ECTask-upstream-connection-monitor-2] org.apache.linkis.engineconn.computation.executor.upstream.ECTaskEntranceMonitor 41 info - requesting connection info: [1]
2022-09-09 10:10:34.873 INFO [ECTask-upstream-connection-monitor-2] org.apache.linkis.engineconn.computation.executor.upstream.ECTaskEntranceMonitor 41 info - connection-info result: emr-header-3.cluster-254539:9104 : true
2022-09-09 10:10:36.948 INFO [Thread-86] org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 54 logInfo - Code generated in 284.239207 ms
2022-09-09 10:10:36.968 INFO [Thread-86] org.apache.spark.sql.execution.adaptive.ResultQueryStage 163 prepareExecuteStage - add exchangecoordinator though
2022-09-09 10:10:37.073 INFO [Thread-86] org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 54 logInfo - Code generated in 29.045358 ms
2022-09-09 10:10:37.158 INFO [Thread-86] org.apache.spark.SparkContext 54 logInfo - Starting job: showString at NativeMethodAccessorImpl.java:0
2022-09-09 10:10:37.184 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.DAGScheduler 54 logInfo - Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
2022-09-09 10:10:37.185 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.DAGScheduler 54 logInfo - Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0)
2022-09-09 10:10:37.186 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.DAGScheduler 54 logInfo - Parents of final stage: List()
2022-09-09 10:10:37.189 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.DAGScheduler 54 logInfo - Missing parents: List()
2022-09-09 10:10:37.198 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.DAGScheduler 54 logInfo - Submitting ResultStage 0 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
2022-09-09 10:10:37.296 INFO [dag-scheduler-event-loop] org.apache.spark.storage.memory.MemoryStore 54 logInfo - Block broadcast_0 stored as values in memory (estimated size 28.1 KB, free 1048.8 MB)
2022-09-09 10:10:37.337 INFO [dag-scheduler-event-loop] org.apache.spark.storage.memory.MemoryStore 54 logInfo - Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.2 KB, free 1048.8 MB)
2022-09-09 10:10:37.340 INFO [dispatcher-event-loop-1] org.apache.spark.storage.BlockManagerInfo 54 logInfo - Added broadcast_0_piece0 in memory on 192.168.217.247:18641 (size: 6.2 KB, free: 1048.8 MB)
2022-09-09 10:10:37.344 INFO [dag-scheduler-event-loop] org.apache.spark.SparkContext 54 logInfo - Created broadcast 0 from broadcast at DAGScheduler.scala:1184
2022-09-09 10:10:37.356 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.DAGScheduler 54 logInfo - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
2022-09-09 10:10:37.357 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.cluster.YarnScheduler 54 logInfo - Adding task set 0.0 with 1 tasks
2022-09-09 10:10:37.403 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.FairSchedulableBuilder 54 logInfo - Added task set TaskSet_0.0 tasks to pool default
2022-09-09 10:10:37.421 INFO [dispatcher-event-loop-10] org.apache.spark.scheduler.TaskSetManager 54 logInfo - Starting task 0.0 in stage 0.0 (TID 0, emr-worker-6.cluster-254539, executor 1, partition 0, NODE_LOCAL, 8601 bytes)
2022-09-09 10:10:37.788 INFO [dispatcher-event-loop-15] org.apache.spark.storage.BlockManagerInfo 54 logInfo - Added broadcast_0_piece0 in memory on emr-worker-6.cluster-254539:43247 (size: 6.2 KB, free: 2004.6 MB)
2022-09-09 10:10:38.046 WARN [task-result-getter-0] org.apache.spark.scheduler.TaskSetManager 66 logWarning - Lost task 0.0 in stage 0.0 (TID 0, emr-worker-6.cluster-254539, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2349)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2234)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2234)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2234)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

```
2022-09-09 10:10:38.049 INFO [dispatcher-event-loop-1] org.apache.spark.scheduler.TaskSetManager 54 logInfo - Starting task 0.1 in stage 0.0 (TID 1, emr-worker-6.cluster-254539, executor 1, partition 0, NODE_LOCAL, 8601 bytes)
2022-09-09 10:10:38.075 INFO [task-result-getter-1] org.apache.spark.scheduler.TaskSetManager 54 logInfo - Lost task 0.1 in stage 0.0 (TID 1) on emr-worker-6.cluster-254539, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 1]
2022-09-09 10:10:38.077 INFO [dispatcher-event-loop-2] org.apache.spark.scheduler.TaskSetManager 54 logInfo - Starting task 0.2 in stage 0.0 (TID 2, emr-worker-6.cluster-254539, executor 1, partition 0, NODE_LOCAL, 8601 bytes)
2022-09-09 10:10:38.098 INFO [task-result-getter-2] org.apache.spark.scheduler.TaskSetManager 54 logInfo - Lost task 0.2 in stage 0.0 (TID 2) on emr-worker-6.cluster-254539, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 2]
2022-09-09 10:10:38.099 INFO [dispatcher-event-loop-8] org.apache.spark.scheduler.TaskSetManager 54 logInfo - Starting task 0.3 in stage 0.0 (TID 3, emr-worker-6.cluster-254539, executor 1, partition 0, NODE_LOCAL, 8601 bytes)
2022-09-09 10:10:38.119 INFO [task-result-getter-3] org.apache.spark.scheduler.TaskSetManager 54 logInfo - Lost task 0.3 in stage 0.0 (TID 3) on emr-worker-6.cluster-254539, executor 1: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 3]
2022-09-09 10:10:38.121 ERROR [task-result-getter-3] org.apache.spark.scheduler.TaskSetManager 70 logError - Task 0 in stage 0.0 failed 4 times; aborting job
2022-09-09 10:10:38.124 INFO [task-result-getter-3] org.apache.spark.scheduler.cluster.YarnScheduler 54 logInfo - Removed TaskSet 0.0, whose tasks have all completed, from pool default
2022-09-09 10:10:38.128 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.cluster.YarnScheduler 54 logInfo - Cancelling stage 0
2022-09-09 10:10:38.129 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.cluster.YarnScheduler 54 logInfo - Killing all running tasks in stage 0: Stage cancelled
2022-09-09 10:10:38.132 INFO [dag-scheduler-event-loop] org.apache.spark.scheduler.DAGScheduler 54 logInfo - ResultStage 0 (showString at NativeMethodAccessorImpl.java:0) failed in 0.914 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, emr-worker-6.cluster-254539, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2349)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2234)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2234)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2234)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2343)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2267)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2125)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1624)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
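For what it's worth, this particular `ClassCastException` (`List$SerializationProxy` vs `RDD.org$apache$spark$rdd$RDD$$dependencies_`) is the classic signature of the driver and the executors deserializing the task's RDD lineage with different classloaders, which typically happens when a datasource jar (here, presumably the kudu-spark connector) is visible on the driver side but not shipped to the executors. A minimal, unverified sketch of ruling that out from pyspark follows; the jar path and version are placeholders, not values confirmed for this environment:

```python
# Hedged sketch, not a confirmed fix: distribute the Kudu connector jar to the
# executors via spark.jars so driver and executors resolve the same classes.
# The jar path below is a hypothetical placeholder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kudu-read-check")
    .config("spark.jars", "/path/to/kudu-spark2_2.11.jar")  # placeholder path
    .getOrCreate()
)

df = (
    spark.read.format("org.apache.kudu.spark.kudu")
    .option("kudu.table", "impala::dim.dim_src_tab_info")
    .option("kudu.master", "192.168.217.248")
    .load()
)
df.show()
```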


log file:
[stdout.txt](https://github.com/apache/incubator-linkis/files/9531898/stdout.txt)
github-actions[bot] commented 2 years ago

:blush: Welcome to the Apache Linkis (incubating) community!! We are glad that you are contributing by opening this issue.

Please make sure to include all the relevant context. We will be here shortly.

If you are interested in contributing to our website project, please let us know! You can check out our contributing guide on :point_right: How to Participate in Project Contribution.

WeChat Group: (QR code image omitted)

Mailing Lists:

| name | description | subscribe | unsubscribe | archive |
| ---- | ----------- | --------- | ----------- | ------- |
| dev@linkis.apache.org | community activity information | subscribe | unsubscribe | archive |
2018yinjian commented 2 years ago

Submitting with `spark-submit --master yarn test.py` can also read the Kudu table.

cat test.py:

```python
# Python 2 script (matches spark.python.version => python2 in the engine log)
import os,sys,linecache,commands
import datetime
from pyspark.sql import SQLContext,Row,SparkSession
from pyspark.sql.types import *
from pyspark.sql import HiveContext

reload(sys)
sys.setdefaultencoding("utf-8")

def readKudu(table_name):
    # `spark` is the global session created in the main block below
    df = spark.read.format("org.apache.kudu.spark.kudu") \
        .option("kudu.table", table_name) \
        .option("kudu.master", "192.168.217.248") \
        .load()
    df.show()
    df.printSchema()

def init_spark_context():
    spark = SparkSession.builder.master("yarn-client").appName("testApp").enableHiveSupport().getOrCreate()
    return spark

if __name__ == "__main__":
    spark = init_spark_context()
    readKudu("impala::dim.dim_src_tab_info")
```

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/ecm/service/spark/2.4.7-hadoop3.2-1.1.1/package/spark-2.4.7-hadoop3.2-1.1.1/jars/ess-shuffle-manager-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/ecm/service/spark/2.4.7-hadoop3.2-1.1.1/package/spark-2.4.7-hadoop3.2-1.1.1/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/09/09 17:36:59 INFO [Thread-5] SparkContext: Running Spark version 2.4.7
22/09/09 17:36:59 WARN [Thread-5] SparkConf: spark.master yarn-client is deprecated in Spark 2.0+, please instead use "yarn" with specified deploy mode.
22/09/09 17:36:59 INFO [Thread-5] SparkContext: Submitted application: testApp
22/09/09 17:36:59 INFO [Thread-5] SecurityManager: Changing view acls to: yinjian,
22/09/09 17:36:59 INFO [Thread-5] SecurityManager: Changing modify acls to: yinjian
22/09/09 17:36:59 INFO [Thread-5] SecurityManager: Changing view acls groups to:
22/09/09 17:36:59 INFO [Thread-5] SecurityManager: Changing modify acls groups to:
22/09/09 17:36:59 INFO [Thread-5] SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yinjian, ); groups with view permissions: Set(); users with modify permissions: Set(yinjian); groups with modify permissions: Set()
22/09/09 17:36:59 INFO [Thread-5] Utils: Successfully started service 'sparkDriver' on port 13247.
22/09/09 17:36:59 INFO [Thread-5] SparkEnv: Registering MapOutputTracker
22/09/09 17:36:59 INFO [Thread-5] SparkEnv: Registering BlockManagerMaster
22/09/09 17:36:59 INFO [Thread-5] BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/09/09 17:36:59 INFO [Thread-5] BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/09/09 17:36:59 INFO [Thread-5] DiskBlockManager: Created local directory at /tmp/blockmgr-6b896b1f-0b53-4651-a807-367e14e9fd97
22/09/09 17:36:59 INFO [Thread-5] MemoryStore: MemoryStore started with capacity 4.1 GB
22/09/09 17:36:59 INFO [Thread-5] SparkEnv: Registering OutputCommitCoordinator
22/09/09 17:37:00 WARN [Thread-5] Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/09/09 17:37:00 INFO [Thread-5] Utils: Successfully started service 'SparkUI' on port 4041.
22/09/09 17:37:00 INFO [Thread-5] SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.217.247:4041
22/09/09 17:37:00 INFO [Thread-5] Client: Requesting a new application from cluster with 5 NodeManagers
22/09/09 17:37:01 INFO [Thread-5] Configuration: resource-types.xml not found
22/09/09 17:37:01 INFO [Thread-5] ResourceUtils: Unable to find 'resource-types.xml'.
22/09/09 17:37:01 INFO [Thread-5] Client: Verifying our application has not requested more than the maximum memory capability of the cluster (30720 MB per container)
22/09/09 17:37:01 INFO [Thread-5] Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
22/09/09 17:37:01 INFO [Thread-5] Client: Setting up container launch context for our AM
22/09/09 17:37:01 INFO [Thread-5] Client: Setting up the launch environment for our AM container
22/09/09 17:37:01 INFO [Thread-5] Client: Preparing resources for our AM container
22/09/09 17:37:01 WARN [Thread-5] Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/09/09 17:37:05 INFO [Thread-5] Client: Uploading resource file:/tmp/spark-296f02e9-cd3a-4894-a78d-130953efd23b/spark_libs7120621387282183291.zip -> hdfs://emr-cluster/user/yinjian/.sparkStaging/application_1640111027208_230172/spark_libs7120621387282183291.zip
22/09/09 17:37:05 INFO [Thread-25] SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
22/09/09 17:37:05 INFO [DataStreamer for file /user/yinjian/.sparkStaging/application_1640111027208_230172/spark_libs7120621387282183291.zip] SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
22/09/09 17:37:05 INFO [DataStreamer for file /user/yinjian/.sparkStaging/application_1640111027208_230172/spark_libs7120621387282183291.zip] SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
22/09/09 17:37:06 INFO [Thread-5] Client: Uploading resource file:/usr/lib/spark-current/python/lib/pyspark.zip -> hdfs://emr-cluster/user/yinjian/.sparkStaging/application_1640111027208_230172/pyspark.zip
22/09/09 17:37:06 INFO [Thread-30] SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
22/09/09 17:37:06 INFO [Thread-5] Client: Uploading resource file:/usr/lib/spark-current/python/lib/py4j-0.10.7-src.zip -> hdfs://emr-cluster/user/yinjian/.sparkStaging/application_1640111027208_230172/py4j-0.10.7-src.zip
22/09/09 17:37:06 INFO [Thread-32] SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
22/09/09 17:37:07 INFO [Thread-5] Client: Uploading resource file:/tmp/spark-296f02e9-cd3a-4894-a78d-130953efd23b/spark_conf4559551404591797490.zip -> hdfs://emr-cluster/user/yinjian/.sparkStaging/application_1640111027208_230172/spark_conf.zip
22/09/09 17:37:07 INFO [Thread-34] SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
22/09/09 17:37:07 INFO [Thread-5] Client: EMR SmartData using Java binary at JAVA_HOME.
22/09/09 17:37:07 INFO [Thread-5] SecurityManager: Changing view acls to: yinjian,
22/09/09 17:37:07 INFO [Thread-5] SecurityManager: Changing modify acls to: yinjian
22/09/09 17:37:07 INFO [Thread-5] SecurityManager: Changing view acls groups to:
22/09/09 17:37:07 INFO [Thread-5] SecurityManager: Changing modify acls groups to:
22/09/09 17:37:07 INFO [Thread-5] SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yinjian, ); groups with view permissions: Set(); users with modify permissions: Set(yinjian); groups with modify permissions: Set()
22/09/09 17:37:07 INFO [Thread-5] EsServiceCredentialProvider: Loaded EsServiceCredentialProvider
22/09/09 17:37:07 INFO [Thread-5] EsServiceCredentialProvider: Hadoop Security Enabled = [false]
22/09/09 17:37:07 INFO [Thread-5] EsServiceCredentialProvider: ES Auth Method = [SIMPLE]
22/09/09 17:37:07 INFO [Thread-5] EsServiceCredentialProvider: Are creds required = [false]
22/09/09 17:37:07 INFO [Thread-5] Client: Submitting application application_1640111027208_230172 to ResourceManager
22/09/09 17:37:08 INFO [Thread-5] YarnClientImpl: Submitted application application_1640111027208_230172
22/09/09 17:37:08 INFO [Thread-5] SchedulerExtensionServices: Starting Yarn extension services with app application_1640111027208_230172 and attemptId None
22/09/09 17:37:09 INFO [Thread-5] Client: Application report for application_1640111027208_230172 (state: ACCEPTED)
22/09/09 17:37:09 INFO [Thread-5] Client:
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1662716227968
     final status: UNDEFINED
     tracking URL: http://emr-header-1.cluster-254539:20888/proxy/application_1640111027208_230172/
     user: yinjian
22/09/09 17:37:10 INFO [Thread-5] Client: Application report for application_1640111027208_230172 (state: ACCEPTED)
22/09/09 17:37:11 INFO [Thread-5] Client: Application report for application_1640111027208_230172 (state: ACCEPTED)
22/09/09 17:37:12 INFO [Thread-5] Client: Application report for application_1640111027208_230172 (state: ACCEPTED)
22/09/09 17:37:13 INFO [Thread-5] Client: Application report for application_1640111027208_230172 (state: ACCEPTED)
22/09/09 17:37:14 INFO [Thread-5] Client: Application report for application_1640111027208_230172 (state: ACCEPTED)
22/09/09 17:37:15 INFO [dispatcher-event-loop-10] YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> emr-header-1.cluster-254539, PROXY_URI_BASES -> http://emr-header-1.cluster-254539:20888/proxy/application_1640111027208_230172, RM_HA_URLS -> emr-header-1.cluster-254539:8088,emr-header-2.cluster-254539:8088), /proxy/application_1640111027208_230172
22/09/09 17:37:15 INFO [Thread-5] Client: Application report for application_1640111027208_230172 (state: RUNNING)
22/09/09 17:37:15 INFO [Thread-5] Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 192.168.217.249
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1662716227968
     final status: UNDEFINED
     tracking URL: http://emr-header-1.cluster-254539:20888/proxy/application_1640111027208_230172/
     user: yinjian
22/09/09 17:37:15 INFO [Thread-5] YarnClientSchedulerBackend: Application application_1640111027208_230172 has started running.
22/09/09 17:37:15 INFO [Thread-5] Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 19463.
22/09/09 17:37:15 INFO [Thread-5] NettyBlockTransferService: Server created on 192.168.217.247:19463
22/09/09 17:37:15 INFO [Thread-5] BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/09/09 17:37:15 INFO [Thread-5] BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.217.247, 19463, None)
22/09/09 17:37:15 INFO [dispatcher-event-loop-13] BlockManagerMasterEndpoint: Registering block manager 192.168.217.247:19463 with 4.1 GB RAM, BlockManagerId(driver, 192.168.217.247, 19463, None)
22/09/09 17:37:15 INFO [Thread-5] BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.217.247, 19463, None)
22/09/09 17:37:15 INFO [Thread-5] BlockManager: external shuffle service port = 7337
22/09/09 17:37:15 INFO [Thread-5] BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.217.247, 19463, None)
22/09/09 17:37:15 INFO [dispatcher-event-loop-14] YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
22/09/09 17:37:15 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
22/09/09 17:37:15 INFO [Thread-5] EventLoggingListener: Logging events to hdfs://emr-cluster/spark-history/application_1640111027208_230172
22/09/09 17:37:15 INFO [Thread-37] SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
22/09/09 17:37:22 INFO [dispatcher-event-loop-5] YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.218.173:57104) with ID 2
22/09/09 17:37:22 INFO [dispatcher-event-loop-0] BlockManagerMasterEndpoint: Registering block manager emr-worker-5.cluster-254539:44863 with 3.4 GB RAM, BlockManagerId(2, emr-worker-5.cluster-254539, 44863, None)
22/09/09 17:37:24 INFO [dispatcher-event-loop-5] YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.217.250:15012) with ID 1
22/09/09 17:37:24 INFO [Thread-5] YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
22/09/09 17:37:24 INFO [dispatcher-event-loop-13] BlockManagerMasterEndpoint: Registering block manager emr-worker-2.cluster-254539:22195 with 3.4 GB RAM, BlockManagerId(1, emr-worker-2.cluster-254539, 22195, None)
22/09/09 17:37:24 INFO [Thread-5] SharedState: loading hive config file: file:/etc/ecm/spark-conf-2.4.7-hadoop3.2-1.1.1/hive-site.xml
22/09/09 17:37:24 INFO [Thread-5] SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
22/09/09 17:37:24 INFO [Thread-5] SharedState: Warehouse path is '/user/hive/warehouse'.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/json.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/json.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/statistics.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/statistics/json.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/query/kill.
22/09/09 17:37:24 INFO [Thread-5] JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /streamingsql/query/restart.
22/09/09 17:37:25 INFO [Thread-5] StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
22/09/09 17:37:25 INFO [Thread-5] Version: Elasticsearch Hadoop v7.10.2 [f53f4b7b2b]
22/09/09 17:37:27 INFO [Thread-5] CodeGenerator: Code generated in 206.944333 ms
22/09/09 17:37:27 INFO [Thread-5] ResultQueryStage: add exchangecoordinator though
22/09/09 17:37:27 INFO [Thread-5] CodeGenerator: Code generated in 21.478254 ms
22/09/09 17:37:27 INFO [Thread-5] SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0)
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] DAGScheduler: Parents of final stage: List()
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] DAGScheduler: Missing parents: List()
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] MemoryStore: Block broadcast_0 stored as values in memory (estimated size 28.2 KB, free 4.1 GB)
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.3 KB, free 4.1 GB)
22/09/09 17:37:27 INFO [dispatcher-event-loop-4] BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.217.247:19463 (size: 6.3 KB, free: 4.1 GB)
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
22/09/09 17:37:27 INFO [dag-scheduler-event-loop] YarnScheduler: Adding task set 0.0 with 1 tasks
22/09/09 17:37:27 INFO [dispatcher-event-loop-9] TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, emr-worker-5.cluster-254539, executor 2, partition 0, RACK_LOCAL, 8601 bytes)
22/09/09 17:37:27 INFO [dispatcher-event-loop-1] BlockManagerInfo: Added broadcast_0_piece0 in memory on emr-worker-5.cluster-254539:44863 (size: 6.3 KB, free: 3.4 GB)
22/09/09 17:37:30 INFO [task-result-getter-0] TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2615 ms on emr-worker-5.cluster-254539 (executor 2) (1/1)
22/09/09 17:37:30 INFO [task-result-getter-0] YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/09/09 17:37:30 INFO [dag-scheduler-event-loop] DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0) finished in 2.813 s
22/09/09 17:37:30 INFO [Thread-5] DAGScheduler: Job 0 finished: showString at NativeMethodAccessorImpl.java:0, took 2.855944 s
22/09/09 17:37:30 INFO [Thread-5] SparkSQLQueryListener: head is called
22/09/09 17:37:30 INFO [Thread-5] SparkSQLQueryListener: Spark user yinjian executed on 1662716250192 with spark sql successfully.
+--------------------+--------+--------+--------------------+--------+------------+--------------+--------------------+--------------------+--------+--------+
| tar_tab|tar_part|src_inst| src_tab|src_part| src_pk|src_filter_col| cols| create_sql|batch_id|dim_flag|
+--------------------+--------+--------+--------------------+--------+------------+--------------+--------------------+--------------------+--------+--------+
|dim.dim_abnormal...| | mysql57|zmn_ums.abnor_cat...| | categ_id| update_time|name,first_let...|create table if n...| 9| 1|
| dim.dim_account| | mysql56| zmn_account.account| | account_id| update_time|plat,account_c...|create table if n...| 9| 1|
| dim.dim_act_subject| | mysql57| zmn_act.act_subject| | subject_id| update_time|name,show_name...|create table if n...| 9| 1|
|dim.dim_aws_base_...| | mysql56|zmn_aws.base_comm...| |community_id| update_time|community_name,...|create table if n...| 9| 1|
|dim.dim_aws_commu...| | mysql56|zmn_aws.aws_commu...| | id| update_time|tripartite_commu...|create table if n...| 9| 1|
|dim.dim_base_channel| | mysql56|base_channel.channel| | channel_id| update_time|name,name_piny...|create table if n...| 5| 2|
|dim.dim_base_chan...| | mysql56|base_channel.chan...| | channel_id| update_time|name,salesman_...|create table if n...| 5| 2|
|dim.dim_base_chan...| | mysql56|base_channel.coop...| | relate_id| |cooperate_id,t...|create table if n...| 5| 2|
|dim.dim_base_chan...| | mysql56|base_channel.chan...| | mark_id| update_time|name,sort,st...|create table if n...| 5| 2|
|dim.dim_base_chan...| | mysql56|base_channel.chan...| | relate_id| update_time|mark_id,channe...|create table if n...| 5| 2|
|dim.dim_base_chan...| | mysql56|base_channel.chan...| | channel_id| update_time|province_id,ci...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | area_id| update_time|name,code,ar...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | channel_id| update_time|name,name_piny...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | company_id| update_time|company_code,n...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | company_id| update_time|full_name,lega...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | company_id| update_time|full_name,prov...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | company_id| update_time|full_name,bank...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | gate_id| update_time|gate_name,gate...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | account_id| update_time|name,account_t...|create table if n...| 5| 2|
|dim.dim_base_comm...| | mysql56|base_common_data....| | brand_id| update_time|name,eng_name...|create table if n...| 5| 2|
+--------------------+--------+--------+--------------------+--------+------------+--------------+--------------------+--------------------+--------+--------+
only showing top 20 rows

root
 |-- tar_tab: string (nullable = false)
 |-- tar_part: string (nullable = true)
 |-- src_inst: string (nullable = true)
 |-- src_tab: string (nullable = true)
 |-- src_part: string (nullable = true)
 |-- src_pk: string (nullable = true)
 |-- src_filter_col: string (nullable = true)
 |-- cols: string (nullable = true)
 |-- create_sql: string (nullable = true)
 |-- batch_id: integer (nullable = true)
 |-- dim_flag: short (nullable = true)

22/09/09 17:37:30 INFO [shutdown-hook-0] SparkContext: Invoking stop() from shutdown hook
22/09/09 17:37:30 INFO [shutdown-hook-0] SparkUI: Stopped Spark web UI at http://192.168.217.247:4041
22/09/09 17:37:30 INFO [YARN application state monitor] YarnClientSchedulerBackend: Interrupting monitor thread
22/09/09 17:37:30 INFO [shutdown-hook-0] YarnClientSchedulerBackend: Shutting down all executors
22/09/09 17:37:30 INFO [dispatcher-event-loop-1] YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/09/09 17:37:30 INFO [shutdown-hook-0] SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
22/09/09 17:37:30 INFO [shutdown-hook-0] YarnClientSchedulerBackend: Stopped
22/09/09 17:37:30 INFO [dispatcher-event-loop-4] MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/09/09 17:37:30 INFO [shutdown-hook-0] MemoryStore: MemoryStore cleared
22/09/09 17:37:30 INFO [shutdown-hook-0] BlockManager: BlockManager stopped
22/09/09 17:37:30 INFO [shutdown-hook-0] BlockManagerMaster: BlockManagerMaster stopped
22/09/09 17:37:30 INFO [dispatcher-event-loop-7] OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/09/09 17:37:30 INFO [shutdown-hook-0] SparkContext: Successfully stopped SparkContext
22/09/09 17:37:30 INFO [shutdown-hook-0] ShutdownHookManager: Shutdown hook called
22/09/09 17:37:30 INFO [shutdown-hook-0] ShutdownHookManager: Deleting directory /tmp/spark-296f02e9-cd3a-4894-a78d-130953efd23b
22/09/09 17:37:30 INFO [shutdown-hook-0] ShutdownHookManager: Deleting directory /tmp/spark-296f02e9-cd3a-4894-a78d-130953efd23b/pyspark-1e4cbe5a-fc49-4f91-aa0c-b0371188452d
22/09/09 17:37:30 INFO [shutdown-hook-0] ShutdownHookManager: Deleting directory /tmp/spark-a8650bb9-cc9c-477e-9f94-91565b3de9ff
```