apache / linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
https://linkis.apache.org/
Apache License 2.0

linkis-1.4.0 Excel file import hive failed #4882

Closed tuigerphkeeper closed 11 months ago

tuigerphkeeper commented 1 year ago

Before asking

Your environment

Describe your questions

There is a problem with the LoadData code in Linkis 1.4.0. When I packaged with 1.3.2, importing an Excel file into Hive executed successfully; with 1.4.0, the same import fails with a java.lang.IndexOutOfBoundsException.

Eureka service list


Some logs info or attach file

_0.dolphin

java.lang.IndexOutOfBoundsException: 1
  at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
  at scala.collection.immutable.List.apply(List.scala:84)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$9$$anonfun$11.apply(ExcelRelation.scala:212)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$9$$anonfun$11.apply(ExcelRelation.scala:212)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$9$$anonfun$apply$3.apply(ExcelRelation.scala:226)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$9$$anonfun$apply$3.apply(ExcelRelation.scala:220)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$13$$anonfun$apply$4.apply(ExcelRelation.scala:232)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$13$$anonfun$apply$4.apply(ExcelRelation.scala:232)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$13.apply(ExcelRelation.scala:232)
  at com.webank.wedatasphere.spark.excel.ExcelRelation$$anonfun$13.apply(ExcelRelation.scala:232)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
  at scala.collection.immutable.VectorBuilder.$plus$plus$eq(Vector.scala:732)
  at scala.collection.immutable.VectorBuilder.$plus$plus$eq(Vector.scala:708)
  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
  at scala.collection.AbstractIterator.to(Iterator.scala:1334)
  at com.webank.wedatasphere.spark.excel.ExcelRelation.buildScan(ExcelRelation.scala:233)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$11.apply(DataSourceStrategy.scala:292)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$11.apply(DataSourceStrategy.scala:292)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:330)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:329)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:385)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:325)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:288)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
  at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3254)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  at org.apache.linkis.engineplugin.spark.imexport.LoadData$.create_table_from_a_file(LoadData.scala:199)
  at org.apache.linkis.engineplugin.spark.imexport.LoadData$.loadDataToTable(LoadData.scala:52)
  ... 89 elided
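Judging from the List.apply frame at ExcelRelation.scala:212 in the trace above, one plausible trigger is a data row carrying more cells than the header-derived schema: the cell index is mapped back into a list built from the header, and indexing past its end throws exactly this exception. A minimal JVM sketch of that failure mode (Java for brevity; the column names are hypothetical, not from the reporter's file):

```java
import java.util.List;

public class HeaderMismatchDemo {
    // Hypothetical reproduction: the schema inferred from the header has
    // two columns, but a data row carries three cells, so looking up the
    // column name for cell index 2 overruns the header list.
    static String readCell(List<String> header, List<String> cells, int i) {
        return header.get(i) + " = " + cells.get(i); // get(i) may throw
    }

    public static void main(String[] args) {
        List<String> header = List.of("id", "name");
        List<String> row = List.of("1", "alice", "extra");
        for (int i = 0; i < row.size(); i++) {
            try {
                System.out.println(readCell(header, row, i));
            } catch (IndexOutOfBoundsException e) {
                // Mirrors the java.lang.IndexOutOfBoundsException above
                System.out.println("cell " + i + ": " + e);
            }
        }
    }
}
```

Whether 1.4.0 actually produces such a row/schema mismatch (or fails for a different reason, such as the Scala binary mismatch discussed below) is what the thread goes on to debate.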

ec.log

2023-08-28 15:28:23.678 [ERROR] [Linkis-Default-Scheduler-Thread-2       ] o.a.l.e.s.e.SparkScalaExecutor (214) [apply] [JobId-174] - execute code failed! org.apache.linkis.engineplugin.spark.exception.ExecuteError: errCode: 40005 ,desc: execute sparkScala failed! ,ip: lc-node1 ,port: 37962 ,serviceKind: linkis-cg-engineconn
        at org.apache.linkis.engineplugin.spark.executor.SparkScalaExecutor$$anonfun$1.apply(SparkScalaExecutor.scala:242) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkScalaExecutor$$anonfun$1.apply(SparkScalaExecutor.scala:193) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) ~[scala-library-2.11.12.jar:?]
        at scala.Console$.withOut(Console.scala:65) ~[scala-library-2.11.12.jar:?]
        at org.apache.linkis.engineplugin.spark.executor.SparkScalaExecutor.executeLine(SparkScalaExecutor.scala:192) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkScalaExecutor$$anonfun$runCode$1.apply$mcV$sp(SparkScalaExecutor.scala:164) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkScalaExecutor$$anonfun$runCode$1.apply(SparkScalaExecutor.scala:164) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkScalaExecutor$$anonfun$runCode$1.apply(SparkScalaExecutor.scala:164) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:49) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkScalaExecutor.runCode(SparkScalaExecutor.scala:165) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkEngineConnExecutor$$anonfun$executeLine$2$$anonfun$2.apply(SparkEngineConnExecutor.scala:132) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkEngineConnExecutor$$anonfun$executeLine$2$$anonfun$2.apply(SparkEngineConnExecutor.scala:132) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryFinally(Utils.scala:77) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineplugin.spark.executor.SparkEngineConnExecutor.executeLine(SparkEngineConnExecutor.scala:144) ~[linkis-engineplugin-spark-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$7$$anonfun$apply$8.apply(ComputationExecutor.scala:207) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$7$$anonfun$apply$8.apply(ComputationExecutor.scala:205) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:49) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$7.apply(ComputationExecutor.scala:207) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2$$anonfun$apply$7.apply(ComputationExecutor.scala:199) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at scala.collection.immutable.Range.foreach(Range.scala:160) ~[scala-library-2.11.12.jar:?]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:199) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$toExecuteTask$2.apply(ComputationExecutor.scala:169) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryFinally(Utils.scala:77) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor.toExecuteTask(ComputationExecutor.scala:250) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$execute$2$$anonfun$3.apply(ComputationExecutor.scala:265) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$execute$2$$anonfun$3.apply(ComputationExecutor.scala:264) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryFinally(Utils.scala:77) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:62) ~[linkis-accessible-executor-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.acessible.executor.entity.AccessibleExecutor.ensureIdle(AccessibleExecutor.scala:56) ~[linkis-accessible-executor-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor.ensureOp(ComputationExecutor.scala:145) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$execute$2.apply(ComputationExecutor.scala:264) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor$$anonfun$execute$2.apply(ComputationExecutor.scala:256) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryFinally(Utils.scala:77) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.execute.ComputationExecutor.execute(ComputationExecutor.scala:281) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anonfun$org$apache$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask$1.apply$mcV$sp(TaskExecutionServiceImpl.scala:403) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anonfun$org$apache$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask$1.apply(TaskExecutionServiceImpl.scala:400) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anonfun$org$apache$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask$1.apply(TaskExecutionServiceImpl.scala:400) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryFinally(Utils.scala:77) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl.org$apache$linkis$engineconn$computation$executor$service$TaskExecutionServiceImpl$$executeTask(TaskExecutionServiceImpl.scala:405) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2$$anonfun$run$2.apply$mcV$sp(TaskExecutionServiceImpl.scala:330) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2$$anonfun$run$2.apply(TaskExecutionServiceImpl.scala:328) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2$$anonfun$run$2.apply(TaskExecutionServiceImpl.scala:328) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:49) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.common.utils.Utils$.tryAndWarn(Utils.scala:85) ~[linkis-common-1.4.0.jar:1.4.0]
        at org.apache.linkis.engineconn.computation.executor.service.TaskExecutionServiceImpl$$anon$2.run(TaskExecutionServiceImpl.scala:328) ~[linkis-computation-engineconn-1.4.0.jar:1.4.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_333]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_333]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_333]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_333]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_333]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_333]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_333]
tuigerphkeeper commented 1 year ago

Here are some of the jars used. spark_excel.jar was obtained from https://github.com/apache/linkis/issues/2590 (spark-engine-for-excel.zip).

aiceflower commented 1 year ago

After Spark was upgraded to 3.x in Linkis 1.4.0, the corresponding Scala version should also be upgraded; please recompile with Scala 2.12+. Otherwise there will be many unexpected problems.
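The ec.log above shows scala-library-2.11.12 on the engine's classpath, so one quick sanity check before recompiling is whether every third-party jar carries the same Scala binary suffix as Spark itself. Published Scala artifacts encode that suffix in the file name (e.g. `spark-excel_2.11-...` vs `spark-excel_2.12-...`); a small heuristic sketch (the jar names below are made up for illustration):

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScalaBinaryCheck {
    // Heuristic: extract the Scala binary version suffix from an
    // artifact name such as spark-excel_2.11-0.13.7.jar.
    static String scalaBinaryOf(String jarName) {
        Matcher m = Pattern.compile("_(2\\.1[123])\\b").matcher(jarName);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) {
        for (String jar : List.of("spark-excel_2.11-0.13.7.jar",
                                  "spark-excel_2.12-0.13.7.jar",
                                  "linkis-engineplugin-spark-1.4.0.jar")) {
            System.out.println(jar + " -> Scala " + scalaBinaryOf(jar));
        }
    }
}
```

Any jar reporting 2.11 loaded into a Spark 3.x (Scala 2.12) engine is a candidate for the kind of breakage described here; jars without a suffix need to be checked by other means, e.g. by inspecting their build metadata.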

tuigerphkeeper commented 1 year ago

After Spark was upgraded to 3.x in Linkis 1.4.0, the corresponding Scala version should also be upgraded; please recompile with Scala 2.12+. Otherwise there will be many unexpected problems.

I wrote the wrong version above; I am actually using Spark 2.3.0.

aiceflower commented 1 year ago

Can you list your compile command? You can refer to the following guidelines to compile. https://linkis.apache.org/zh-CN/docs/latest/feature/base-engine-compatibilty

tuigerphkeeper commented 1 year ago

Can you list your compile command? You can refer to the following guidelines to compile. https://linkis.apache.org/zh-CN/docs/latest/feature/base-engine-compatibilty

I copied the LoadData code from Linkis 1.3.2 into Linkis 1.4.0 and it executed successfully, so I don't think this has anything to do with the Scala version. What I mean is that there is a problem with LoadData in 1.4.0.
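If the regression really is in 1.4.0's LoadData/Excel path, one defensive pattern at the row level (a sketch only, not necessarily what the eventual fix does) is to normalize every parsed row to the schema width before handing it to Spark, padding short rows with nulls and truncating long ones, so no cell lookup can overrun the schema:

```java
import java.util.ArrayList;
import java.util.List;

public class RowNormalizer {
    // Pad short rows with nulls and truncate long rows so every row
    // matches the schema width; illustrative, not the actual Linkis fix.
    static List<String> normalize(List<String> row, int schemaWidth) {
        List<String> out =
            new ArrayList<>(row.subList(0, Math.min(row.size(), schemaWidth)));
        while (out.size() < schemaWidth) {
            out.add(null); // missing trailing cells become NULLs
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalize(List.of("1", "alice", "extra"), 2)); // truncated
        System.out.println(normalize(List.of("1"), 2));                   // padded
    }
}
```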

peacewong commented 1 year ago

ping ~@GuoPhilipse

GuoPhilipse commented 11 months ago

Sorry for the late reply; I have filed a PR to fix this issue.