Open ASiegeLion opened 1 month ago
gluten conf:
Got the same problem and solved it by changing the Hadoop server-side config hadoop.rpc.protection
from authentication,privacy
to authentication
(my Hadoop is HDP 3.1.0).
It looks like disabling RPC data encryption helps.
The root cause is not clear to me; I hope someone can explain. Thanks.
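To check which protection levels a cluster currently advertises before and after such a change, the property can be read from core-site.xml. The following is only a minimal sketch: the helper names and the sample XML are made up for illustration, and it assumes the standard Hadoop `<configuration>/<property>` layout. It does not say anything about why the native HDFS client fails, which is still unexplained in this thread.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample mirroring the "before" state described above.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication,privacy</value>
  </property>
</configuration>"""


def rpc_protection_levels(core_site_xml: str) -> List[str]:
    """Return the levels listed in hadoop.rpc.protection, in order."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name == "hadoop.rpc.protection":
            value = prop.findtext("value", default="")
            return [v.strip() for v in value.split(",") if v.strip()]
    # Hadoop falls back to "authentication" when the property is unset.
    return ["authentication"]


def offers_privacy(levels: List[str]) -> bool:
    """True if the server may negotiate encrypted RPC (the 'privacy' level)."""
    return "privacy" in levels


if __name__ == "__main__":
    levels = rpc_protection_levels(SAMPLE_CORE_SITE)
    print(levels, offers_privacy(levels))
```

In practice the XML would be read from the cluster's real core-site.xml; the path depends on the installation and is not given in this issue.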
We modified the configuration, but the exception still occurs. spark-kubernetes-executor.log
hadoop SecurityAuth.audit log
Backend
CH (ClickHouse)
Bug description
2024-09-20T10:21:35.670057896+08:00 20. Java_org_apache_gluten_vectorized_BatchIterator_nativeHasNext @ 0x0000000005e765d7
2024-09-20T10:21:35.670060237+08:00
2024-09-20T10:21:35.670062920+08:00 at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
2024-09-20T10:21:35.670065532+08:00 at org.apache.gluten.backendsapi.clickhouse.CollectMetricIterator.hasNext(CHIteratorApi.scala:332)
2024-09-20T10:21:35.670068131+08:00 at org.apache.gluten.vectorized.CloseableCHColumnBatchIterator.$anonfun$hasNext$1(CloseableCHColumnBatchIterator.scala:42)
2024-09-20T10:21:35.670070518+08:00 at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
2024-09-20T10:21:35.670073013+08:00 at org.apache.gluten.metrics.GlutenTimeMetric$.withNanoTime(GlutenTimeMetric.scala:41)
2024-09-20T10:21:35.670075814+08:00 at org.apache.gluten.vectorized.CloseableCHColumnBatchIterator.hasNext(CloseableCHColumnBatchIterator.scala:42)
2024-09-20T10:21:35.670078261+08:00 at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
2024-09-20T10:21:35.670080648+08:00 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
2024-09-20T10:21:35.670083225+08:00 at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
2024-09-20T10:21:35.670096914+08:00 at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41)
2024-09-20T10:21:35.670105585+08:00 at org.apache.spark.RangePartitioner$.$anonfun$sketch$1(Partitioner.scala:306)
2024-09-20T10:21:35.670113927+08:00 at org.apache.spark.RangePartitioner$.$anonfun$sketch$1$adapted(Partitioner.scala:304)
2024-09-20T10:21:35.670116839+08:00 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
2024-09-20T10:21:35.670119327+08:00 at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
2024-09-20T10:21:35.670122000+08:00 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
2024-09-20T10:21:35.670124902+08:00 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
2024-09-20T10:21:35.670127818+08:00 at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
2024-09-20T10:21:35.670130570+08:00 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
2024-09-20T10:21:35.670133342+08:00 at org.apache.spark.scheduler.Task.run(Task.scala:131)
2024-09-20T10:21:35.670135847+08:00 at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
2024-09-20T10:21:35.670138536+08:00 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
2024-09-20T10:21:35.670141095+08:00 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
2024-09-20T10:21:35.670144107+08:00 ... 3 more
2024-09-20T10:21:35.670150461+08:00 Caused by: org.apache.gluten.exception.GlutenException: Unable to connect to HDFS: HdfsRpcException: RPC channel to "fs-hiido-yycluster01-yynn1.hiido.host.int.yy.com:38020" got protocol mismatch: RPC channel cannot find pending call: id = -33.: While executing SubstraitFileSource
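When comparing executor logs before and after a configuration change, it can help to pull out only the `Caused by:` lines from traces like the one above. This is a rough sketch that assumes each log line starts with a timestamp followed by a space, as in this log; the function name and regular expression are illustrative and not part of Gluten or Spark.

```python
import re
from typing import Iterable, Iterator, Tuple

# Timestamp, then "Caused by: <exception class>: <message>".
CAUSE_RE = re.compile(r"^\S+\s+Caused by: (?P<exc>[\w.$]+): (?P<msg>.*)$")


def root_causes(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (exception class, message) for every 'Caused by:' log line."""
    for line in lines:
        match = CAUSE_RE.match(line.rstrip("\n"))
        if match:
            yield match.group("exc"), match.group("msg")


# Two lines copied from the trace in this report.
SAMPLE_LOG = [
    "2024-09-20T10:21:35.670144107+08:00 ... 3 more",
    '2024-09-20T10:21:35.670150461+08:00 Caused by: '
    'org.apache.gluten.exception.GlutenException: Unable to connect to HDFS: '
    'HdfsRpcException: RPC channel to '
    '"fs-hiido-yycluster01-yynn1.hiido.host.int.yy.com:38020" got protocol '
    'mismatch: RPC channel cannot find pending call: id = -33.: '
    'While executing SubstraitFileSource',
]

if __name__ == "__main__":
    for exc, msg in root_causes(SAMPLE_LOG):
        print(exc)
        print(msg)
```

Running the same helper over the attached executor log from each attempt makes it easy to see whether the `protocol mismatch` message changed after the configuration was modified.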
spark executor.log spark-kubernetes-executor.log
spark driver.log spark-kubernetes-driver.log
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response