apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/

[Bug] When the Iceberg configuration is set and any existing db.table is queried, Kyuubi reports "table not found" #5759

Open · beat4ocean opened 10 months ago

beat4ocean commented 10 months ago

Describe the bug

My environment: hadoop-3.3.6, spark-3.3.3, kyuubi-1.8.0/1.7.3. When I set the Iceberg configuration from the official website:

spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://metastore-host:port
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

put iceberg-spark-runtime-3.3_2.12-1.4.2.jar into $SPARK_HOME/jars, restart Kyuubi, and then query any existing table, it triggers a bug:

Caused by: org.apache.spark.sql.AnalysisException: Table or view not found: t1; line 1 pos 21;
'Aggregate [unresolvedalias(count(1), None)]
+- 'UnresolvedRelation [t1], [], false

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:131)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:102)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:366)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:366)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:366)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:102)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:97)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:214)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
... 6 more

at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
at org.apache.kyuubi.operation.ExecuteStatement.waitStatementComplete(ExecuteStatement.scala:135)
at org.apache.kyuubi.operation.ExecuteStatement.$anonfun$runInternal$1(ExecuteStatement.scala:173)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750) (state=,code=0)
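
For reference, the failing statement reconstructed from the analyzed plan above (the table name t1 and the count shape come from the UnresolvedRelation and Aggregate nodes; per the report, any existing table fails the same way):

select count(1) from t1;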

Affects Version(s)

1.8.0/1.7.3

Kyuubi Server Log Output

2023-11-23 20:53:01.514 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.operation.LaunchEngine: Processing bigdata's query[3fb0109a-a1dc-40b9-aa34-eb9b65ccb790]: PENDING_STATE -> RUNNING_STATE, statement:
LaunchEngine
2023-11-23 20:53:01.517 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting
2023-11-23 20:53:01.517 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop202:2181,hadoop203:2181,hadoop204:2181 sessionTimeout=60000 watcher=org.apache.kyuubi.shaded.curator.ConnectionState@b8c1e4
2023-11-23 20:53:01.519 INFO KyuubiSessionManager-exec-pool: Thread-71-SendThread(hadoop202:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server hadoop202/192.168.10.202:2181. Will not attempt to authenticate using SASL (unknown error)
2023-11-23 20:53:01.522 INFO KyuubiSessionManager-exec-pool: Thread-71-SendThread(hadoop202:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to hadoop202/192.168.10.202:2181, initiating session
2023-11-23 20:53:01.528 INFO KyuubiSessionManager-exec-pool: Thread-71-SendThread(hadoop202:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server hadoop202/192.168.10.202:2181, sessionid = 0xca00000206af0007, negotiated timeout = 40000
2023-11-23 20:53:01.528 INFO KyuubiSessionManager-exec-pool: Thread-71-EventThread org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED
2023-11-23 20:53:01.579 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient: Get service instance:hadoop203:42141 engine id:application_1700742232452_0003 and version:1.8.0 under /kyuubi_1.8.0_GROUP_SPARK_SQL/bigdata/default
2023-11-23 20:53:01.614 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.session.KyuubiSessionImpl: [bigdata:192.168.10.203] SessionHandle [74a56112-661b-47c5-a4ef-a3b351b2e9d4] - Connected to engine [hadoop203:42141]/[application_1700742232452_0003] with SessionHandle [74a56112-661b-47c5-a4ef-a3b351b2e9d4]]
2023-11-23 20:53:01.616 INFO Curator-Framework-0 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
2023-11-23 20:53:01.624 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Session: 0xca00000206af0007 closed
2023-11-23 20:53:01.624 INFO KyuubiSessionManager-exec-pool: Thread-71-EventThread org.apache.kyuubi.shaded.zookeeper.ClientCnxn: EventThread shut down for session: 0xca00000206af0007
2023-11-23 20:53:01.625 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.operation.LaunchEngine: Processing bigdata's query[3fb0109a-a1dc-40b9-aa34-eb9b65ccb790]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.109 seconds

Kyuubi Engine Log Output

no output

Kyuubi Server Configurations

# Z-Ordering Support
spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension

# Auxiliary Optimization Rules
spark.sql.optimizer.insertZorderBeforeWriting.enabled=true
spark.sql.optimizer.zorderGlobalSort.enabled=true
spark.sql.optimizer.dropIgnoreNonExistent=false
spark.sql.optimizer.rebalanceBeforeZorder.enabled=false
spark.sql.optimizer.rebalanceZorderColumns.enabled=false
spark.sql.optimizer.twoPhaseRebalanceBeforeZorder.enabled=false
spark.sql.optimizer.zorderUsingOriginalOrdering.enabled=false
spark.sql.optimizer.inferRebalanceAndSortOrders.enabled=false
spark.sql.optimizer.inferRebalanceAndSortOrdersMaxColumns=3
spark.sql.optimizer.insertRepartitionBeforeWriteIfNoShuffle.enabled=false
spark.sql.optimizer.finalStageConfigIsolationWriteOnly.enabled=false
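
For context, a hedged illustration of what these rules act on: with KyuubiSparkSQLExtension loaded, Z-order optimization is triggered by statements of the following shape (the table, predicate, and column names here are made up):

OPTIMIZE db.tbl WHERE day = '2023-11-23' ZORDER BY c1, c2;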

# Spark Dynamic Resource Allocation (DRA)
spark.dynamicAllocation.enabled=true
# false if you prefer shuffle tracking over ESS
spark.dynamicAllocation.initialExecutors=1
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=500
spark.dynamicAllocation.executorAllocationRatio=0.5
spark.dynamicAllocation.executorIdleTimeout=60s
spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
# true if you prefer shuffle tracking over ESS
spark.dynamicAllocation.shuffleTracking.enabled=false
spark.dynamicAllocation.shuffleTracking.timeout=30min
spark.dynamicAllocation.schedulerBacklogTimeout=1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s
spark.cleaner.periodicGC.interval=5min

# Spark Adaptive Query Execution (AQE)
spark.sql.adaptive.enabled=true
spark.sql.adaptive.forceApply=false
spark.sql.adaptive.logLevel=info
spark.sql.adaptive.advisoryPartitionSizeInBytes=256m
spark.sql.adaptive.coalescePartitions.enabled=true
spark.sql.adaptive.coalescePartitions.minPartitionSize=256m
spark.sql.adaptive.coalescePartitions.initialPartitionNum=8192
spark.sql.adaptive.fetchShuffleBlocksInBatch=true
spark.sql.adaptive.localShuffleReader.enabled=true
spark.sql.adaptive.skewJoin.enabled=true
spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=400m
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
spark.sql.adaptive.optimizer.excludedRules
spark.sql.autoBroadcastJoinThreshold=-1

# SPARK Paimon
spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog
spark.sql.catalog.paimon.warehouse=hdfs://hadoop202:8020/kyuubi_spark_paimon

# SPARK hudi
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog

# SPARK iceberg
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://hadoop203:9083
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
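
Note that spark.sql.extensions is assigned three times in this file (Z-Ordering, Hudi, Iceberg) and spark.sql.catalog.spark_catalog twice (Hudi, Iceberg); in a properties file the last assignment wins, so only the Iceberg values above take effect. If all three extensions are intended, the key accepts a comma-separated list, for example:

spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.spark_catalog, by contrast, can name only one implementation, so the Hudi and Iceberg session-catalog settings are mutually exclusive.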

# SPARK lineage
spark.sql.queryExecutionListeners=org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener

Kyuubi Engine Configurations

# spark default conf
spark.master=yarn
spark.shuffle.service.enabled=true
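
These engine settings pair with the DRA block in the server configuration: with the external shuffle service (ESS) enabled here, spark.dynamicAllocation.shuffleTracking.enabled=false is the consistent choice. The inverse pairing, a sketch for clusters without ESS, would be:

# without ESS, dynamic allocation needs shuffle tracking instead
spark.shuffle.service.enabled=false
spark.dynamicAllocation.shuffleTracking.enabled=true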

Additional context

If I remove the Iceberg config from kyuubi-defaults.conf, the bug disappears.
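
A hedged observation, not confirmed in this report: org.apache.iceberg.spark.SparkCatalog resolves only Iceberg tables, so mapping it onto spark_catalog hides plain Hive tables; the Iceberg documentation suggests org.apache.iceberg.spark.SparkSessionCatalog when replacing the built-in catalog, since it delegates non-Iceberg tables back to the session catalog. A sketch of that variant:

spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type=hive
spark.sql.catalog.spark_catalog.uri=thrift://hadoop203:9083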

Are you willing to submit PR?

github-actions[bot] commented 10 months ago

Hello @beat4ocean, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.