apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Can not find suitable driver at first try when using spark + Iceberg(JDBC catalog) #4401

Open badbye opened 1 year ago

badbye commented 1 year ago

Describe the bug

Spark and Iceberg work well on their own. After adding Kyuubi, the first "show tables" command always throws an error: org.apache.iceberg.jdbc.UncheckedSQLException: Failed to connect: jdbc:mysql: xxx, caused by java.sql.SQLException: No suitable driver found for jdbc:mysql: xxx.

Whenever I create a new connection and try to read Catalog information for the first time, this error appears. When I try to read again within the same connection, the information is returned successfully. I have retried multiple times, and this problem consistently appears. Additionally, when I use Spark alone without Kyuubi, this problem does not occur.
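
A minimal sketch of the symptom as seen from a JDBC client (hedged; the Kyuubi endpoint, user, and catalog name below are placeholders, and the Kyuubi Hive JDBC driver is assumed to be on the classpath):

    import java.sql.DriverManager

    // Open a fresh connection to Kyuubi (HiveServer2-compatible endpoint).
    val conn = DriverManager.getConnection("jdbc:hive2://kyuubi-host:10009/", "user", "")
    val stmt = conn.createStatement()
    try {
      // First read of the Iceberg JDBC catalog in a fresh engine fails with
      // "No suitable driver found for jdbc:mysql://...".
      stmt.executeQuery("SHOW TABLES IN demo")
    } catch {
      case e: java.sql.SQLException => println(s"first attempt failed: ${e.getMessage}")
    }
    // The same statement on the same connection then succeeds.
    val rs = stmt.executeQuery("SHOW TABLES IN demo")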

Affects Version(s)

1.6.1

Kyuubi Server Log Output

Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://host:3306/iceberg_catalog
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.iceberg.jdbc.JdbcClientPool.newClient(JdbcClientPool.java:55)
    ... 86 more

    at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
    at org.apache.kyuubi.operation.ExecuteStatement.waitStatementComplete(ExecuteStatement.scala:119)
    at org.apache.kyuubi.operation.ExecuteStatement.$anonfun$runInternal$1(ExecuteStatement.scala:154)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748) (state=,code=0)

Kyuubi Engine Log Output

23/02/23 10:58:28 ERROR ExecuteStatement: Error operating ExecuteStatement: org.apache.iceberg.jdbc.UncheckedSQLException: Failed to connect: jdbc:mysql://host:3306/iceberg_catalog
    at org.apache.iceberg.jdbc.JdbcClientPool.newClient(JdbcClientPool.java:57)
    at org.apache.iceberg.jdbc.JdbcClientPool.newClient(JdbcClientPool.java:30)
    at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
    at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
    at org.apache.iceberg.jdbc.JdbcCatalog.initializeCatalogTables(JdbcCatalog.java:121)
    at org.apache.iceberg.jdbc.JdbcCatalog.initialize(JdbcCatalog.java:106)
    at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:238)
    at org.apache.iceberg.CatalogUtil.buildIcebergCatalog(CatalogUtil.java:283)
    at org.apache.iceberg.spark.SparkCatalog.buildIcebergCatalog(SparkCatalog.java:129)
    at org.apache.iceberg.spark.SparkCatalog.initialize(SparkCatalog.java:519)
    at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:60)
    at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:53)
    at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
    at org.apache.spark.sql.connector.catalog.CatalogManager.catalog(CatalogManager.scala:53)
    at org.apache.spark.sql.connector.catalog.LookupCatalog$CatalogAndNamespace$.unapply(LookupCatalog.scala:86)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace$$anonfun$apply$11.applyOrElse(Analyzer.scala:877)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace$$anonfun$apply$11.applyOrElse(Analyzer.scala:864)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$4(AnalysisHelper.scala:175)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
    at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
    at org.apache.spark.sql.catalyst.plans.logical.ShowTables.mapChildren(v2Commands.scala:690)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:175)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:76)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:75)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace.apply(Analyzer.scala:864)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace.apply(Analyzer.scala:862)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:91)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:227)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:223)
    at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:172)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:223)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:187)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:208)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:207)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:94)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:87)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:89)
    at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:125)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://host:3306/iceberg_catalog
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.iceberg.jdbc.JdbcClientPool.newClient(JdbcClientPool.java:55)
    ... 86 more

Kyuubi Server Configurations

spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.hive.server2.thrift.resultset.default.fetch.size=10000
spark.master=local[2]

# for iceberg
spark.sql.catalog.demo = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.demo.catalog-impl = org.apache.iceberg.jdbc.JdbcCatalog
spark.sql.catalog.demo.uri = jdbc:mysql://host:3306/iceberg_catalog
spark.sql.catalog.demo.jdbc.user = xxx
spark.sql.catalog.demo.jdbc.password = xxx
spark.sql.catalog.demo.driver = com.mysql.jdbc.Driver
spark.sql.catalog.demo.io-impl = org.apache.iceberg.aliyun.oss.OSSFileIO
spark.sql.catalog.demo.oss.endpoint = oss-cn-xxx-internal.aliyuncs.com
spark.sql.catalog.demo.client.access-key-id = xxx
spark.sql.catalog.demo.client.access-key-secret = xxx
spark.sql.catalog.demo.warehouse = xxx
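
For reference, the failing step is the catalog's first connection attempt. A hedged sketch of roughly what Iceberg's JdbcClientPool.newClient does with these settings (the URI and credentials are the placeholders above): it hands the configured URI straight to java.sql.DriverManager, so it only works if a MySQL driver is already registered with, and visible to, DriverManager.

    import java.sql.DriverManager
    import java.util.Properties

    // Approximation of org.apache.iceberg.jdbc.JdbcClientPool#newClient:
    // the catalog URI plus user/password properties go to DriverManager.
    val uri = "jdbc:mysql://host:3306/iceberg_catalog"
    val props = new Properties()
    props.put("user", "xxx")
    props.put("password", "xxx")
    // Throws SQLException("No suitable driver found ...") when no driver
    // visible to this caller accepts the jdbc:mysql URL.
    val conn = DriverManager.getConnection(uri, props)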

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

github-actions[bot] commented 1 year ago

Hello @badbye, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

pan3793 commented 1 year ago

I'm not sure what is actually happening; from a quick look at DriverManager, it may be a classloader problem. Please try the following approach to explicitly ask DriverManager to load the MySQL driver:

spark.driver.extraJavaOptions=-Djdbc.drivers=com.mysql.jdbc.Driver
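
For context, DriverManager reads the jdbc.drivers system property once, during its static initialization, and loads every class named there; loading a driver class runs its static initializer, which registers it with DriverManager. A rough Scala equivalent of what the flag asks for (class name taken from the catalog config above), as a hedged sketch:

    // Must run before DriverManager is first used: loading the class
    // triggers its static initializer, which calls
    // DriverManager.registerDriver(...) for the MySQL driver.
    Class.forName("com.mysql.jdbc.Driver")
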
badbye commented 1 year ago

@pan3793 Thanks for your quick reply. It does not work.

I am also very confused. From what I have observed, the failing first execution returns its error very quickly, while the successful second execution is slower, taking around 1 second or more. When using Spark alone, the first execution of "show tables" takes around 2-3 seconds. Perhaps some lazy-loading mechanism has not taken effect yet at the first execution?
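
That would fit a classloader theory (my assumption, not confirmed): DriverManager discovers drivers lazily via ServiceLoader on first use, and getConnection additionally skips any registered driver whose class is not visible to the calling code's classloader. A sketch of that visibility check:

    import java.sql.Driver

    // Rough paraphrase of DriverManager#isDriverAllowed (Java 8): a registered
    // driver is only considered if the caller's classloader resolves the exact
    // same class; otherwise it is skipped, producing "No suitable driver found".
    def driverVisibleTo(driver: Driver, callerCl: ClassLoader): Boolean =
      try Class.forName(driver.getClass.getName, true, callerCl) eq driver.getClass
      catch { case _: ClassNotFoundException => false }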

pan3793 commented 1 year ago

No suspicious clues found so far; I need more time to investigate.

minyk commented 1 year ago

@badbye how did you add the MySQL driver to Spark? Did you add it to Spark's jars directory?

badbye commented 1 year ago

> @badbye how did you add the MySQL driver to Spark? Did you add it to Spark's jars directory?

Yes. All dependencies are saved in $SPARK_HOME/jars.