Open richardcerny opened 7 months ago
We're hitting this as well, @richardcerny were you able to get to a resolution?
Found a workaround:
Went from:
```scala
def listTables(databaseName: String): Array[String] = {
  if (databaseExists(databaseName)) {
    return spark.catalog.listTables(databaseName).collect().map(_.name)
  }
  Array.empty[String]
}
```
To this:
```scala
def listTables(databaseName: String): Array[String] = {
  if (databaseExists(databaseName)) {
    // Delta 2.4.0 has a regression with Spark 3.4.1 that makes
    // spark.catalog.listTables calls fail
    //
    // >>> https://github.com/delta-io/delta/issues/2610
    //
    return spark
      .sql(s"SHOW TABLES IN $databaseName")
      .collect()
      .map(row => row.getAs[String]("tableName"))
  }
  Array.empty[String]
}
```
Thank you @mdrakiburrahman. We have used the same workaround.
It seems the problem is this line, `val isTemp = row.getBoolean(2)` (https://github.com/apache/spark/blob/1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L126): it returns false when the catalog is set to `DeltaCatalog`.
You can see it by starting a spark-shell with/without Delta and running:

```scala
spark.range(0, 2).createOrReplaceTempView("abc")

val namespace = Seq("spark_catalog", "default")
val plan = org.apache.spark.sql.catalyst.plans.logical.ShowTables(
  org.apache.spark.sql.catalyst.analysis.UnresolvedNamespace(namespace), None)

val tables = spark.sessionState.executePlan(plan).toRdd.collect().map { row =>
  val tableName = row.getString(1)
  println(tableName)
  val namespaceName = row.getString(0)
  println(namespaceName)
  val isTemp = row.getBoolean(2)
  println(isTemp)
  if (isTemp) {
    // Temp views do not belong to any catalog. We shouldn't prepend the catalog name here.
    // val ns = if (namespaceName.isEmpty) Nil else Seq(namespaceName)
    // makeTable(ns :+ tableName)
  } else {
    // val ns = parseIdent(namespaceName)
    val ns = spark.sessionState.sqlParser.parseMultipartIdentifier(namespaceName)
    // makeTable(catalog.name() +: ns :+ tableName)
  }
}
```
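The failure mode above can be modeled without a Spark session. The sketch below is a toy re-creation (not Spark's actual code) of the row handling inside `CatalogImpl.listTables`: when a temp view's row is wrongly reported with `isTemporary = false`, its empty namespace string reaches the identifier parser, which rejects empty input — the same shape as the `PARSE_SYNTAX_ERROR` reported here. `ShowTablesRow`, `parseMultipartIdentifier`, and `qualifiedName` are illustrative stand-ins, not Spark APIs.

```scala
// A catalog row: (namespace, tableName, isTemporary), mirroring the
// SHOW TABLES output schema used in the repro above.
final case class ShowTablesRow(namespace: String, tableName: String, isTemporary: Boolean)

// Stand-in for Spark's parseMultipartIdentifier: the real parser rejects
// an empty identifier with PARSE_SYNTAX_ERROR; we model that with an exception.
def parseMultipartIdentifier(id: String): Seq[String] = {
  if (id.trim.isEmpty)
    throw new IllegalArgumentException("[PARSE_SYNTAX_ERROR] Syntax error at or near end of input")
  id.split('.').toSeq
}

// Mirrors the branch in CatalogImpl.listTables: temp views skip the parse,
// every other row has its namespace column parsed.
def qualifiedName(row: ShowTablesRow): Seq[String] =
  if (row.isTemporary) Seq(row.tableName)
  else parseMultipartIdentifier(row.namespace) :+ row.tableName

val permanent       = ShowTablesRow("default", "events", isTemporary = false)
val tempViewCorrect = ShowTablesRow("", "abc", isTemporary = true)  // what the built-in catalog reports
val tempViewBuggy   = ShowTablesRow("", "abc", isTemporary = false) // what the DeltaCatalog path yields

println(qualifiedName(permanent))       // List(default, events)
println(qualifiedName(tempViewCorrect)) // List(abc)
// qualifiedName(tempViewBuggy) throws: the empty namespace reaches the parser
```

This makes the root cause concrete: the crash is not in `SHOW TABLES` itself but in the post-processing that trusts the `isTemporary` column.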
@cloud-fan I have seen some contributions you made to Delta and Spark related to the catalog. Any insights?
@felipepessoto thanks for providing the repro! What was the error you hit? And can you also post the result of `spark.sessionState.executePlan(plan).analyzed.treeString`?
@cloud-fan it is the same error that @richardcerny reported. In spark-shell, using my repro code:

```
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near end of input.(line 1, pos 0)

== SQL ==

^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:144)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67)
  at $anonfun$tables$1(<console>:37)
  at $anonfun$tables$1$adapted(<console>:23)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  ... 64 elided
```
Calling `spark.catalog.listTables().show()`:

```
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near end of input.(line 1, pos 0)

== SQL ==

^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:144)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67)
  at org.apache.spark.sql.internal.CatalogImpl.parseIdent(CatalogImpl.scala:49)
  at org.apache.spark.sql.internal.CatalogImpl.$anonfun$listTables$1(CatalogImpl.scala:132)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.sql.internal.CatalogImpl.listTables(CatalogImpl.scala:123)
  at org.apache.spark.sql.internal.CatalogImpl.listTables(CatalogImpl.scala:98)
  ... 47 elided
```
treeString:

```
scala> println(spark.sessionState.executePlan(plan).analyzed.treeString)
ShowTables [namespace#2, tableName#3, isTemporary#4]
+- ResolvedNamespace org.apache.spark.sql.delta.catalog.DeltaCatalog@32855523, [default]
```
One workaround is to set `spark.sql.legacy.useV1Command` to `true`. Ideally `DeltaCatalog` should not return views in `listTables`.
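For reference, a minimal sketch of that workaround as a cluster-wide setting, assuming the session is configured through `spark-defaults.conf` (it may also be settable at runtime via `spark.conf.set("spark.sql.legacy.useV1Command", "true")` before any `listTables` call):

```
# spark-defaults.conf — route SHOW TABLES through the legacy V1 command path
spark.sql.legacy.useV1Command true
```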
Bug
Describe the problem
After upgrading from Spark 3.3.2 to 3.4.1, the `catalog.listTables` command always fails after `createOrReplaceTempView` is called. See the code snippet below.
Steps to reproduce
Observed results
Expected results
Shows list of tables.
Further details
When the following configuration is removed from the Spark session, the code works, but the catalog extension is necessary for other features.
Environment information
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?