Open meyergin opened 2 years ago
Getting the same error here using spark-sql:
spark-sql> show table extended in ice.snapshots like '*';
Error in query: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#179, tableName#180, isTemporary#181, information#182]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@272d0dd3, [snapshots]
Environment: Spark 3.3.0 org.apache.iceberg:iceberg-aws:0.14.0, org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.0, org.apache.hadoop:hadoop-aws:3.3.3, software.amazon.awssdk:bundle:2.17.131, software.amazon.awssdk:url-connection-client:2.17.131, software.amazon.awssdk:kms:2.17.131
Command:
spark-sql \
--packages org.apache.iceberg:iceberg-aws:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.3,software.amazon.awssdk:bundle:2.17.131,software.amazon.awssdk:url-connection-client:2.17.131,software.amazon.awssdk:kms:2.17.131 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.ice=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.ice.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
--conf spark.sql.catalog.ice.warehouse=$WAREHOUSE_BUCKET_LOC \
--conf spark.sql.catalog.ice.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
--conf iceberg.engine.hive.enabled=true
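Until SHOW TABLE EXTENDED works against v2 catalogs, a possible workaround is to read the same information from Iceberg's metadata tables and from plain DESCRIBE, both of which do work for v2 tables. This is only a sketch; `ice`, `db`, and `tbl` below are placeholder names:

```sql
-- Query the Iceberg snapshots metadata table directly
-- (catalog `ice`, namespace `db`, table `tbl` are placeholders)
SELECT committed_at, snapshot_id, operation
FROM ice.db.tbl.snapshots;

-- Plain DESCRIBE TABLE EXTENDED is supported for v2 tables
DESCRIBE TABLE EXTENDED ice.db.tbl;
```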
The error seems to be thrown here, so I wonder if the conversion (as per the comment in the code) should be done somewhere in Iceberg. Maybe in the catalog?
Looking at the full stack trace, though, it seems Iceberg is not involved?
scala> lastException.printStackTrace
org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#0, tableName#1, isTemporary#2, information#3]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@d28a805, [snapshots]
at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:1507)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:162)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:101)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:101)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:96)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:187)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:210)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:207)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:23)
at $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:27)
at $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
at $line16.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
at $line16.$read$$iw$$iw$$iw$$iw.<init>(<console>:33)
at $line16.$read$$iw$$iw$$iw.<init>(<console>:35)
at $line16.$read$$iw$$iw.<init>(<console>:37)
at $line16.$read$$iw.<init>(<console>:39)
at $line16.$read.<init>(<console>:41)
at $line16.$read$.<init>(<console>:45)
at $line16.$read$.<clinit>(<console>)
at $line16.$eval$.$print$lzycompute(<console>:7)
at $line16.$eval$.$print(<console>:6)
at $line16.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:865)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:733)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:435)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:456)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
at org.apache.spark.repl.Main$.doMain(Main.scala:78)
at org.apache.spark.repl.Main$.main(Main.scala:58)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
We are working on support for Iceberg in dbt-spark. Since Iceberg does not support `show tables extended`, we fall back to `show tables` plus many `describe table` calls to determine whether a given table is an Iceberg table or not.
Normally (for Hudi and Delta) dbt-spark uses `show tables extended` and parses the `information` column to determine whether it is dealing with a Hudi or Delta table.
Iterating over the tables and running `describe table` on each one can get quite slow when there are hundreds of tables in a schema.
It would be much better if Iceberg also supported `show tables extended`.
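The per-table fallback described above (enumerate tables, then probe each one) can be sketched in Spark SQL. The schema and table names here are placeholders, and the exact row label in the DESCRIBE output may vary by Spark version:

```sql
-- Step 1: enumerate all tables in the schema (placeholder name)
SHOW TABLES IN my_schema;

-- Step 2: for each table, inspect its extended description;
-- the "Provider" row identifies Iceberg tables (value: iceberg)
DESCRIBE TABLE EXTENDED my_schema.my_table;
```

With hundreds of tables this means hundreds of round trips, which is exactly why a single `show tables extended` call would be preferable.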
Tagging @Fokko who is also working on this
Thanks for the background @cccs-jc. To add to that, this is the original issue in Spark, and a PR is ready: https://github.com/apache/spark/pull/37588. It is not directly related to Iceberg.
Ha I see. Thanks for looking into this
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
It looks like there is some activity again on the Spark side: https://github.com/apache/spark/pull/37588#issuecomment-1612349461
@Fokko this is nice to hear. Thanks for letting me know.
Not sure if the error's stem is the same, but I'm facing the same behaviour when using AWS Glue as metastore
Yes, I faced the same issue too. Does anyone know a workaround?
+1
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
The PR https://github.com/apache/spark/pull/37588 has now been merged for quite some time, but I still get this error. Is a change necessary in Iceberg now, or what would be the correct solution?
Oh, it appears the change hasn't been released yet after all; v3.5.2 still contains the old code, it seems. Coming from the Node world, I'm apparently not used to this release strategy. In the v4.0.0-preview releases the change does appear to have been applied.
Apache Iceberg version
0.14.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
Spark + Hive Metastore. When I run
show table extended in SCHEMA_NAME like '*';
in spark-sql, it throws the error shown at the top of this issue.