
Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

show table extended not supported for v2 table. #5782

Open meyergin opened 2 years ago

meyergin commented 2 years ago

Apache Iceberg version

0.14.0 (latest release)

Query engine

Spark

Please describe the bug 🐞

Environment: Spark + Hive Metastore. When I run `show table extended in SCHEMA_NAME like '*';` in spark-sql, it throws this error:

Error in query: SHOW TABLE EXTENDED is not supported for v2 tables.;
    ShowTableExtended *, [namespace#906, tableName#907, isTemporary#908, information#909]
    +- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@49ea646b, [SCHEMA_NAME]

        at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:1507)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:162)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:101)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:101)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:96)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:187)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:210)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:207)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
        at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:291)
        ... 16 more
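
As a workaround until Spark supports the command for v2 tables, much of the same information can be assembled from commands that do work — a sketch, where `schema_name` and `my_table` are hypothetical placeholders:

```sql
-- SHOW TABLE EXTENDED fails for v2 tables, but these should work:
SHOW TABLES IN schema_name;                    -- enumerate tables in the namespace
DESCRIBE TABLE EXTENDED schema_name.my_table;  -- per-table details (Provider, Location, ...)
```

The downside is one `DESCRIBE` round trip per table instead of a single bulk command.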
ja-michel commented 2 years ago

Getting the same error here using spark-sql:

spark-sql> show table extended in ice.snapshots like '*';
Error in query: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#179, tableName#180, isTemporary#181, information#182]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@272d0dd3, [snapshots]

Environment: Spark 3.3.0 org.apache.iceberg:iceberg-aws:0.14.0, org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.0, org.apache.hadoop:hadoop-aws:3.3.3, software.amazon.awssdk:bundle:2.17.131, software.amazon.awssdk:url-connection-client:2.17.131, software.amazon.awssdk:kms:2.17.131

Command:

spark-sql \
    --packages  org.apache.iceberg:iceberg-aws:0.14.0,org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.3,software.amazon.awssdk:bundle:2.17.131,software.amazon.awssdk:url-connection-client:2.17.131,software.amazon.awssdk:kms:2.17.131 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.ice=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.ice.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.ice.warehouse=$WAREHOUSE_BUCKET_LOC \
    --conf spark.sql.catalog.ice.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf iceberg.engine.hive.enabled=true
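
For snapshot-level details in particular, Iceberg's metadata tables can be queried directly instead of going through `SHOW TABLE EXTENDED` — a sketch, where `ice.db.tbl` is a hypothetical table in the catalog configured above:

```sql
-- Iceberg metadata tables, queried through the v2 catalog:
SELECT * FROM ice.db.tbl.snapshots;  -- snapshot history
SELECT * FROM ice.db.tbl.history;    -- current-snapshot lineage
SELECT * FROM ice.db.tbl.files;      -- data files with per-file stats
```
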
ja-michel commented 2 years ago

The error seems to be thrown here, so I wonder whether the conversion (as per the comment in the code) should be done somewhere in Iceberg. Maybe the catalog?

ja-michel commented 2 years ago

Looking at the full stack trace, is Iceberg even involved?

scala> lastException.printStackTrace
org.apache.spark.sql.AnalysisException: SHOW TABLE EXTENDED is not supported for v2 tables.;
ShowTableExtended *, [namespace#0, tableName#1, isTemporary#2, information#3]
+- ResolvedNamespace org.apache.iceberg.spark.SparkCatalog@d28a805, [snapshots]

        at org.apache.spark.sql.errors.QueryCompilationErrors$.commandUnsupportedInV2TableError(QueryCompilationErrors.scala:1507)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:162)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:101)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:101)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:96)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:187)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:210)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:207)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
        at $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:23)
        at $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:27)
        at $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
        at $line16.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
        at $line16.$read$$iw$$iw$$iw$$iw.<init>(<console>:33)
        at $line16.$read$$iw$$iw$$iw.<init>(<console>:35)
        at $line16.$read$$iw$$iw.<init>(<console>:37)
        at $line16.$read$$iw.<init>(<console>:39)
        at $line16.$read.<init>(<console>:41)
        at $line16.$read$.<init>(<console>:45)
        at $line16.$read$.<clinit>(<console>)
        at $line16.$eval$.$print$lzycompute(<console>:7)
        at $line16.$eval$.$print(<console>:6)
        at $line16.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
        at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:865)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:733)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:435)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:456)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
        at org.apache.spark.repl.Main$.doMain(Main.scala:78)
        at org.apache.spark.repl.Main$.main(Main.scala:58)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
cccs-jc commented 1 year ago

We are working on support for Iceberg in dbt-spark. Since Iceberg does not support `show tables extended`, we fall back to `show tables` plus a `describe table` per table to determine whether a given table is an Iceberg table.

Normally (for Hudi and Delta) dbt-spark uses `show tables extended` and parses the `information` column to determine whether it's dealing with a Hudi or Delta table.

Iterating over the tables and running `describe table` gets quite slow when a schema contains hundreds of tables.

It would be much better if Iceberg also supported `show tables extended`.

Tagging @Fokko who is also working on this
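
The fallback described above — classify each table by scanning its `DESCRIBE TABLE EXTENDED` output for the `Provider` row — can be sketched like this. This is a minimal illustration, not dbt-spark's actual code; the function name and the `(col_name, data_type)` row shape are assumptions:

```python
def is_iceberg(describe_rows):
    """Given (col_name, data_type) pairs from a DESCRIBE TABLE EXTENDED
    result, return True if the table's Provider row says 'iceberg'."""
    for col_name, data_type in describe_rows:
        if col_name.strip().lower() == "provider":
            return data_type.strip().lower() == "iceberg"
    return False  # no Provider row found: treat as non-Iceberg

# Hypothetical fragment of DESCRIBE output:
rows = [("id", "bigint"),
        ("# Detailed Table Information", ""),
        ("Provider", "iceberg"),
        ("Location", "s3://bucket/tbl")]
print(is_iceberg(rows))  # True
```

Each table needs its own `DESCRIBE` query to produce these rows, which is exactly why this path is slow for schemas with hundreds of tables.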

Fokko commented 1 year ago

Thanks for the background @cccs-jc. To add to that, this is the original issue in Spark, and a PR is ready: https://github.com/apache/spark/pull/37588. It is not directly related to Iceberg.

cccs-jc commented 1 year ago

Ah, I see. Thanks for looking into this.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

Fokko commented 1 year ago

It looks like there is some activity again on the Spark side: https://github.com/apache/spark/pull/37588#issuecomment-1612349461

cccs-jc commented 1 year ago

@Fokko this is nice to hear. Thanks for letting me know.

lsabreu96 commented 10 months ago

Not sure if the error's stem is the same, but I'm facing the same behaviour when using AWS Glue as metastore

tanweipeng commented 10 months ago

> Not sure if the error's stem is the same, but I'm facing the same behaviour when using AWS Glue as metastore

Yeah, I'm facing the same issue too. Does anyone know a workaround?

Peeyush-Now commented 9 months ago

+1

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

dargmuesli commented 4 weeks ago

The PR https://github.com/apache/spark/pull/37588 was merged quite some time ago, but I still get this error. Is a change necessary in Iceberg now, or what would be the correct solution?

dargmuesli commented 4 weeks ago

Oh, it appears the change hasn't been released yet after all: v3.5.2 still contains the old code. Coming from the Node world, I'm apparently not used to this release strategy. The change does seem to be included in the v4.0.0-preview releases.