apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Creating an existing database with Spark SQL command "CREATE DATABASE IF NOT EXISTS" throws exception #8298

Open vinitamaloo-asu opened 1 year ago

vinitamaloo-asu commented 1 year ago

Apache Iceberg version

1.3.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

Command: `sparkSession.sql(s"CREATE DATABASE IF NOT EXISTS temp_db")`

```
2023-08-11T16:00:24,479 ERROR [pool-6-thread-32] metastore.RetryingHMSHandler: AlreadyExistsException(message:Database temp_db already exists)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:1313)
    at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
    at com.sun.proxy.$Proxy26.create_database(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:14396)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_database.getResult(ThriftHiveMetastore.java:14380)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
```

While debugging, I found that the `BufferedOutputStream` contents look something like this, with many undefined (non-printable) characters in between: `�create_database  temp_db  gfile:/folder/_warehouse`

RussellSpitzer commented 1 year ago

If the database does exist, we expect the metastore to throw an exception. That exception should be converted to an Iceberg `AlreadyExistsException`, which should in turn be rethrown as a Spark `NamespaceAlreadyExistsException`; Spark should then ignore it because the statement uses `IF NOT EXISTS`.

This code should throw a Hive exception, which should be rethrown as an Iceberg exception: https://github.com/apache/iceberg/blob/79c88a1775c4e2019fff00de7520826388158424/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L289-L293

This code in the Spark implementation should then catch the Iceberg exception and rethrow it as a Spark exception: https://github.com/apache/iceberg/blob/7406098ee7fdb2c9b4cd5060afbb39e5e5b3f7f3/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java#L398-L400

I don't see either of those bits in your stack trace. Are we missing some lines?
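The exception translation chain described in this comment can be sketched as a small, self-contained Java simulation. This is an illustration under stated assumptions, not the real code: the exception classes (`HiveAlreadyExistsException`, `IcebergAlreadyExistsException`, `SparkNamespaceAlreadyExistsException`) and the methods (`metastoreCreateDatabase`, `icebergCreateNamespace`, `sparkCreateNamespace`, `createDatabase`, `run`) are hypothetical stand-ins, not the Hive, Iceberg, or Spark classes in the linked sources. It only shows the expected layering: the lowest layer still throws when the database exists, each layer rewraps the exception, and only the outermost `IF NOT EXISTS` check swallows it.

```java
import java.util.HashSet;
import java.util.Set;

class NamespaceCreateSketch {
    // Hypothetical stand-ins; these are NOT the real Hive/Iceberg/Spark exception classes.
    static class HiveAlreadyExistsException extends Exception {
        HiveAlreadyExistsException(String msg) { super(msg); }
    }

    static class IcebergAlreadyExistsException extends RuntimeException {
        IcebergAlreadyExistsException(String msg, Throwable cause) { super(msg, cause); }
    }

    static class SparkNamespaceAlreadyExistsException extends Exception {
        SparkNamespaceAlreadyExistsException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Simulated metastore contents.
    static final Set<String> databases = new HashSet<>();

    // Layer 1 (metastore): rejects a duplicate database by throwing.
    static void metastoreCreateDatabase(String name) throws HiveAlreadyExistsException {
        if (!databases.add(name)) {
            throw new HiveAlreadyExistsException("Database " + name + " already exists");
        }
    }

    // Layer 2 (Iceberg catalog): converts the Hive exception into an Iceberg one.
    static void icebergCreateNamespace(String name) {
        try {
            metastoreCreateDatabase(name);
        } catch (HiveAlreadyExistsException e) {
            throw new IcebergAlreadyExistsException("Namespace already exists: " + name, e);
        }
    }

    // Layer 3 (Spark catalog): converts the Iceberg exception into a Spark one.
    static void sparkCreateNamespace(String name) throws SparkNamespaceAlreadyExistsException {
        try {
            icebergCreateNamespace(name);
        } catch (IcebergAlreadyExistsException e) {
            throw new SparkNamespaceAlreadyExistsException("Namespace '" + name + "' already exists", e);
        }
    }

    // Layer 4 (statement): ignores the Spark exception only when IF NOT EXISTS is set.
    // Returns true if the database was created, false if it existed and was ignored.
    static boolean createDatabase(String name, boolean ifNotExists)
            throws SparkNamespaceAlreadyExistsException {
        try {
            sparkCreateNamespace(name);
            return true;
        } catch (SparkNamespaceAlreadyExistsException e) {
            if (ifNotExists) {
                return false;
            }
            throw e;
        }
    }

    // Reports the outcome of one simulated statement as a string.
    static String run(String name, boolean ifNotExists) {
        try {
            return createDatabase(name, ifNotExists) ? "created" : "ignored";
        } catch (SparkNamespaceAlreadyExistsException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("temp_db", true));  // created
        System.out.println(run("temp_db", true));  // ignored
        System.out.println(run("temp_db", false)); // error: Namespace 'temp_db' already exists
    }
}
```

In this sketch, the second `IF NOT EXISTS` call reports `ignored` even though the simulated metastore rejected the duplicate; without `IF NOT EXISTS`, the Spark-level exception reaches the caller.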

github-actions[bot] commented 5 days ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.