delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Scala, Java, Rust, Ruby, and Python
https://delta.io
Apache License 2.0

[BUG] - Inconsistent behavior between open-source Delta and Databricks Runtime #1129

Open bugsbunny1101 opened 2 years ago

bugsbunny1101 commented 2 years ago

Bug

Describe the problem

Creating a Delta table with table properties like the following raises an exception:

TBLPROPERTIES (
  delta.autoOptimize.autoCompact = true,
  delta.autoOptimize.optimizeWrite = true,
  delta.dataSkippingNumIndexedCols = 9,
  delta.logRetentionDuration = 'interval 30 days',
  delta.deletedFileRetentionDuration = 'interval 1 weeks'
)

Unknown configuration was specified: delta.autoOptimize.autoCompact
org.apache.spark.sql.AnalysisException: Unknown configuration was specified: delta.autoOptimize.autoCompact
  at org.apache.spark.sql.delta.DeltaErrors$.unknownConfigurationKeyException(DeltaErrors.scala:412)
  at org.apache.spark.sql.delta.DeltaConfigsBase.$anonfun$validateConfigurations$3(DeltaConfig.scala:157)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.delta.DeltaConfigsBase.$anonfun$validateConfigurations$1(DeltaConfig.scala:157)
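For reference, a minimal self-contained reproduction sketch; the table name, schema, and location below are hypothetical placeholders, while the TBLPROPERTIES are the ones from this report:

// Sketch only: table name, schema, and location are hypothetical.
// On open-source Delta Lake this fails with
// "Unknown configuration was specified: delta.autoOptimize.autoCompact".
sparkSession.sql("""
  CREATE TABLE events (id BIGINT, ts TIMESTAMP)
  USING delta
  LOCATION '/tmp/delta/events'
  TBLPROPERTIES (
    delta.autoOptimize.autoCompact = true,
    delta.autoOptimize.optimizeWrite = true,
    delta.dataSkippingNumIndexedCols = 9,
    delta.logRetentionDuration = 'interval 30 days',
    delta.deletedFileRetentionDuration = 'interval 1 weeks'
  )
""")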

Steps to reproduce

Running SQL like the following produces a similar exception:

sparkSession.sql(s"ALTER TABLE delta.${deltaTableUrl} SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)")

Observed results

Unknown configuration was specified: delta.autoOptimize.optimizeWrite
org.apache.spark.sql.AnalysisException: Unknown configuration was specified: delta.autoOptimize.optimizeWrite
  at org.apache.spark.sql.delta.DeltaErrors$.unknownConfigurationKeyException(DeltaErrors.scala:412)
  at org.apache.spark.sql.delta.DeltaConfigsBase.$anonfun$validateConfigurations$3(DeltaConfig.scala:157)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.delta.DeltaConfigsBase.$anonfun$validateConfigurations$1(DeltaConfig.scala:157)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.immutable.Map$Map1.foreach(Map.scala:193)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.delta.DeltaConfigsBase.validateConfigurations(DeltaConfig.scala:149)

Expected results

No exception; these properties are accepted on Databricks Runtime.

Further details

We hope to be able to share the same code base and test suites between open-source Spark/Delta and Databricks Runtime.

Environment information

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

allisonport-db commented 2 years ago

Hi @bugsbunny1101 thanks for opening this issue. This is something we're currently investigating.

hawkaa commented 2 years ago

Hi! I'm experiencing the exact same issue 👍🏻

scottsand-db commented 2 years ago

Hi all - we at Delta Lake haven't forgotten about this issue. We are working away on the next Delta Lake release and are hoping to get it out by the Data + AI Summit next month. For a bug like this, it will take some time to develop a comprehensive solution.

In the meantime, removing those Databricks confs from your workloads will fix the problem.
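For example, the same create statement with only the properties that open-source Delta recognizes (a sketch; table name, schema, and location are placeholders):

// Workaround sketch: the delta.autoOptimize.* keys are omitted, keeping
// only properties that open-source Delta validates successfully.
sparkSession.sql("""
  CREATE TABLE events (id BIGINT, ts TIMESTAMP)
  USING delta
  LOCATION '/tmp/delta/events'
  TBLPROPERTIES (
    delta.dataSkippingNumIndexedCols = 9,
    delta.logRetentionDuration = 'interval 30 days',
    delta.deletedFileRetentionDuration = 'interval 1 weeks'
  )
""")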

hamelinboyerj commented 2 months ago

Hi, I'm also experiencing the same issue. A workaround for optimizeWrite is to add it as a DataFrame writer option; the same may work for other configs, but I haven't checked yet. A sketch of what that could look like is below.
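Something like this (a sketch only; it assumes the writer option is named "optimizeWrite", as documented for Databricks, so verify it is honored by the Delta version you run):

// Hypothetical example: request optimized writes for a single write
// instead of as a table property. deltaTableUrl is the path used earlier
// in this thread; the "optimizeWrite" option name is an assumption.
val df = sparkSession.range(100).toDF("id")
df.write
  .format("delta")
  .option("optimizeWrite", "true")
  .mode("append")
  .save(deltaTableUrl)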