apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Decimal division compatibility mode with spark #7301

Open alamb opened 10 months ago

alamb commented 10 months ago

Is your feature request related to a problem or challenge?

As described in detail by @liukun4515, @tustvold, and @viirya on https://github.com/apache/arrow-datafusion/pull/6832, DataFusion's decimal division semantics differ from Spark's.

@liukun4515 notes in https://github.com/apache/arrow-datafusion/pull/6832#issuecomment-1680098056 that Spark has a config option to control precision loss: https://github.com/apache/spark/blob/2be20e54a2222f6cdf64e8486d1910133b43665f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L246
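For reference, here is a minimal sketch of how I understand Spark derives the result type of a decimal division when precision loss is allowed (its default). The function names are made up for illustration and are not part of Spark, arrow-rs, or DataFusion; the constants mirror Spark's maximum precision of 38 and its minimum adjusted scale of 6. Treat it as a reading aid for the linked Scala code, not a definitive port.

```rust
use std::cmp::{max, min};

/// Maximum decimal precision Spark supports.
const MAX_PRECISION: i32 = 38;
/// Scale Spark tries to preserve when it has to shrink the result scale.
const MINIMUM_ADJUSTED_SCALE: i32 = 6;

/// Hypothetical helper: shrink (precision, scale) to fit in 38 digits,
/// giving up fractional digits first but keeping at least
/// min(scale, MINIMUM_ADJUSTED_SCALE) of them.
pub fn adjust_precision_scale(precision: i32, scale: i32) -> (i32, i32) {
    if precision <= MAX_PRECISION {
        return (precision, scale);
    }
    let int_digits = precision - scale;
    let min_scale = min(scale, MINIMUM_ADJUSTED_SCALE);
    let adjusted_scale = max(MAX_PRECISION - int_digits, min_scale);
    (MAX_PRECISION, adjusted_scale)
}

/// Hypothetical helper: result type of Decimal(p1, s1) / Decimal(p2, s2)
/// under Spark's rules with precision loss allowed.
pub fn spark_divide_result_type(p1: i32, s1: i32, p2: i32, s2: i32) -> (i32, i32) {
    let int_digits = p1 - s1 + s2;
    let scale = max(MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1);
    adjust_precision_scale(int_digits + scale, scale)
}
```

For example, `spark_divide_result_type(38, 10, 38, 10)` gives `(38, 6)`: the unadjusted type would need 87 digits, so the scale is cut back to the minimum of 6.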

And @tustvold notes: "For people looking to emulate Spark, which only supports precision up to 38, casting to Decimal256 and then truncating down to Decimal128 will be equivalent, and is what a precision loss arithmetic kernel would do."

Describe the solution you'd like

If anyone needs Spark-compatible decimal division rules, I suggest:

  1. Add a new config option
  2. Apply the rewrite suggested by @tustvold (cast to Decimal256, divide, and then cast to Decimal128) as an AnalyzerRule
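The two steps above could be sketched roughly as follows. This is a toy model only: `DataType`, `Expr`, and `rewrite_divide` are stand-ins defined here for illustration, not DataFusion's or Arrow's actual types, and `enabled` plays the role of the proposed (not yet existing) config option. The `output` argument is assumed to be the Decimal128 type Spark would assign to the quotient.

```rust
/// Toy stand-ins for Arrow's decimal types, carrying (precision, scale).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Decimal128(u8, i8),
    Decimal256(u8, i8),
}

/// Toy expression tree; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String, DataType),
    Cast(Box<Expr>, DataType),
    Divide(Box<Expr>, Box<Expr>),
}

/// Type of an expression, where this toy model can tell.
fn data_type(expr: &Expr) -> Option<DataType> {
    match expr {
        Expr::Column(_, t) | Expr::Cast(_, t) => Some(t.clone()),
        Expr::Divide(_, _) => None,
    }
}

/// Wrap a Decimal128 operand in a cast to Decimal256 with the same
/// precision and scale; leave anything else untouched.
fn widen(expr: Expr) -> Expr {
    match data_type(&expr) {
        Some(DataType::Decimal128(p, s)) => {
            Expr::Cast(Box::new(expr), DataType::Decimal256(p, s))
        }
        _ => expr,
    }
}

/// Hypothetical rule body: when `enabled`, rewrite `l / r` into
/// `CAST(CAST(l AS Decimal256) / CAST(r AS Decimal256) AS output)`.
pub fn rewrite_divide(expr: Expr, enabled: bool, output: DataType) -> Expr {
    match expr {
        Expr::Divide(l, r) if enabled => {
            let wide = Expr::Divide(Box::new(widen(*l)), Box::new(widen(*r)));
            Expr::Cast(Box::new(wide), output)
        }
        other => other,
    }
}
```

The sketch only rewrites a top-level division and does not recurse into nested expressions; a real AnalyzerRule would walk the whole plan and would have to compute the output type itself.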

Describe alternatives you've considered

See the ticket: we discussed at length changing the semantics of division in arrow-rs and concluded that there is no single agreed-upon ideal behavior.

Additional context

No response

alamb commented 10 months ago

I don't think we should implement this unless a user actually needs it, but I wanted to summarize the conversation on https://github.com/apache/arrow-datafusion/pull/6832