facebookincubator / velox

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.29k stars 1.09k forks source link

Velox doesn't support legacy date format behavior in Spark SQL #10354

Open NEUpanning opened 1 week ago

NEUpanning commented 1 week ago

Bug description

For backward capability of parsing/formatting of timestamp/date strings expression behavior and adoption of new behaviors. Spark add a setting spark.sql.legacy.timeParserPolicy to indicate whether using legacy behavior. If it is set to LEGACY, Spark doesn't performs strict checking for expression's input. I see Velox doesn't support this when using unix_timestamp function, there is a related gluten issue : link. Maybe there are more result mismatch cases.

NEUpanning commented 1 week ago

Maybe we should add support for legacy date format behavior. @pedroerp @rui-mo What do you think? Thanks!

PHILO-HE commented 1 week ago

@NEUpanning, is Spark's legacy policy widely used in your production environment?

NEUpanning commented 1 week ago

@PHILO-HE Yes. In Spark upgrades, we need to use legacy policy for backward compatibility rather than pushing users to accept the new behavior. Here is part of legacy policies we are using :

spark.sql.storeAssignmentPolicy Legacy
spark.sql.legacy.createArrayOneBasedIndex false
spark.sql.legacy.setCommandRejectsSparkCoreConfs false
spark.sql.legacy.ctePrecedencePolicy LEGACY
spark.sql.legacy.createEmptyCollectionUsingStringType true
spark.sql.legacy.exponentLiteralAsDecimal.enabled true
spark.sql.legacy.parser.havingWithoutGroupByAsWhere true
spark.sql.legacy.timeParserPolicy LEGACY
spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue true
spark.sql.legacy.fromDayTimeString.enabled true
spark.sql.legacy.allowHashOnMapType true
spark.sql.legacy.allowNegativeScaleOfDecimal true
spark.sql.legacy.followThreeValuedLogicInArrayExists false
spark.sql.legacy.json.allowEmptyString.enabled true
spark.sql.legacy.setopsPrecedence.enabled true
spark.sql.legacy.literal.pickMinimumPrecision false
spark.sql.legacy.sizeOfNull true
spark.sql.legacy.typeCoercion.datetimeToString.enabled true
spark.sql.legacy.doLooseUpcast true
spark.sql.legacy.addMonthsAdjustLastDay true
spark.sql.legacy.statisticalAggregate true
pedroerp commented 1 week ago

@NEUpanning: we have recently added support for different date/timestamp parsing semantics; are these not enough to follow the Spark parsing semantic you described? If not, could you highlight some of the differences?

https://github.com/facebookincubator/velox/blob/main/velox/type/TimestampConversion.h#L48

Cc: @mbasmanova

NEUpanning commented 1 week ago

@pedroerp Thanks for your reply. UnixTimestampParseFunction is using DateTimeFormatter to parse date : source code link. DateTimeFormatter doesn't support different date/timestamp parsing semantics like TimestampConversion.

NEUpanning commented 5 days ago

@kecookier