NEUpanning opened 1 week ago
Maybe we should add support for legacy date format behavior. @pedroerp @rui-mo What do you think? Thanks!
@NEUpanning, is Spark's legacy policy widely used in your production environment?
@PHILO-HE Yes. When upgrading Spark, we need to use legacy policies for backward compatibility rather than pushing users to accept the new behavior. Here are some of the legacy policies we are using:
spark.sql.storeAssignmentPolicy Legacy
spark.sql.legacy.createArrayOneBasedIndex false
spark.sql.legacy.setCommandRejectsSparkCoreConfs false
spark.sql.legacy.ctePrecedencePolicy LEGACY
spark.sql.legacy.createEmptyCollectionUsingStringType true
spark.sql.legacy.exponentLiteralAsDecimal.enabled true
spark.sql.legacy.parser.havingWithoutGroupByAsWhere true
spark.sql.legacy.timeParserPolicy LEGACY
spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue true
spark.sql.legacy.fromDayTimeString.enabled true
spark.sql.legacy.allowHashOnMapType true
spark.sql.legacy.allowNegativeScaleOfDecimal true
spark.sql.legacy.followThreeValuedLogicInArrayExists false
spark.sql.legacy.json.allowEmptyString.enabled true
spark.sql.legacy.setopsPrecedence.enabled true
spark.sql.legacy.literal.pickMinimumPrecision false
spark.sql.legacy.sizeOfNull true
spark.sql.legacy.typeCoercion.datetimeToString.enabled true
spark.sql.legacy.doLooseUpcast true
spark.sql.legacy.addMonthsAdjustLastDay true
spark.sql.legacy.statisticalAggregate true
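For context on what `spark.sql.legacy.timeParserPolicy` changes: Spark's legacy parser is based on `java.text.SimpleDateFormat`, which parses a matching prefix of the input and silently ignores trailing characters, while the newer `java.time`-based parser rejects the whole string. Below is an illustrative Python simulation of that difference, not Spark's actual implementation; the helper names are hypothetical.

```python
from datetime import datetime

def parse_strict(s: str, fmt: str = "%Y-%m-%d"):
    """CORRECTED-style parsing: the whole string must match the pattern."""
    try:
        return datetime.strptime(s, fmt).date()
    except ValueError:
        return None

def parse_lenient(s: str, fmt: str = "%Y-%m-%d"):
    """LEGACY-style parsing (SimpleDateFormat-like): parse the longest
    matching prefix and silently ignore any trailing characters."""
    for end in range(len(s), 0, -1):
        try:
            return datetime.strptime(s[:end], fmt).date()
        except ValueError:
            continue
    return None

# A value the lenient legacy parser accepts but the strict parser rejects:
print(parse_strict("2020-01-15T12:00"))   # None (trailing text rejected)
print(parse_lenient("2020-01-15T12:00"))  # 2020-01-15 (prefix parsed)
```

An engine that only implements the strict behavior would return NULL here, while Spark under the legacy policy returns a valid date, which is exactly the kind of result mismatch discussed in this issue.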
@NEUpanning: we have recently added support for different date/timestamp parsing semantics; are these not enough to cover the Spark parsing semantics you described? If not, could you highlight some of the differences?
https://github.com/facebookincubator/velox/blob/main/velox/type/TimestampConversion.h#L48
Cc: @mbasmanova
@pedroerp Thanks for your reply. `UnixTimestampParseFunction` uses `DateTimeFormatter` to parse dates (source code link). `DateTimeFormatter` doesn't support different date/timestamp parsing semantics the way `TimestampConversion` does.
@kecookier
Bug description
For backward compatibility of timestamp/date string parsing/formatting behavior while still allowing adoption of the new behavior, Spark added the setting `spark.sql.legacy.timeParserPolicy` to indicate whether the legacy behavior should be used. If it is set to `LEGACY`, Spark does not perform strict checking of an expression's input. Velox doesn't support this when using the `unix_timestamp` function; there is a related Gluten issue: link. There may be more result-mismatch cases.
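As an illustration of another mismatch the lenient legacy mode can cause: `SimpleDateFormat`-style parsers in lenient mode roll out-of-range calendar fields over into the next month instead of failing, while strict parsing rejects them. The sketch below simulates only day-field rollover in Python; the helpers are hypothetical and are not Velox or Spark code.

```python
from datetime import date, timedelta

def parse_corrected(s: str):
    """Strict (CORRECTED-style) behavior: invalid calendar dates fail."""
    try:
        y, m, d = map(int, s.split("-"))
        return date(y, m, d)
    except ValueError:
        return None

def parse_legacy(s: str):
    """LEGACY-style lenient behavior (SimpleDateFormat-like): an
    out-of-range day field rolls over into the following month.
    (Simplified: only the day field is handled leniently here.)"""
    y, m, d = map(int, s.split("-"))
    return date(y, m, 1) + timedelta(days=d - 1)

print(parse_corrected("2020-02-30"))  # None (Feb 30 is invalid)
print(parse_legacy("2020-02-30"))     # 2020-03-01 (rolled over)
```

So for an input like `2020-02-30`, a strict engine returns NULL where Spark under `LEGACY` returns a rolled-over date, producing silently different query results.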