Open benedeki opened 1 year ago
This behaviour can be achieved by adding:
--conf spark.sql.parquet.datetimeRebaseModeInRead=LEGACY --conf spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY
to the "spark-submit" entry in the Spark job JSON file:
"spark-submit": "spark-submit --num-executors 2 --executor-memory 2G --deploy-mode client --conf spark.sql.parquet.datetimeRebaseModeInRead=LEGACY --conf spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY",
No code changes in Enceladus are needed. See the example usage in the JSON file.
Great finding and solution. So only the helper scripts need to be enhanced.
Yes
Background
With Spark 3, new options were added that control how dates before 1900 are handled in Parquet files. The settings are:
spark.sql.parquet.datetimeRebaseModeInRead
spark.sql.parquet.datetimeRebaseModeInWrite
spark.sql.parquet.int96RebaseModeInRead
spark.sql.parquet.int96RebaseModeInWrite
Details here.
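As a rough illustration of how a helper script could pass these settings, here is a minimal Python sketch that turns the four keys above into `--conf` arguments and appends them to a base `spark-submit` call. The function names `rebase_conf_args` and `spark_submit_prefix` are hypothetical and not part of Enceladus or Spark. The accepted modes (`EXCEPTION`, `CORRECTED`, `LEGACY`) are the values Spark documents for these options. Whether all four keys are available depends on the Spark 3 version in use, so treat the key list as an assumption to check.

```python
# Hypothetical helper, not part of Enceladus or Spark: builds the "--conf"
# arguments for the Parquet rebase settings listed above.

# Modes Spark accepts for the rebase options.
VALID_MODES = {"EXCEPTION", "CORRECTED", "LEGACY"}

# The four settings named in this issue.
REBASE_KEYS = (
    "spark.sql.parquet.datetimeRebaseModeInRead",
    "spark.sql.parquet.datetimeRebaseModeInWrite",
    "spark.sql.parquet.int96RebaseModeInRead",
    "spark.sql.parquet.int96RebaseModeInWrite",
)


def rebase_conf_args(mode="LEGACY", keys=REBASE_KEYS):
    """Return a flat list like ["--conf", "key=MODE", ...] for the given keys."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported rebase mode: {mode!r}")
    args = []
    for key in keys:
        args += ["--conf", f"{key}={mode}"]
    return args


def spark_submit_prefix(base, mode="LEGACY", keys=REBASE_KEYS):
    """Append the rebase options to an existing spark-submit command string."""
    return " ".join([base, *rebase_conf_args(mode, keys)])


if __name__ == "__main__":
    base = "spark-submit --num-executors 2 --executor-memory 2G --deploy-mode client"
    # Only the two datetime keys, matching the example in the comments.
    print(spark_submit_prefix(base, keys=REBASE_KEYS[:2]))
```

Passing only the first two keys reproduces the `spark-submit` string from the comment thread. Passing all four also covers INT96 timestamps.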
Feature
Allow setting of the options for Enceladus jobs
To discuss