AbsaOSS / enceladus

Dynamic Conformance Engine
Apache License 2.0

Add ability to configure how Spark handles dates in parquet files. #2175

Open benedeki opened 1 year ago

benedeki commented 1 year ago

Background

With Spark 3, new options were added that control how dates before 1900 in Parquet files are handled. The settings are:

- `spark.sql.parquet.datetimeRebaseModeInRead`
- `spark.sql.parquet.datetimeRebaseModeInWrite`
- `spark.sql.parquet.int96RebaseModeInRead`
- `spark.sql.parquet.int96RebaseModeInWrite`

Details here.

Feature

Allow setting of the options for Enceladus jobs

### Tasks
- [ ] ~Add command line options to be able to set the **read** options. Set a default behavior either to `EXCEPTION` or `LEGACY`.~
- [ ] ~Modify the helper scripts to recognize these settings~
- [ ] ~Add a `reference.conf`/`application.conf` setting to be applied to write options. The default should be `LEGACY`~
- [ ] Modify the helper scripts so that the Spark settings can easily be passed to `spark-submit`; the defaults remain the same as described above
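
One way the remaining helper-script task could look is sketched below. It is only an illustration: the function names and the `PARQUET_REBASE_READ` / `PARQUET_REBASE_WRITE` variables are hypothetical and not part of the existing Enceladus helper scripts, and `LEGACY` is assumed as the default for both read and write (the issue leaves the read default open between `EXCEPTION` and `LEGACY`). The sketch validates the mode against the three values Spark accepts and prints the matching `--conf` flags.

```shell
#!/bin/sh
# Hypothetical sketch: build the --conf flags for the Parquet rebase settings.
# PARQUET_REBASE_READ and PARQUET_REBASE_WRITE are illustrative variable names,
# not existing Enceladus helper-script options.

# Succeeds only for the modes Spark accepts for these settings.
is_valid_rebase_mode() {
  case "$1" in
    EXCEPTION|CORRECTED|LEGACY) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the two datetime rebase flags on one line.
# Assumed default: LEGACY for both read and write.
rebase_conf_args() {
  read_mode="${PARQUET_REBASE_READ:-LEGACY}"
  write_mode="${PARQUET_REBASE_WRITE:-LEGACY}"
  for mode in "$read_mode" "$write_mode"; do
    if ! is_valid_rebase_mode "$mode"; then
      echo "Invalid rebase mode: $mode" >&2
      return 1
    fi
  done
  printf '%s %s\n' \
    "--conf spark.sql.parquet.datetimeRebaseModeInRead=$read_mode" \
    "--conf spark.sql.parquet.datetimeRebaseModeInWrite=$write_mode"
}

# Print the resulting command instead of running it,
# since spark-submit may not be installed where this sketch is tried.
echo "spark-submit --num-executors 2 --executor-memory 2G --deploy-mode client $(rebase_conf_args)"
```

The `int96RebaseModeInRead` and `int96RebaseModeInWrite` settings could be handled the same way with two more variables.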

To discuss

miroslavpojer commented 1 year ago

This behaviour can be achieved by adding `--conf spark.sql.parquet.datetimeRebaseModeInRead=LEGACY --conf spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY` to the `spark-submit` entry in the Spark job JSON file:

`"spark-submit": "spark-submit --num-executors 2 --executor-memory 2G --deploy-mode client --conf spark.sql.parquet.datetimeRebaseModeInRead=LEGACY --conf spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY",`

No code changes in Enceladus are needed. See the example usage in the JSON file.

benedeki commented 1 year ago

Great finding and solution. So only the helper scripts need to be enhanced.

miroslavpojer commented 1 year ago

Yes