GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0
378 stars, 198 forks

Support spark.sql.datetime.java8API.enabled #1303

Closed: tom-s-powell closed this 1 month ago

tom-s-powell commented 1 month ago

When spark.sql.datetime.java8API.enabled is true, Spark dates and timestamps will be returned as java.time.LocalDate/java.time.Instant rather than java.sql.Date/java.sql.Timestamp.
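
To make the type switch concrete, here is a minimal JDK-only sketch of the mapping described above. `ExternalDatetimeTypes` and `externalClassFor` are hypothetical names used only for illustration. They are not a Spark or connector API. The sketch just restates which external Java class a DATE or TIMESTAMP value arrives as, depending on the flag.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical helper (not a Spark API): restates the external-type mapping
// for Spark's DateType and TimestampType under spark.sql.datetime.java8API.enabled.
final class ExternalDatetimeTypes {
    private ExternalDatetimeTypes() {}

    static Class<?> externalClassFor(String sparkType, boolean java8ApiEnabled) {
        switch (sparkType) {
            case "date":
                return java8ApiEnabled ? LocalDate.class : java.sql.Date.class;
            case "timestamp":
                return java8ApiEnabled ? Instant.class : Timestamp.class;
            default:
                throw new IllegalArgumentException("Not a DATE/TIMESTAMP type: " + sparkType);
        }
    }

    public static void main(String[] args) {
        for (boolean enabled : new boolean[] {false, true}) {
            System.out.println("java8API.enabled=" + enabled
                + " date -> " + externalClassFor("date", enabled).getName()
                + ", timestamp -> " + externalClassFor("timestamp", enabled).getName());
        }
    }
}
```

Any code that assumes only the `java.sql` classes will therefore see unexpected types once the flag is turned on.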

https://issues.apache.org/jira/browse/SPARK-38437 addressed this in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, so perhaps this code path could leverage that?

Currently, if this config is enabled, you hit a java.lang.ClassCastException.
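
The failure mode can be sketched as follows. The connector's actual code is not quoted in this thread, so this assumes, hypothetically, a conversion step that casts the value it receives straight to `java.sql.Timestamp`. With the flag enabled, the value is a `java.time.Instant`, and the cast throws. `NaiveTimestampConverter` and `toMicros` are illustrative names only.

```java
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical pre-fix logic: assumes every TIMESTAMP value is a java.sql.Timestamp.
final class NaiveTimestampConverter {
    private NaiveTimestampConverter() {}

    static long toMicros(Object value) {
        Timestamp ts = (Timestamp) value; // ClassCastException when value is a java.time.Instant
        // getTime() already includes whole milliseconds; add the sub-millisecond microseconds.
        return ts.getTime() * 1000L + (ts.getNanos() / 1000) % 1000;
    }

    public static void main(String[] args) {
        System.out.println(toMicros(new Timestamp(1500L))); // legacy type: works
        try {
            toMicros(Instant.ofEpochMilli(1500L)); // java8API type: fails
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```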

davidrabinowitz commented 1 month ago

Hi @tom-s-powell, thanks for the PR! It seems that the relevant methods are in org.apache.spark.sql.catalyst.util.SparkDateTimeUtils. Can you please check?

davidrabinowitz commented 1 month ago

Also, it seems to be available only in Spark 3.5, so we may need our own implementation instead of the Spark one in order to support older Spark versions.

tom-s-powell commented 1 month ago

Ah yes, looks like they moved in a recent version. I've added the implementation directly rather than relying on the Spark one.
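
A version-independent approach accepts either external representation and converts it to the same internal value: days since the epoch for DATE, and microseconds since the epoch for TIMESTAMP. The sketch below is hypothetical. It is not the code added in this PR. The method names mirror helpers that, to our understanding, recent Spark versions expose as `anyToDays`/`anyToMicros`, but `DatetimeCompat` itself is a standalone illustration. Unlike Spark's own helpers, it does not rebase pre-1582 `java.sql.Timestamp` values between the Julian and proleptic Gregorian calendars.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical, Spark-version-independent helpers (names borrowed from Spark for readability).
final class DatetimeCompat {
    private DatetimeCompat() {}

    /** Days since 1970-01-01 for either external DATE representation. */
    static int anyToDays(Object value) {
        if (value instanceof LocalDate) {
            return Math.toIntExact(((LocalDate) value).toEpochDay());
        }
        if (value instanceof Date) {
            // Keeps the local year/month/day of the java.sql.Date (JVM default time zone).
            return Math.toIntExact(((Date) value).toLocalDate().toEpochDay());
        }
        throw new IllegalArgumentException("Unsupported DATE value: " + describe(value));
    }

    /** Microseconds since 1970-01-01T00:00:00Z for either external TIMESTAMP representation. */
    static long anyToMicros(Object value) {
        if (value instanceof Instant) {
            return instantToMicros((Instant) value);
        }
        if (value instanceof Timestamp) {
            // No Julian/Gregorian rebasing here, unlike Spark's own helpers.
            return instantToMicros(((Timestamp) value).toInstant());
        }
        throw new IllegalArgumentException("Unsupported TIMESTAMP value: " + describe(value));
    }

    private static long instantToMicros(Instant instant) {
        // getNano() is always in [0, 999_999_999], so this is also correct before the epoch.
        return Math.addExact(
            Math.multiplyExact(instant.getEpochSecond(), 1_000_000L), instant.getNano() / 1_000);
    }

    private static String describe(Object value) {
        return value == null ? "null" : value.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(anyToDays(LocalDate.of(1970, 1, 2)));            // 1
        System.out.println(anyToDays(Date.valueOf("1970-01-02")));          // 1
        System.out.println(anyToMicros(Instant.ofEpochSecond(1, 500_000))); // 1000500
        System.out.println(anyToMicros(new Timestamp(1500L)));              // 1500000
    }
}
```

Dispatching on the runtime class, rather than reading the config, keeps the conversion correct whichever way spark.sql.datetime.java8API.enabled is set. It also needs nothing from Spark 3.5.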

davidrabinowitz commented 1 month ago

/gcbrun

davidrabinowitz commented 1 month ago

/gcbrun

tom-s-powell commented 1 month ago

@davidrabinowitz, are we okay to merge this?