Closed: Murli16 closed this issue 3 months ago
Notice that for Datetime types you need to create the Spark equivalent, TimestampNTZ. Also, please verify that Spark supports those dates. Many computer systems do not handle dates prior to 1900 well, let alone dates prior to 1582 (the introduction of the Gregorian calendar). Can you please share the use case that requires those dates?
Hi @davidrabinowitz - the issue does not seem to be with the datetime attribute type; it is with the timestamp attribute type.
We see that in Dataproc the earliest timestamp that can be used without any data issue or corruption is 1900-01-01. We have a use case where the source system stores low-end timestamps as 0001-01-01 00:00:00. Is there any fix possible for this?
Just to add, we have a similar Spark job running in Databricks that is able to process this date without issue. I understand the Spark version/environment may differ between Databricks and Dataproc.
@Murli16 we do support timestamps like 0001-01-01 00:00:00; however, the minimum timestamp is 0001-01-01 00:00:00 UTC. Please make sure you are not using any timestamp before that.
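One way to see why this bites: a wall-clock timestamp of 0001-01-01 00:00:00 in a zone east of UTC denotes an instant *earlier* than 0001-01-01 00:00:00 UTC. A stdlib-only sketch (the UTC+9 offset is just an illustration):

```python
from datetime import datetime, timezone, timedelta

# BigQuery's minimum TIMESTAMP value is 0001-01-01 00:00:00 UTC.
bq_min = datetime(1, 1, 1, tzinfo=timezone.utc)

# The same wall-clock time in a UTC+9 zone (illustrative offset):
east_of_utc = timezone(timedelta(hours=9))
local_midnight = datetime(1, 1, 1, tzinfo=east_of_utc)

# As an instant, midnight in UTC+9 is 9 hours before bq_min, so
# writing it to BigQuery would underflow the valid TIMESTAMP range.
print(local_midnight < bq_min)  # True
```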
It would be a good idea to explicitly specify the time zone instead of relying on the local time zone.
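For example, when submitting the job on Dataproc, the session time zone can be pinned via a Spark property (a sketch; the cluster, region, and job file names are placeholders, and the same property can also be set on the SparkSession builder):

```shell
# Pin Spark's session time zone to UTC so timestamp values are not
# reinterpreted in the cluster's local zone (names below are placeholders).
gcloud dataproc jobs submit pyspark my_job.py \
  --cluster=my-cluster \
  --region=us-central1 \
  --properties=spark.sql.session.timeZone=UTC
```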
The customer is using Spark jobs to read data from SQL Server and write to BigQuery. The customer is unable to process a timestamp column containing the value 0001-01-01 00:00:00.
- Spark connector version: 0.36.1
- Dataproc image: 2.2.5-ubuntu22
- Spark version: 3.5
Error Message
Issue Reproduction Steps