GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0
378 stars 198 forks source link

TIMESTAMP appears to add an hour compared to DATETIME #1229

Closed Naboo22 closed 6 months ago

Naboo22 commented 6 months ago

Env:


The following code

# Define task attempts SQL
sql = f"""
SELECT 
datetime("2023-09-18 09:54:12.442227") as datetime,
timestamp("2023-09-18 09:54:12.442227") as timestamp
"""

# Load data from BigQuery.
test = spark.read.format("bigquery").load(
    sql,
)

produces the resultant table:

+--------------------+--------------------+ | datetime| timestamp| +--------------------+--------------------+ |2023-09-18 09:54:...|2023-09-18 10:54:...| +--------------------+--------------------+

You can see that the timestamp field is an hour ahead of the datetime field. I am unclear if this is intended behaviour to do with timezones, but the same query run in the BigQuery console yields identical times.

davidrabinowitz commented 6 months ago

Notice that DATETIME and TIMESTAMP are not identical:

The documentation go on and say

To represent an absolute point in time, use a timestamp.

You can see the results of this query, when I'm at the America/Los_Angeles time zone, notice the difference in the values.

Naboo22 commented 6 months ago

@davidrabinowitz thanks, I think I've wrapped my head around that now!