GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

[Bug]: Dates being inserted incorrectly #1345

Open jhendricks-elevate opened 6 months ago

jhendricks-elevate commented 6 months ago

Related Template(s)

Datastream to BigQuery

Template Version

2024-02-14-00_rc00

What happened?

Date fields should insert correctly from GCS into BigQuery. They did until recently.

Instead, I'm seeing this in my DLQ/retry output: "error_message":{"errors":[{"debugInfo":"","location":"estimated_delivery_date","message":"Invalid date: '2023-07-26Z'","reason":"invalid"}],"index":0}}

When I look at what Datastream is putting into the GCS bucket, the value is "ESTIMATED_DELIVERY_DATE":"2023-07-26T00:00:00.000Z".
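To make the mismatch concrete, here is a minimal sketch of the normalization I would expect before a value reaches a BigQuery DATE column, which takes YYYY-MM-DD. It is not the template's code. The function name `to_bigquery_date` is hypothetical, and it assumes the only inputs are the two shapes seen above (a full UTC timestamp, or a date with a trailing "Z").

```python
from datetime import date


def to_bigquery_date(value: str) -> str:
    """Return a YYYY-MM-DD string for a BigQuery DATE column.

    Hypothetical helper, not part of the Datastream to BigQuery template.
    Assumes the input is either a UTC timestamp such as
    '2023-07-26T00:00:00.000Z' or a date with a trailing 'Z' such as
    '2023-07-26Z'.
    """
    text = value.strip()
    # Drop the UTC designator, which is what BigQuery rejects on a DATE.
    if text.endswith("Z"):
        text = text[:-1]
    # Keep only the calendar-date part if a time component is present.
    date_part = text.split("T", 1)[0]
    # Round-trip through date to reject anything that is not a real date.
    return date.fromisoformat(date_part).isoformat()


print(to_bigquery_date("2023-07-26T00:00:00.000Z"))
print(to_bigquery_date("2023-07-26Z"))
```

Something like this could be used to repair rows pulled from the DLQ before retrying them, but it only works around the symptom. It does not explain why the template started emitting '2023-07-26Z'.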

This started a month ago, and it keeps happening randomly to other dates. I also started a thread here: https://www.reddit.com/r/googlecloud/comments/198l343/dataflow_changing_date_format_causing/

One of the projects this is happening on is gcp-elevate-prod-us-east1

Relevant log output

No response

csoare7 commented 5 months ago

I'm having a similar, related issue. The Datastream service writes data to GCS in the format 2024-02-29T14:18:14.159Z, while the Dataflow template seems to truncate the timestamp in BQ, dropping the fractional seconds so the value ends up as 2024-02-29T14:18:14.

I am running Apache Beam SDK for Java 2.46.0, which is a bit old.
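To quantify the truncation, here is a small diagnostic sketch that compares the value Datastream wrote with the value that landed in BQ. The function name `lost_precision` is hypothetical, and the two format strings are assumptions based only on the example values above.

```python
from datetime import datetime, timedelta

# Assumed formats, taken from the two example values in this comment.
SOURCE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # what Datastream writes to GCS
LOADED_FORMAT = "%Y-%m-%dT%H:%M:%S"      # what ends up in BQ


def lost_precision(source_value: str, loaded_value: str) -> timedelta:
    """Return how much time was dropped between the source and loaded values.

    Hypothetical diagnostic helper. Both values are treated as UTC.
    """
    source = datetime.strptime(source_value, SOURCE_FORMAT)
    loaded = datetime.strptime(loaded_value, LOADED_FORMAT)
    return source - loaded


delta = lost_precision("2024-02-29T14:18:14.159Z", "2024-02-29T14:18:14")
print(delta)
```

For the example above, the difference is the 159 milliseconds of fractional seconds, which suggests the value is being cut at whole seconds rather than shifted.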