goccy / bigquery-emulator

BigQuery emulator server implemented in Go
MIT License
846 stars 108 forks source link

Query response cannot be parsed when the result contains a nested timestamp #218

Open Kamulau opened 1 year ago

Kamulau commented 1 year ago

I found the bug while writing an integration test using python and test-containers, and have also confirmed behavior via CLI.

Steps to reproduce via Python (used Python 3.8.16)

from google.cloud.bigquery import Table, SchemaField, Client

# Instantiate client = Client(), using emulator

test_project = "test-project"
test_dataset = "test_dataset"
test_schema = [
    SchemaField("test_timestamp_field", "TIMESTAMP", mode="REQUIRED")
]
test_table = f"{test_project}.{test_dataset}.test_table"
table = Table(test_table, schema=test_schema)
client.create_table(table)

from datetime import datetime
dtf = "%Y-%m-%d %H:%M:%S UTC"
dt1 = datetime.strptime("2021-11-29 22:00:00 UTC", dtf)

test_data = [
    {
        "test_timestamp_field": dt1.strftime(dtf)
    } 
]
client.insert_rows_json(test_table, test_data)
query1 = f"SELECT * from {test_table}"
query2 = f"SELECT STRUCT(test_timestamp_field) as nested_timestamp_field from {test_table}"

res1 = client.query(query1).result()
for r in res1:
    print(r)

res2 = client.query(query2).result()
for r in res2: # An error is thrown here
    print(r)

Python Error

Here is the error message when running the Python code above ValueError: invalid literal for int() with base 10: '2021-11-29T22:00:00Z'

Steps to reproduce using CLI

docker run -p 55687:9050 -it ghcr.io/goccy/bigquery-emulator:latest --project=test-project --dataset=test_dataset

bq --api http://localhost:55687 mk --project_id test-project --schema test_timestamp_field:timestamp -t test_dataset.test_table

bq --api http://localhost:55687 query --project_id test-project --nouse_legacy_sql \
"
INSERT test_dataset.test_table(test_timestamp_field) values ('2021-11-29 22:00:00 UTC')
"

bq --api http://localhost:55687 query --project_id test-project --nouse_legacy_sql \
'SELECT
   test_timestamp_field
 FROM
   `test-project`.test_dataset.test_table'

bq --api http://localhost:55687 query --project_id test-project \
'SELECT
   STRUCT(test_timestamp_field) as nested_timestamp_field
 FROM
   `test-project`.test_dataset.test_table' 

CLI Actual Response

Here is the response of the second query via CLI, with the nested timestamp field:

+------------------------------------------------------------+
|                   nested_timestamp_field                   |
+------------------------------------------------------------+
| {"test_timestamp_field":"<date out of range for display>"} |
+------------------------------------------------------------+

CLI Expected Response

Here is the expected response (tested w/ BQ):

+----------------------------------------+
|         nested_timestamp_field         |
+----------------------------------------+
| {"reserve_time":"2021-11-29 22:00:00"} |
+----------------------------------------+

An aside

I have also found that you can wrap the struct with TO_JSON_STRING, and it mostly works for CLI (although the timestamp is formatted for JSON), but it does not it appears to work with the Python client. Note that the timestamp lacks double quotes (which is what BQ returns in this case):

bq --api http://localhost:55687 query --project_id test-project \
'SELECT
   TO_JSON_STRING(STRUCT(test_timestamp_field)) as nested_timestamp_field
 FROM
   `test-project`.test_dataset.test_table' 
+-----------------------------------------------+
|            nested_timestamp_field             |
+-----------------------------------------------+
| {"test_timestamp_field":2021-11-29T22:00:00Z} |
+-----------------------------------------------+

I have also tested the above with BigQuery, and there were no errors and CLI returned responses, as expected.

Kamulau commented 1 year ago

Also, perhaps this is related to https://github.com/goccy/bigquery-emulator/issues/32

ohaibbq commented 10 months ago

This is related to #265 I've fixed the TO_JSON() for the TimestampValue in this PR https://github.com/goccy/go-zetasqlite/pull/111/files#diff-a710e43b641c24f0f092d550d22fd984a9f09c62b7b4b8e41db3144c0217f592R2105-R2108

MaiElshiashi commented 1 month ago

I'm having the same problem. I have a query that returns a timestamp field in a struct. The emulator returns the timestamp as a string and bigquery can't convert it to a time.Time object and I get this error panic: strconv.ParseInt: parsing "2024-10-12T18:05:52+02:00"

MaiElshiashi commented 1 month ago

Did anyone manage to solve this?