goccy / bigquery-emulator

BigQuery emulator server implemented in Go
MIT License
814 stars 108 forks source link

Cant parametrize FLOAT value in query #268

Open kurrenda opened 8 months ago

kurrenda commented 8 months ago

What happened?

When i was trying to create query parameter as:

bigquery.ScalarQueryParameter("TESTPARAM", "FLOAT64", 2.15)

I have got 400 response: "json: cannot unmarshal number into Go struct field QueryParameterValue.Value of type string"

Looks like "FLOAT/FLOAT64" value field is converted to float value (https://github.com/googleapis/python-bigquery/blob/62490325f64e5d66303d9218992e28ac5f21cb3f/google/cloud/bigquery/_helpers.py#L348), but bigquery-emulator api is not prepared for numeric values, but only strings.

What did you expect to happen?

Fix this to be able to set "FLOAT" parameter

How can we reproduce it (as minimally and precisely as possible)?

By using docker image and creating job: POST http://localhost:9050/bigquery/v2/projects/test/jobs?prettyPrint=true {"jobReference": {"jobId": "98ee9d11-548a-4a5f-b1c1-a8e7bfc48916", "projectId": "test"}, "configuration": {"query": {"queryParameters": [{"parameterType": {"type": "FLOAT64"}, "parameterValue": {"value": 2.15}, "name": "TESTPARAM"}], "useLegacySql": false, "query": "SELECT * FROM UNNEST(@TESTPARAM)", "parameterMode": "NAMED"}}}

Anything else we need to know?

docker image: bigquery-emulator:latest (version: 0.0.4)

totem3 commented 8 months ago

@kurrenda

Passing Float64 to UNNEST seems odd, is this intended? The error is not due to Float64, but because you are passing parameters to UNNEST in FROM clause. Currently, you cannot pass parameters to UNNEST used in the FROM clause. Please refer to the following issue: https://github.com/goccy/bigquery-emulator/issues/234#issuecomment-1929153585

FLOAT64 should work as intended. Please refer to the following.

❯ bq --api http://0.0.0.0:9050 mk --project_id=test test_ds
Dataset 'test:test_ds' successfully created.

❯ bq --api http://0.0.0.0:9050 query --project_id=test "create table test_ds.abc (id string, value float64)"

❯ bq --api http://0.0.0.0:9050 query --project_id=test 'insert into test_ds.abc (id, value) VALUES ("a", 1.23), ("b", 2.46)'

❯ bq --api http://0.0.0.0:9050 query --project_id=test --parameter='TESTPARAM:FLOAT64:1.23' "select * from test_ds.abc where value = @TESTPARAM"
+----+-------+
| id | value |
+----+-------+
| a  |  1.23 |
+----+-------+

If my understanding is incorrect, I would appreciate it if you could inform me again, along with steps to reproduce the issue.

kurrenda commented 8 months ago

It's not related with UNNEST and its working fine in bq shell. The issue occurs with the HTTP API in preparing job (when i call e.g bigquery.query())

To simulate that run docker container and try make POST request (based on your example):

POST http://localhost:9050/bigquery/v2/projects/test/jobs?prettyPrint=true {"jobReference": {"jobId": "98ee9d11-548a-4a5f-b1c1-a8e7bfc48911", "projectId": "test"}, "configuration": {"query": {"queryParameters": [{"parameterType": {"type": "FLOAT64"}, "parameterValue": {"value": 2.12}, "name": "TESTPARAM"}], "useLegacySql": false, "query": "select * from test_ds.abc where value = @TESTPARAM", "parameterMode": "NAMED"}}}

It works fine when i change parameter value to string manually - "parameterValue": {"value": 2.12} -> "parameterValue": {"value": "2.12"}

But this is not a workaround, because bigquery.cloud package changes it to float before request anyway

totem3 commented 8 months ago

I Understood. Indeed, that aspect is not compatible with BigQuery. In the case of the emulator, not limited to float but also int, passing them as query parameters in their original types can lead to problems. This is reproducible with the Node.js BigQuery SDK as well. But in the case of Node.js, we have been able to avoid the problem by casting numbers to strings in advance. I have one thing to clarify, when you mention "bq shell," what tool are you using to execute your queries? I cannot reproduce the same behavior with parameters alone, so if possible, could you share the commands you are executing? (I've been able to test using curl with the parameters you've provided,)

kurrenda commented 8 months ago

To summarize

I found this issue when i was using Python SDK (https://github.com/googleapis/python-bigquery), by using:

python sdk documentation

bigquery.ScalarQueryParameter("TESTPARAM", "FLOAT64", 2.15) 

"bq shell" i had in mind your example bq --api http://0.0.0.0:9050 query --project_id=test --parameter='TESTPARAM:FLOAT64:1.23' "select * from test_ds.abc where value = @TESTPARAM" - it works fine (im just entering bq --api http://0.0.0.0:9050 shell and then quering)

HTTP POST with body which i posted above (python sdk uses this HTTP call) - not working

docker run -it -d -p 9050:9050 --platform linux/amd64 ghcr.io/goccy/bigquery-emulator:latest --project=test

bq --api http://0.0.0.0:9050 mk --project_id=test test_ds

bq --api http://0.0.0.0:9050 query --project_id=test "create table test_ds.abc (id string, value float64)"

bq --api http://0.0.0.0:9050 query --project_id=test 'insert into test_ds.abc (id, value) VALUES ("a", 1.23), ("b", 2.46)'

curl -H "Content-Type: application/json" -X POST -d '{"jobReference": {"jobId": "5", "projectId": "test"}, "configuration": {"query": {"queryParameters": [{"parameterType": {"type": "FLOAT64"}, "parameterValue": {"value": 1.23}, "name": "TESTPARAM"}], "useLegacySql": false, "query": "select * from test_ds.abc where value > @TESTPARAM", "parameterMode": "NAMED"}}}' "http://0.0.0.0:9050/bigquery/v2/projects/test/jobs?prettyPrint=true"
totem3 commented 8 months ago

I see, you were using the Python SDK. Understood, thank you very much for clarifying.

I also understood the point you made about how, even if numerical values are written as strings, the Python SDK casts them to the correct type before executing the request, which leads to this error occurring.

ttiurani commented 7 months ago

I just hit this issue too. I had to work around this with an emulator-specific hack for unit testing where I just set the value directly.

Would be great to get this fixed!

And thanks a lot for this project, it's really useful!

j-antunes commented 6 months ago

I also hit this issue, when using python.

@ttiurani - Mind sharing what you did to bypass this issue?

ttiurani commented 6 months ago

@ttiurani - Mind sharing what you did to bypass this issue?

Instead of "float_column = @float_value" everywhere I have an integration-test-only hack:

if config.is_emulator: "float_column = " + float_value else: "float_column = @float_value"

NB: this is obviously bad because it's an SQL injection vulnerability, so you have to be super sure "is_emulator" cannot be True anywhere except in controlled environments. Even though I am sure, I would very much want to get rid if this hack.

j-antunes commented 6 months ago

Thank you for sharing. I see, yeah that is something dangerous to have.