goccy / bigquery-emulator

BigQuery emulator server implemented in Go
MIT License

Does the emulator work with Pandas GBQ? #288

Open jitendrawbd opened 8 months ago

jitendrawbd commented 8 months ago

What happened?

My Python code uses pandas-gbq's to_gbq function to write to a BigQuery table. It works as expected when I run the code normally, but when I use the BigQuery emulator in a unit test, it throws the error below:

GenericGBQException("Reason: {0}".format(ex)) from ex
pandas_gbq.exceptions.GenericGBQException: Reason: 400 POST http://localhost:9050/bigquery/v2/projects/local-project/jobs?prettyPrint=false: unspecified job configuration query

What did you expect to happen?

I expected the to_gbq function to write the data to the table in the BigQuery emulator.

How can we reproduce it (as minimally and precisely as possible)?

I am calling the function as follows:

from pandas_gbq import to_gbq
to_gbq(df, destination_table=f"{dataset_id}.{table_name}", project_id=project_id, if_exists='append')

I created the dataset and table in the BigQuery emulator. When I run the code in the unit test, I get the error below:

GenericGBQException("Reason: {0}".format(ex)) from ex
pandas_gbq.exceptions.GenericGBQException: Reason: 400 POST http://localhost:9050/bigquery/v2/projects/local-project/jobs?prettyPrint=false: unspecified job configuration query

Anything else we need to know?

No response

jitendrawbd commented 8 months ago

Even the simplest load doesn't seem to work:

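import pandas as pd
from pandas_gbq import to_gbq


def testapi():
    # Small frame covering string, integer, float, and boolean columns.
    df = pd.DataFrame(
        {
            "my_string": ["a", "b", "c"],
            "my_int64": [1, 2, 3],
            "my_float64": [4.0, 5.0, 6.0],
            "my_bool1": [True, False, True],
            "my_bool2": [False, True, False]
        }
    )
    # to_gbq submits a load job, which is the call that fails against the emulator.
    to_gbq(df, 'test_dataset.test_table', project_id='gcp-cap-dsml-core-dev')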

Testing the above with the BigQuery emulator results in this error:

GenericGBQException("Reason: {0}".format(ex)) from ex
pandas_gbq.exceptions.GenericGBQException: Reason: 400 POST http://localhost:9050/bigquery/v2/projects/local-project/jobs?prettyPrint=false: unspecified job configuration query

ohaibbq commented 8 months ago

It looks like the jobs insert handler currently handles only query jobs, load jobs that import from GCS, and extract jobs that export to GCS: https://github.com/goccy/bigquery-emulator/blob/main/server/handler.go#L1372-L1391

to_gbq uses the google.cloud.bigquery.Client.load_table_from_dataframe method, which POSTs a CSV / Parquet file to the API as a load job, and that kind of job is not handled yet.

In our project, we use google.cloud.bigquery.Client.insert_rows to populate tables.
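A minimal sketch of that workaround, assuming the emulator listens on http://localhost:9050 with the project local-project (as in the error message above) and that the target table already exists. The helper names make_emulator_client, dataframe_to_rows, and insert_dataframe are hypothetical. The sketch uses insert_rows_json, the sibling of insert_rows that takes plain dicts, so no schema lookup is needed on the client side. I have not verified it against every emulator version.

```python
import json


def make_emulator_client(project_id="local-project", endpoint="http://localhost:9050"):
    """Build a BigQuery client pointed at a local emulator (assumed endpoint)."""
    # Imported lazily so the helpers below work without these packages installed.
    from google.api_core.client_options import ClientOptions
    from google.auth.credentials import AnonymousCredentials
    from google.cloud import bigquery

    return bigquery.Client(
        project=project_id,
        client_options=ClientOptions(api_endpoint=endpoint),
        credentials=AnonymousCredentials(),
    )


def dataframe_to_rows(df):
    """Convert a DataFrame into a list of JSON-serializable row dicts."""
    # Round-tripping through to_json turns NumPy scalars into plain Python
    # values and NaN into None; dates are written as ISO 8601 strings.
    return json.loads(df.to_json(orient="records", date_format="iso"))


def insert_dataframe(client, table_id, df):
    """Stream the rows of df into an existing table and return the row count."""
    rows = dataframe_to_rows(df)
    if not rows:
        return 0
    # insert_rows_json returns a list of per-row errors; empty means success.
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"insert into {table_id} failed: {errors}")
    return len(rows)
```

In a unit test this would replace the to_gbq call, for example insert_dataframe(make_emulator_client(), "local-project.test_dataset.test_table", df). Unlike to_gbq, it does not create the table or apply if_exists semantics, so the test setup has to create the table first.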

jitendrawbd commented 7 months ago

Ah, got it. Are there any plans to support load_table_from_dataframe in the future? For now, I will look for a workaround.