goccy / bigquery-emulator

BigQuery emulator server implemented in Go
MIT License

Use Emulator with PySpark #264

Open lvijnck opened 10 months ago

lvijnck commented 10 months ago

What happened?

Hi all,

I'm trying to read from the emulator using PySpark (no Scala), but I can't figure out how to set up anonymous credentials.

Any ideas?

Reading the dataframe as follows:

    # Load dataset
    return session.read.format("bigquery") \
        .option("parentProject", "test") \
        .option("table", "test.test") \
        .option("proxyAddress", "0.0.0.0:9060") \
        .load().show()

This gives the following error:

    POST https://oauth2.googleapis.com/token
    {
      "error": "invalid_grant",
      "error_description": "Bad Request"
    }

totem3 commented 10 months ago

Hi there

I am not familiar with PySpark or the spark-bigquery-connector, but the bigquery-emulator does not check permissions or provide any authentication features. So this is unlikely to be an issue in the bigquery-emulator itself and more likely a problem on the client side. Judging from the spark-bigquery-connector's README and the error message, the connector appears to require some form of valid access token. When using the Java SDK without authentication, I suppose NoCredentials is typically used, but judging from the connector's configuration interface, it doesn't seem possible to use that here.

Separately, you seem to have set proxyAddress. According to the README and the following PR, that option is intended for connecting to BigQuery through a forward proxy such as Squid, so specifying the address of the bigquery-emulator there seems incorrect. (I haven’t used it myself, so I might not be completely accurate.)

If you want to point the connector at the emulator, the options to look at are probably bigQueryHttpEndpoint and bigQueryStorageGrpcEndpoint.
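As a rough illustration of that suggestion, here is a minimal, untested sketch of setting the two endpoint options instead of proxyAddress. The helper names are hypothetical. The endpoint values in the usage comment assume the emulator serves HTTP on port 9050 and gRPC on port 9060, and the exact endpoint format the connector expects (with or without a scheme) should be checked against its README. Passing a placeholder token through gcpAccessToken is also only an assumption, and it may not be enough to get past the OAuth error above.

```python
def emulator_read_options(project, table, http_endpoint, grpc_endpoint, access_token=None):
    """Build connector options that target an emulator rather than Google's endpoints.

    Hypothetical helper: the option names come from the spark-bigquery-connector
    README, but nothing here has been verified against a running emulator.
    """
    options = {
        "parentProject": project,
        "table": table,
        # Endpoint overrides, used instead of proxyAddress (which is for forward proxies).
        "bigQueryHttpEndpoint": http_endpoint,
        "bigQueryStorageGrpcEndpoint": grpc_endpoint,
    }
    if access_token is not None:
        # Assumption: a placeholder token might keep the connector from
        # requesting real credentials. This is untested.
        options["gcpAccessToken"] = access_token
    return options


def read_from_emulator(session, options):
    """Apply the options to a BigQuery reader on an existing SparkSession and load it."""
    reader = session.read.format("bigquery")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()


# Usage, assuming `session` is a SparkSession with the connector on its classpath
# and the ports below match how the emulator was started:
#
#   opts = emulator_read_options(
#       "test", "test.test",
#       http_endpoint="http://0.0.0.0:9050",
#       grpc_endpoint="0.0.0.0:9060",
#       access_token="dummy",
#   )
#   read_from_emulator(session, opts).show()
```

Keeping the options in a plain dict makes it easy to print and compare them against the connector's documented option names before starting a Spark job.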