GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

Direct write method not working in Databricks for Spark 3.5 #1196

Closed jainshasha closed 8 months ago

jainshasha commented 8 months ago

Hi,

I am using Databricks with Spark 3.5 to write a dataframe into a BigQuery table. Per the spark-bigquery connector documentation, I set the write method to direct so as to avoid the temporary-bucket write, but even with the direct write method it still asks for a temporary bucket name and creates temporary files on GCS.

Is this a known bug that is still open?

I am using the following command:

```python
finalDF.write.format("bigquery").option("writeMethod", "direct").option("temporaryGcsBucket", bucket).save(table)
```

davidrabinowitz commented 8 months ago

Which version of the connector are you using?

jainshasha commented 8 months ago

Hi @davidrabinowitz, thanks for replying. I am currently using Databricks 14.3, which comes with Spark 3.5. How do I check the version of the spark-bigquery connector there? Can you please help me with that?

Would really appreciate your help in this

davidrabinowitz commented 8 months ago

If you are using the built-in BigQuery connector, then the Databricks release notes should have this information.

jainshasha commented 8 months ago

Hi @davidrabinowitz, I debugged inside Databricks and saw that we are using this jar:

```
ls -lrt /databricks/jars/----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar*
```
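For anyone else inspecting a Databricks jar listing like the one above, the connector version can be read off the assembly jar name. A small sketch in plain Python (the filename is the one from this thread):

```python
import re

# Jar name as listed under /databricks/jars/ in the comment above.
jar = ("----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-"
       "hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar")

# The connector version sits between "assembly-" and the optional "-SNAPSHOT" suffix.
match = re.search(r"assembly-(\d+\.\d+\.\d+)", jar)
version = match.group(1) if match else None
print(version)  # → 0.22.2
```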

davidrabinowitz commented 8 months ago

If I understand correctly, the jar is version 0.22.2, which is very old. For Spark 3.5 we recommend using the spark-3.5-bigquery connector; the latest version is 0.36.1. Direct write is certainly supported there.
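Putting the two answers together: with a recent connector attached (e.g. the `com.google.cloud.spark:spark-3.5-bigquery:0.36.1` coordinates installed as a cluster library, rather than the bundled 0.22.2 jar), the direct write method goes through the BigQuery Storage Write API and should not require a GCS staging bucket. A sketch of the adjusted call, using the same `finalDF` and `table` placeholders as the original command:

```
# Sketch only: assumes spark-3.5-bigquery 0.36.1 (or newer) is on the classpath.
# With writeMethod=direct, the temporaryGcsBucket option can simply be dropped.
finalDF.write.format("bigquery") \
    .option("writeMethod", "direct") \
    .save(table)
```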