GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

Destination table's schema is not compatible with dataframe's schema. Incompatible max length for field: FIELD_NAME, cannot write source field with max length: null to destination field with max length: 8 #1312

Open kumarshekhar opened 2 weeks ago

kumarshekhar commented 2 weeks ago

I'm getting the error below while using parameterized data types in BigQuery (https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#parameterized_data_types) in a Spark Java project.

Caused by: java.lang.IllegalArgumentException: com.google.cloud.bigquery.connector.common.BigQueryConnectorException$InvalidSchemaException: Destination table's schema is not compatible with dataframe's schema. Incompatible max length for field: FIELD_NAME, cannot write source field with max length: null to destination field with max length: 8

In BigQuery, I have created a table with a column FIELD_NAME STRING(8). I'm trying to write to this table with writeMethod set to direct, using the code provided below. Spark's Java API does not provide a way to define a StructField with a maximum length for a String.

data.write().format("bigquery").mode(SaveMode.Append).option("writeMethod", "direct").option("table", "TABLE_NAME").save();
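For context, here is a minimal self-contained sketch of the write path. The session setup, sample data, and fully qualified table name are illustrative placeholders, not from my actual project:

```java
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class BigQueryDirectWriteRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("bq-direct-write-repro")
        .getOrCreate();

    // Spark's StringType carries no length parameter, so the connector sees
    // max length = null even though the destination column is STRING(8).
    StructType schema = new StructType()
        .add("FIELD_NAME", DataTypes.StringType);

    Dataset<Row> data = spark.createDataFrame(
        Collections.singletonList(RowFactory.create("ABC")), schema);

    data.write()
        .format("bigquery")
        .mode(SaveMode.Append)
        .option("writeMethod", "direct")
        .option("table", "PROJECT.DATASET.TABLE_NAME") // placeholder reference
        .save();
  }
}
```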

Is there a workaround to stop the write from failing? I've tried the same code writing to an Oracle database, and it worked fine. The BigQuery connector errors out while comparing the schema of the Spark Row to the schema of the BigQuery table.
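One direction I sketched is attaching a length hint to the StructField through Spark's Metadata API. To be clear, the "maxLength" metadata key here is my own assumption; I have not confirmed that the connector consults field metadata during its schema check:

```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.MetadataBuilder;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class LengthHintSchema {
  public static StructType build() {
    // Hypothetical: "maxLength" is an assumed metadata key, not a documented
    // connector contract; Spark itself ignores it for StringType.
    Metadata lengthHint = new MetadataBuilder()
        .putLong("maxLength", 8)
        .build();

    return new StructType(new StructField[] {
        new StructField("FIELD_NAME", DataTypes.StringType, true, lengthHint)
    });
  }
}
```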

Versions: spark-bigquery-with-dependencies_2.13 v0.41.0, spark-sql_2.13 v3.5.1

kumarshekhar commented 2 weeks ago

@davidrabinowitz @isha97 Is there a workaround to skip the max length validation for parameterized data types in BigQuery?