GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

Map type with Complex Value not supported any more #1195

Closed: walmaaoui closed this issue 7 months ago

walmaaoui commented 8 months ago

I think this could be a regression.

The following code works with connector version 0.25.2 but fails with 0.34.0. Is this an expected/intended change?

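    import org.apache.spark.sql.SparkSession

    case class Complex(v: Int)
    case class NestedComplexMapType(nested: Map[Int, Complex])

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._ // provides .toDS() for a Seq of case classes

    // Placeholder: name of an existing GCS bucket used to stage the data
    val temporalBucket = "<your-temporary-bucket>"

    // One row whose "nested" column is map<int, struct<v:int>>
    val ds1 = Seq(NestedComplexMapType(Map(0 -> Complex(1)))).toDS()

    ds1.write
      .format("bigquery")
      .option("temporaryGcsBucket", temporalBucket)
      .option("intermediateFormat", "orc")
      .option("dataset", "it_test")
      .mode("overwrite")
      .save("nested_map_complex_type")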

The DataFrame schema is:

    root
     |-- nested: map (nullable = true)
     |    |-- key: integer
     |    |-- value: struct (valueContainsNull = true)
     |    |    |-- v: integer (nullable = false)

The above fails with:

    [info]   java.lang.IllegalArgumentException: Data type not expected: struct<v:int>
    [info]   at com.google.cloud.spark.bigquery.SchemaConverters.toBigQueryType(SchemaConverters.java:607)
    [info]   at com.google.cloud.spark.bigquery.SchemaConverters.createBigQueryColumn(SchemaConverters.java:510)
    [info]   at com.google.cloud.spark.bigquery.SchemaConverters.sparkToBigQueryFields(SchemaConverters.java:468)
    [info]   at com.google.cloud.spark.bigquery.SchemaConverters.toBigQuerySchema(SchemaConverters.java:456)
    [info]   at com.google.cloud.spark.bigquery.write.BigQueryWriteHelper.<init>(BigQueryWriteHelper.java:95)

BigQuery itself doesn't support maps, but the connector handled this case in 0.25.2.
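Since BigQuery has no native MAP type, one possible stopgap on affected connector versions is to remove the map from the DataFrame before writing, turning it into an array of key/value structs with Spark's `map_entries` (available since Spark 3.0). This is a sketch under assumptions, not the connector's own fix: the object and helper names (`MapEntriesWorkaround`, `mapToEntries`) are hypothetical, and it is assumed, not verified against 0.34.0, that the connector accepts an array of structs where it rejects a map with struct values.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, map_entries}

// Same case classes as in the report above.
case class Complex(v: Int)
case class NestedComplexMapType(nested: Map[Int, Complex])

object MapEntriesWorkaround {
  // Hypothetical helper: replaces a map column with an array<struct<key, value>>
  // column of the same name, so no MapType reaches the connector's schema conversion.
  def mapToEntries(df: DataFrame, column: String): DataFrame =
    df.withColumn(column, map_entries(col(column)))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on a cluster, reuse the existing session.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-entries-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds1 = Seq(NestedComplexMapType(Map(0 -> Complex(1)))).toDS()
    val converted = mapToEntries(ds1.toDF(), "nested")

    // "nested" is now array<struct<key:int,value:struct<v:int>>>
    converted.printSchema()
    spark.stop()
  }
}
```

The resulting `converted` DataFrame could then be written with the same `write.format("bigquery")` options as in the report. Note that this changes the column's Spark type, so anything reading the table back sees an array of key/value records rather than the original map.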

isha97 commented 7 months ago

This is fixed and will be available in the next release of the connector (0.38).

rafalh commented 1 month ago

@isha97 The README still says that "Values can be simple types (not structs)". Is this outdated, i.e. are struct types supported as map values for both reads and writes? Judging by the tests, I assume they are.