GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

BigQuery RANGE (and INTERVAL) support #1292

Open 0rnella opened 2 months ago

0rnella commented 2 months ago

Context

I have a table in BigQuery with columns of the data type RANGE. When I attempt to read the table with the BigQuery spark connector (in a BigQuery stored procedure), I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o918.load.

: java.lang.IllegalStateException: Unexpected type: RANGE
    at com.google.cloud.spark.bigquery.SchemaConverters.getStandardDataType(SchemaConverters.java:421)
    at com.google.cloud.spark.bigquery.SchemaConverters.lambda$getDataType$3(SchemaConverters.java:340)
    at java.base/java.util.Optional.orElseGet(Optional.java:364)
    at com.google.cloud.spark.bigquery.SchemaConverters.getDataType(SchemaConverters.java:340)
    at com.google.cloud.spark.bigquery.SchemaConverters.convert(SchemaConverters.java:286)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
    at com.google.cloud.spark.bigquery.SchemaConverters.toSpark(SchemaConverters.java:69)
    at com.google.cloud.spark.bigquery.v2.Spark3Util.lambda$createBigQueryTableInstance$0(Spark3Util.java:63)
    at com.google.cloud.spark.bigquery.v2.Spark31BigQueryTable.schema(Spark31BigQueryTable.java:87)
    at com.google.cloud.spark.bigquery.v2.Spark31BigQueryTableProvider.inferSchema(Spark31BigQueryTableProvider.java:40)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:90)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:140)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:210)
    at scala.Option.flatMap(Option.scala:283)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:208)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

I also get a similar error with columns of type INTERVAL, but since the INTERVAL data type is still pre-GA, that is more or less expected.

Neither RANGE nor INTERVAL is currently listed among the Data Types in the README (and again, the omission of INTERVAL is understandable given its pre-GA status). Are there plans to introduce support for RANGE?

Please let me know if I can provide any more information, and thank you.

isha97 commented 1 month ago

Adding support for RANGE and INTERVAL is on our roadmap, but we don't have an ETA for it yet.
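
Until native support lands, one possible workaround is to keep the connector from ever seeing the unsupported type: push a query down to BigQuery (via the connector's `query` option, which requires `viewsEnabled` and `materializationDataset`) that casts the RANGE column to STRING. A minimal sketch — the dataset, table, and column names (`my_dataset`, `my_table`, `valid_dates`, `id`) are placeholders, not from the issue:

```python
def range_workaround_query(dataset, table, range_col, other_cols):
    """Build a BigQuery SQL query that casts a RANGE column to STRING,
    so the connector's schema conversion never encounters the RANGE type."""
    cols = ", ".join(other_cols)
    return (
        f"SELECT {cols}, CAST({range_col} AS STRING) AS {range_col} "
        f"FROM {dataset}.{table}"
    )

query = range_workaround_query("my_dataset", "my_table", "valid_dates", ["id"])

# The resulting query string would then be passed to the connector, e.g.:
# df = (spark.read.format("bigquery")
#       .option("viewsEnabled", "true")
#       .option("materializationDataset", "my_dataset")
#       .option("query", query)
#       .load())
```

The RANGE values arrive in Spark as strings like `[2020-01-01, 2020-12-31)` and would need to be parsed on the Spark side if the bounds are required individually.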