GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

BIGNUMERIC Precision Handling: Inaccurate Decimal Values #1208

Closed nabeelq closed 1 month ago

nabeelq commented 2 months ago

Hello,

We are getting the error below when reading a BIGNUMERIC field:

java.lang.IllegalArgumentException: BigNumeric precision is too wide (76), Spark can only handle decimal types with max precision of 38
    at com.google.cloud.spark.bigquery.SchemaConverters.getStandardDataType(SchemaConverters.java:385) ~[spark-bigquery-with-dependencies_2.12-0.37.0.jar:na]
    at com.google.cloud.spark.bigquery.SchemaConverters.lambda$getDataType$3(SchemaConverters.java:340) ~[spark-bigquery-with-dependencies_2.12-0.37.0.jar:na]
    at java.util.Optional.orElseGet(Optional.java:267) ~[na:1.8.0_262]
    at com.google.cloud.spark.bigquery.SchemaConverters.getDataType(SchemaConverters.java:340) ~[spark-bigquery-with-dependencies_2.12-0.37.0.jar:na]
    at com.google.cloud.spark.bigquery.SchemaConverters.convert(SchemaConverters.java:286) ~[spark-bigquery-with-dependencies_2.12-0.37.0.jar:na]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_262]
    at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_262]
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_262]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[na:1.8.0_262]

Spark environment: Spark 3.3.1 with BigQuery connector version 0.37.0.

BigQuery data type: [screenshot, 2024-04-08 at 1:20:05 PM]

Note: In BigQuery, we have not defined a precision or scale for the BIGNUMERIC column.

davidrabinowitz commented 2 months ago

Hi @nabeelq ,

When precision and scale are not defined for BIGNUMERIC, BigQuery uses the default values, which are (76, 38). Unfortunately, Spark's DECIMAL is hard-coded to a smaller limit: both its precision and scale are capped at 38. What is the actual use case for this field? What precision and scale do you actually need?
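One way to answer the precision and scale question is to measure what the stored values actually require. The following is a minimal Python sketch, not part of the connector. The helper names (`digits_needed`, `required_precision_scale`, `fits_spark_decimal`) are hypothetical, and it assumes the column values have been exported as decimal strings. The limit of 38 is taken from the error message above.

```python
from decimal import Decimal

# Maximum decimal precision Spark accepts, as quoted in the error message.
SPARK_MAX_PRECISION = 38


def digits_needed(value):
    """Return (integer_digits, fractional_digits) for one finite decimal value."""
    _, digits, exponent = Decimal(value).as_tuple()
    digits = list(digits)
    if not any(digits):
        return 1, 0  # zero needs a single digit and no fractional part
    # Drop trailing zeros in the fractional part, e.g. 1.500 -> 1.5.
    # Done by hand because Decimal.normalize() rounds to the context precision.
    while exponent < 0 and len(digits) > 1 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    fractional = max(-exponent, 0)
    integer = max(len(digits) + exponent, 0)
    return integer, fractional


def required_precision_scale(values):
    """Return the smallest (precision, scale) that holds every value exactly."""
    int_digits, frac_digits = 0, 0
    for v in values:
        i, f = digits_needed(v)
        int_digits = max(int_digits, i)
        frac_digits = max(frac_digits, f)
    return max(int_digits + frac_digits, 1), frac_digits


def fits_spark_decimal(precision, scale):
    """True if DECIMAL(precision, scale) stays within Spark's limit."""
    return 1 <= precision <= SPARK_MAX_PRECISION and 0 <= scale <= precision


sample = ["12345.6789", "0.000001", "-987654321.5"]  # illustrative values only
print(required_precision_scale(sample))   # (15, 6)
print(fits_spark_decimal(76, 38))         # False: the BIGNUMERIC default
```

If the measured precision is 38 or less, declaring the column with an explicit, narrower precision and scale on the BigQuery side should keep it within the limit quoted in the error. If it is wider, the values cannot be represented exactly as a Spark decimal.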

isha97 commented 1 month ago

@nabeelq Please reopen the issue with the requested information.