GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

BigQuery Storage API always returning 200 partitions #1237

Closed: rcanzanese closed this issue 2 weeks ago

rcanzanese commented 1 month ago

I'm setting preferredMinParallelism and maxParallelism, and the settings are being picked up, but no matter what I do I always end up with 200 partitions, regardless of how big the underlying table is -- I've tried tables as large as 4 TiB with the same result.

spark:spark.datasource.bigquery.preferredMinParallelism: "33333"
spark:spark.datasource.bigquery.maxParallelism: "33333"

With these settings, the message I receive is:

Requested 33333 max partitions, but only received 200 from the BigQuery Storage API for session 

Is there some additional config that I am missing?
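For context, a minimal sketch of how these same hints can be passed per-read via DataFrameReader options instead of cluster properties; the table name is a placeholder, and the connector jar is assumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object ReadWithParallelismHints {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bq-parallelism-demo")
      .getOrCreate()

    val df = spark.read
      .format("bigquery")
      // Ask the BigQuery Storage API for at least this many streams
      .option("preferredMinParallelism", "33333")
      // Upper bound on the number of streams the connector will request
      .option("maxParallelism", "33333")
      .load("my-project.my_dataset.my_table") // placeholder table

    df.printSchema()
  }
}
```

Both forms set the same connector options; the per-read form just scopes them to a single load.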

isha97 commented 2 weeks ago

Hi @rcanzanese ,
The actual number of partitions may be less than preferredMinParallelism if BigQuery deems the data small enough. There are also quotas on the number of partitions per read session, which restrict the parallelism. Please file a bug with support to increase the quota for your project.
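For reference, a minimal sketch of checking what the read session actually granted, and of a client-side workaround via repartitioning when more downstream tasks are needed; the table name and target partition count are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object CheckReadSessionPartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bq-partition-check")
      .getOrCreate()

    val df = spark.read
      .format("bigquery")
      .load("my-project.my_dataset.my_table") // placeholder table

    // Reports what the read session actually produced (e.g. 200),
    // independent of the preferredMinParallelism/maxParallelism hints.
    println(s"read session partitions: ${df.rdd.getNumPartitions}")

    // A shuffle can raise downstream task parallelism even when the
    // Storage API caps the stream count for the session.
    val widened = df.repartition(1000) // placeholder target count
    println(s"after repartition: ${widened.rdd.getNumPartitions}")
  }
}
```

Note the repartition incurs a full shuffle, so it only pays off when the extra parallelism matters for the work that follows the read.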