Tested with PySpark 3.3.0 and spark-bigquery-latest_2.12.jar. Running
```python
spark.conf.set("materializationProject", "<my-project>")
spark.conf.set("materializationDataset", "<my-dataset>")
spark.conf.set("viewsEnabled", True)

query = """
# just some comment
SELECT *
FROM `bigquery-public-data.samples.shakespeare`
LIMIT 10
"""
spark.read.format("bigquery").load(query)
```
produces the following error:
```
java.lang.IllegalArgumentException: Invalid Table ID '# inline comment SELECT * FROM ``bigquery-public-data.samples.shakespeare` LIMIT 10'. Must match '^(((\S+)[:.])?(\w+)\.)?([\S&&[^.:]]+)$$'
	at com.google.cloud.bigquery.connector.common.BigQueryUtil.parseTableId(BigQueryUtil.java:160)
	at com.google.cloud.spark.bigquery.SparkBigQueryConfig.from(SparkBigQueryConfig.java:268)
	at com.google.cloud.spark.bigquery.SparkBigQueryConfig.from(SparkBigQueryConfig.java:204)
	at com.google.cloud.spark.bigquery.SparkBigQueryConnectorModule.lambda$provideSparkBigQueryConfig$0(SparkBigQueryConnectorModule.java:79)
	at java.base/java.util.Optional.orElseGet(Optional.java:364)
	at com.google.cloud.spark.bigquery.SparkBigQueryConnectorModule.provideSparkBigQueryConfig(SparkBigQueryConnectorModule.java:77)
	at com.google.cloud.spark.bigquery.SparkBigQueryConnectorModule$$FastClassByGuice$$1865852.GUICE$TRAMPOLINE(<generated>)
	at com.google.cloud.spark.bigquery.SparkBigQueryConnectorModule$$FastClassByGuice$$1865852.apply(<generated>)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:260)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:171)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.provision(InternalProviderInstanceBindingImpl.java:185)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.get(InternalProviderInstanceBindingImpl.java:162)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:169)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
	at com.google.cloud.spark.bigquery.repackaged.com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1101)
	... 21 more
```
Running the same query without the leading comment works without issues; removing the comment alone resolves the error. This is relevant because some SQL linters (like sqlfluff) accept options as inline comments at the top of a SQL script, e.g.
```sql
-- sqlfluff:max_line_length:120
SELECT *
FROM `bigquery-public-data.samples.shakespeare`
LIMIT 10
```
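Until the connector handles leading comments, one possible workaround is to strip them before calling `load`, so the string the connector inspects starts with `SELECT`. This is only a sketch: `strip_leading_sql_comments` is a hypothetical helper, not part of the connector, and it only handles whole-line `--`/`#` comments at the top of the script.

```python
def strip_leading_sql_comments(sql: str) -> str:
    """Drop blank lines and leading '--' or '#' comment lines so the
    connector's table-vs-query detection sees the SELECT first."""
    lines = sql.splitlines()
    while lines and (not lines[0].strip()
                     or lines[0].lstrip().startswith(("--", "#"))):
        lines.pop(0)
    return "\n".join(lines)

query = """
# just some comment
SELECT *
FROM `bigquery-public-data.samples.shakespeare`
LIMIT 10
"""
cleaned = strip_leading_sql_comments(query)
# cleaned now starts with SELECT, so load(cleaned) should be
# recognized as a query instead of being parsed as a table ID
```

Alternatively, passing the SQL explicitly via the `query` option (`spark.read.format("bigquery").option("query", query).load()`) may sidestep the table-vs-query guessing, though I have not verified whether it tolerates the leading comment.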