Closed sandeep-katta0102 closed 3 years ago
@davidrabinowitz could you please look into this, it is becoming seemingly difficult to query views using BigQuery Connector
Hi Sandeep,
Please try the following query:
viewDf.filter($"date" === "2021-03-29 17:00:00").select("country", "date").show()
@himanshukohli09
Thanks for the workaround, is it a bug or limitation ?
This is currently the limitation. In case of views the fields in the filter should also be part of select.
Could we document this limitation, I don't find any referrence which says as a limitation
https://github.com/GoogleCloudDataproc/spark-bigquery-connector#reading-data-from-a-bigquery-query
Hi Sandeep,
Sure, I will add this limitation in the document(readme). Also, the reason for this limitation is as follows:
As mentioned here, the views have to be materialized before reading them. The materialized view is created with all the columns but if there is a select operator present then the materialized view is created with these columns only. Now, in this particular case the materialized view is created with a single column "country". As a result, when after materialization the filter operator is applied on column "date" it throws error saying "INVALID_ARGUMENT" as "date" column in not present in the materialized view.
Hence, in case of views if we want to use both select and filter then the columns on which filter is applied those columns should also be present in select.
Hi Sandeep,
I have corrected this limitation and the PR has been merged.
When I run
viewdf.filter.select
it throws INVALID_ARGUMENT: request failed: Row filter for table xxxx:tempdataset._sbc_79a9cc7eb80c419f94763550c7ead249 is invalid. Filter is '(date
= TIMESTAMP '2021-03-29 17:00:00.0' ANDdate
IS NOT NULL)'Code
I have created the view using the public dataset
bigquery-public-data.covid19_italy.data_by_province
Saved the output of below results as view in BigQuery
Logs
21/03/30 10:10:32 INFO DirectBigQueryRelation: Querying table XXXXX.tempdataset.covidconfirmview, parameters sent from Spark: requiredColumns=[country], filters=[isnotnull(date#515),(date#515 = 1617037200000000)] 21/03/30 10:10:34 INFO DirectBigQueryRelation: Going to read from XXXXX.tempdataset._sbc_79a9cc7eb80c419f94763550c7ead249 columns=[country], filter='(
date
= TIMESTAMP '2021-03-29 17:00:00.0' ANDdate
IS NOT NULL)' 21/03/30 10:10:34 INFO DirectBigQueryRelation: unhandledFilters: 21/03/30 10:10:34 INFO DirectBigQueryRelation: Querying table XXXXX.tempdataset.covidconfirmview, parameters sent from Spark: requiredColumns=[country], filters=[isnotnull(date#515),(date#515 = 1617037200000000)] 21/03/30 10:10:34 INFO DirectBigQueryRelation: Going to read from XXXXX.tempdataset._sbc_79a9cc7eb80c419f94763550c7ead249 columns=[country], filter='(date
= TIMESTAMP '2021-03-29 17:00:00.0' ANDdate
IS NOT NULL)' 21/03/30 10:10:34 WARN SQLExecution: Error executing delta metering com.google.api.gax.rpc.InvalidArgumentException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: request failed: Row filter for table XXXXX:tempdataset._sbc_79a9cc7eb80c419f94763550c7ead249 is invalid. Filter is '(date
= TIMESTAMP '2021-03-29 17:00:00.0' ANDdate
IS NOT NULL)'