GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

Support SQL push down including joins #515

Open richard-williamson opened 2 years ago

richard-williamson commented 2 years ago

Support optimizations similar to those in the Snowflake connector: https://github.com/snowflakedb/spark-snowflake/pull/8/files

richard-williamson commented 2 years ago

To clarify, this request is different from https://github.com/GoogleCloudDataproc/spark-bigquery-connector/pull/305, which only pushes down SQL that the user specifies explicitly. The Snowflake connector instead pushes Spark logical queries back to the RDBMS, which removes the need for specialized syntax (and ideally allows PySpark DataFrame join code to be pushed down as well).
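
To make the distinction concrete, here is a minimal sketch of what pushing a logical query down could look like. It is a toy model in plain Python, not the connector's or Spark's actual planner: the `Table`, `Filter`, and `Join` node classes and the `to_sql` function are hypothetical names invented for this illustration, and the table names are made up. The tree stands in for DataFrame code along the lines of `orders.filter("amount > 100").join(customers, "customer_id")`, and the point is that the whole tree becomes a single SQL statement for the database to run, rather than two table scans joined inside Spark.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical logical plan nodes (illustration only, not Spark's Catalyst classes).
@dataclass(frozen=True)
class Table:
    name: str  # qualified table name, e.g. "ds.orders"


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    condition: str  # SQL boolean expression


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    on: str  # name of a join column present on both sides


Plan = Union[Table, Filter, Join]


def to_sql(plan: Plan) -> str:
    """Recursively render a plan tree as one nested SQL statement."""
    if isinstance(plan, Table):
        return f"SELECT * FROM `{plan.name}`"
    if isinstance(plan, Filter):
        return f"SELECT * FROM ({to_sql(plan.child)}) AS t WHERE {plan.condition}"
    if isinstance(plan, Join):
        return (
            f"SELECT * FROM ({to_sql(plan.left)}) AS l "
            f"JOIN ({to_sql(plan.right)}) AS r USING ({plan.on})"
        )
    raise TypeError(f"unsupported plan node: {plan!r}")


# A filter on one table joined to another, expressed as a plan tree.
plan = Join(
    left=Filter(Table("ds.orders"), "amount > 100"),
    right=Table("ds.customers"),
    on="customer_id",
)
sql = to_sql(plan)
print(sql)
```

With the approach in the linked PR, the user would have to write that SQL string by hand. With logical-plan pushdown, the connector would derive it from ordinary DataFrame operations.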