dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
848 stars 160 forks source link

'preview results' button breaks for non-simple-SELECT queries due to LIMITed wrapper query #274

Closed lewish closed 5 years ago

lewish commented 5 years ago

Steps to reproduce:

  1. Open a postgres project (e.g. dataform_postgres_demo)
  2. Create a new SQLX file
  3. add: " config { type: "table" } select 1 "
  4. Click "preview results"

Expected outcome: a table with one row, and the value 1 Actual result: error: subquery in FROM must have an alias


Moved from dataform-co/dataform-co#887 /cc @dwl285

BenBirt commented 5 years ago

I spent some time looking into this, it's going to be a bit of a pain to fix (and test). The root cause is the way we limit returned rows by wrapping the relevant query at https://github.com/dataform-co/dataform-co/blob/master/ts/common/sql_util.ts.

Basically, this doesn't work in a number of cases:

We should really be pushing the row limit to the warehouse at a much deeper level, as part of the query parameters, not as a LIMIT. We do actually already do this for BigQuery at https://github.com/dataform-co/dataform/blob/master/api/dbadapters/bigquery.ts#L42. We could probably do this for Snowflake using a start/end limited stream (https://docs.snowflake.net/manuals/user-guide/nodejs-driver-use.html#batch-processing-results) and for Postgres/Redshift using setMaxRows (which is JDBC-y, but our redshift client library is a right pain to configure).

For now I propose:

... and we can look into fixing the harder cases using the ideas above later.

faiyaz26 commented 5 years ago

Hey Ben, if I am not wrong, that bigquery parameter doesn't limit the rows rather limits how many rows the api will return in one request/page. So basically if you have 2500 rows, bigquery sdk will call the api 3 times. So it doesn't limit the rows we want.

BenBirt commented 5 years ago

good catch! i think we should probably cancel the query after the first page of results in that case (for BQ).

BenBirt commented 5 years ago

Moved to dataform-co/dataform-co#1105