Closed: lewish closed this issue 5 years ago
I spent some time looking into this; it's going to be a bit of a pain to fix (and test). The root cause is the way we limit returned rows by wrapping the relevant query at https://github.com/dataform-co/dataform-co/blob/master/ts/common/sql_util.ts.
Basically, this doesn't work in a number of cases:

- subqueries in `FROM` need to be aliased, so the wrapped query fails with "subquery in FROM must have an alias"
- non-`SELECT` statements (e.g. `GRANT ...`)
- a `SELECT` query with a `-- comment` at the end of the line

We should really be pushing the row limit to the warehouse at a much deeper level, as part of the query parameters, not as a `LIMIT`. We do actually already do this for BigQuery at https://github.com/dataform-co/dataform/blob/master/api/dbadapters/bigquery.ts#L42. We could probably do this for Snowflake using a start/end limited stream (https://docs.snowflake.net/manuals/user-guide/nodejs-driver-use.html#batch-processing-results), and for Postgres/Redshift using `setMaxRows` (which is JDBC-y, but our Redshift client library is a right pain to configure).
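To make the failure modes above concrete, here is a minimal sketch of the naive wrapping approach. The helper name and template are assumptions for illustration, not the actual code in `sql_util.ts`:

```typescript
// Hypothetical sketch of naive row limiting by wrapping the query.
// (Name and template are illustrative, not the real sql_util.ts helper.)
function limitQueryNaive(query: string, limit: number): string {
  return `SELECT * FROM (${query}) LIMIT ${limit}`;
}

// A trailing line comment swallows the closing paren and the LIMIT clause:
const wrapped = limitQueryNaive("select 1 -- just a comment", 10);
// wrapped is now a single line where ") LIMIT 10" sits after "--",
// so the warehouse parses it as part of the comment and the
// statement is left unterminated. The same template also fails on
// non-SELECT statements (e.g. GRANT) and on warehouses that require
// an alias on the wrapped subquery.
console.log(wrapped);
```

The aliasing case is why the reproduction below errors with "subquery in FROM must have an alias".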
For now I propose: only apply the `LIMIT` wrap when the query is a plain `SELECT` query. That will immediately fix all simple `SELECT` statements... and we can look into fixing the harder cases using the ideas above later.
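The stop-gap could be sketched roughly as follows. All names here are illustrative, and stripping `--` comments with a regex is a simplification (it would mis-handle `--` inside string literals); a real implementation would also need to handle queries that already end in their own `LIMIT`:

```typescript
// Sketch of the stop-gap: only add a LIMIT when the query is a plain SELECT.
function isSimpleSelect(query: string): boolean {
  // Strip -- line comments and surrounding whitespace before checking the verb.
  // (Simplified: does not account for "--" inside string literals.)
  const stripped = query
    .split("\n")
    .map(line => line.replace(/--.*$/, ""))
    .join("\n")
    .trim();
  return /^select\b/i.test(stripped);
}

function limitQuery(query: string, limit: number): string {
  // Non-SELECT statements (e.g. GRANT) are passed through untouched.
  // Appending LIMIT on its own line also sidesteps the trailing-comment case.
  return isSimpleSelect(query) ? `${query}\nLIMIT ${limit}` : query;
}
```

This fixes nothing for the harder cases, but it never produces a query that is worse than the input.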
Hey Ben, if I'm not wrong, that BigQuery parameter doesn't limit the total rows; rather, it limits how many rows the API will return in one request/page. So basically, if you have 2500 rows, the BigQuery SDK will call the API 3 times. So it doesn't limit the rows the way we want.
Good catch! I think we should probably cancel the query after the first page of results in that case (for BQ).
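The paging behaviour described above can be illustrated with a toy simulation (this is not the BigQuery client, just arithmetic showing why a page-size parameter alone doesn't cap the result set):

```typescript
// Toy simulation of a paginated fetch: the SDK keeps requesting pages
// until all rows are consumed, so maxResults controls the page size,
// not the total number of rows returned.
function fetchAllRows(
  totalRows: number,
  maxResults: number
): { rows: number; apiCalls: number } {
  let fetched = 0;
  let apiCalls = 0;
  while (fetched < totalRows) {
    apiCalls++;
    fetched += Math.min(maxResults, totalRows - fetched);
  }
  return { rows: fetched, apiCalls };
}

// With 2500 rows and maxResults = 1000, the fetch makes 3 API calls
// and still returns all 2500 rows. Stopping after the first page (or
// cancelling the job) is what would actually limit the rows returned.
```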
Moved to dataform-co/dataform-co#1105
Steps to reproduce:
Expected outcome: a table with one row, and the value 1
Actual result: `error: subquery in FROM must have an alias`
Moved from dataform-co/dataform-co#887 /cc @dwl285