Closed blackary closed 2 years ago
As long as the identifiers are valid, BQ didn't seem to require backticks.
Do you have a specific case where it raises and error?
Thank you
Here's a fully-functioning example:
prql dialect:bigquery
from `bigquery-public-data.pypi.file_downloads`
filter timestamp > '2024-08-01'
->
SELECT
`bigquery-public-data`.pypi.file_downloads.*
FROM
`bigquery-public-data`.pypi.file_downloads
WHERE
timestamp > '2024-08-01'
Note that it puts backticks around biquery-public-data
because of the hyphens, and I get the error
Unrecognized name: `bigquery-public-data` at [2:3]
I get the same error if the db name doesn't have hyphens in it.
If I change it so that there are backticks around the whole db.schema.table, it works (though, given the date, returns no results)
SELECT
`bigquery-public-data.pypi.file_downloads`.*
FROM
`bigquery-public-data.pypi.file_downloads`
WHERE
timestamp > '2024-08-01'
Ah, sorry about that @blackary , mea culpa.
IIUC, the initial example would be OK — i.e., without the hyphen, it would be fine without the backticks. But if any part of the table expression has an invalid character, the whole identifier needs backticks. Lmk if that's inconsistent with how you're thinking about it.
Ironically I originally tested this with BQ, so the very first approach worked like this. But it's incompatible with Postgres (#822), which requires each part to be quoted separately.
So we'll have to split the implementation up further between dialects, I'll prioritize this and get to it later this week.
Sorry, @max-sixty, that wasn't the best example, due to the hyphenated project name -- I used that one because it's using a public dataset. Here's something closer to the real example I'm working with:
prql dialect:bigquery
from company.apps.users
SELECT
company.apps.users.*
FROM
company.apps.users
Even with no hyphens, BigQuery still complains about the select part
Unrecognized name: company at [2:3]
If I put backticks around both, it works fine
SELECT
`company.apps.users`.*
FROM
`company.apps.users`
Thanks for being so responsive! I'm excited to try prql, but am not able to currently due to this issue.
I see — so it seems that:
FROM
clause (it seems to work without them, even with hyphens)We always need them in the SELECT
clause where there's a dataset / two periods; e.g. this fails:
SELECT broadstreet_adi.area_deprivation_index_by_census_block_group.*
FROM bigquery-public-data.broadstreet_adi.area_deprivation_index_by_census_block_group LIMIT 1000
but this succeeds
SELECT area_deprivation_index_by_census_block_group.*
FROM bigquery-public-data.broadstreet_adi.area_deprivation_index_by_census_block_group LIMIT 1000
So I think a rule is both fairly simple and complete is: "Any ident containing >2 periods should be quoted". We can write a BQ-specific implementation without too much trouble, I think. It over-quotes idents in the FROM
clause, but that's OK.
An alternative is to quote every identifier, which would definitely be complete. But then it makes the SQL more "machine-generated" and less friendly — in this case less welcoming to newcomers, and in other cases more difficult to debug.
Are there any gaps you can perceive in that rule @blackary?
Thanks for being so responsive! I'm excited to try prql, but am not able to currently due to this issue.
Thanks for your feedback and patience — I would have hoped the patience would be expended on more conceptual issues, but inevitably it's these small differences between dialects which are the difficult cases 🙏
@max-sixty I think that rule would cover all the cases. If I was hand-writing the SQL, I would do your second example (just using the table name in the select, without schema or db), but I figure that might add other issues with potential name collisions.
Also, regarding over-quoting, it seems that if you quote the SELECT case, you must also quote the FROM.
e.g. this fails
SELECT
`company.apps.users`.*
FROM
company.apps.users
But this works
SELECT
`company.apps.users`.*
FROM
`company.apps.users`
Thanks for being willing to work on the fiddly dialect edge-cases.
https://github.com/prql/prql/pull/858 almost gets us there, but then I just spotted another case!
prql dialect:bigquery
from pj.ds.tbl
works well:
SELECT
`pj.ds.tbl`.*
FROM
`pj.ds.tbl`
...but because idents are currently required to be fully enclosed by backticks, the code that dbt generates doesn't work:
prql dialect:bigquery
from `pj`.`ds`.`tbl`
SELECT
`pj``.``ds``.``tbl`.*
FROM
`pj``.``ds``.``tbl` # invalid
This is mostly a problem for dbt; it's rare that someone really requires this when manually writing it out. Currently the parser (i.e. the very first stage) doesn't understand ident "parts". I'll see whether I can try and inch towards a solution without building any complexity there for the moment.
FYI I looked at whether we could get rid of the BQ specialization after 0.3.0
. I don't think that's possible, since we use full identifiers in the SELECT
, and from above:
We always need them in the
SELECT
clause where there's a dataset / two periods
So leaving the BQ specialization in for the moment. Possibly in the future we can remove the dataset etc from the SELECT
statement
Though #838 may have resolved issues with other dialects, it hasn't fixed the BigQuery case
Now yields the result with no quotes at all
But, this is invalid. In BigQuery, it actually needs to have `s around the full name in the select and from.