Closed mazum21 closed 6 months ago
sqlmesh follows the same rules as the engine. in bigquery's case, table names are only upper cased if they are quoted, so you'd need to define your model as
model ( name `fct_SomeFactData`)
otherwise it will be lower cased. additionally, bigquery column names are always case insensitive, so they should always be lower cased.
if you model name is indeed capitalized, then in the selector, you'll need to also quote it... sqlmesh render
"`fct_SomeFactData`"
can you show me your model definition?
I don't find any BigQuery documentation which tells that table names are only upper cased if they are quoted.
This is an interesting page: https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical But there seems to be no rule that doesn't allow capital letters in unquoted identifiers. Maybe I'm missing something.
In any case I find it hard to navigate through the quotation jungle. I get mixed results and some commands cannot be executed, others can.
One simple example which should illustrates this and also the issue with the lower casing of column names:
Model:
MODEL (
name `sqlmesh_test.stg_Table`,
kind VIEW,
start '2024-05-01',
cron '@daily'
);
SELECT *
FROM `sample-project`.`sample_dataset`.`externalTable`
-> works and table description in the schema.yml looks like this
- name: '`sample-project`.`sample_dataset`.`externalTable`'
columns:
TableId: INT64
TableName: STRING
-> works, but lower cases the columns, although they have capital letters in the name
> sqlmesh render "sqlmesh_test.stg_Table"
SELECT
`externaltable`.`tableid` AS `tableid`,
`externaltable`.`createdat` AS `createdat`
FROM `dg-ub-snowplow-prod.sample_dataset.externalTable` AS `externaltable`
sqlmesh plan
-> creates a table which preserves upper case characters in table name, but lower cases the column names.
-> results in a warning and nothing happens
> sqlmesh plan --restate-model "sqlmesh_test.stg_Table"
2024-05-16 21:40:40,775 - MainThread - sqlmesh.core.selector - WARNING - Expression 'sqlmesh_test.stg_table' doesn't match any models. (selector.py:209)
2024-05-16 21:40:40,775 - MainThread - sqlmesh.core.selector - WARNING - Expression 'sqlmesh_test.stg_Table' doesn't match any models. (selector.py:171)
Provided restated models do not match any models. No models will be included in plan. Provided: sqlmesh_test.stg_Table
No differences when compared to `prod`
Thanks @mazum21 The detailed feedback was really helpful.
So one thing that Toby mispoke about was the logic applied to BigQuery tables. They are case-sensitive by default which I believe is what you were saying. Docs: https://cloud.google.com/bigquery/docs/tables#table_naming
Table names are case-sensitive by default. mytable and MyTable can coexist in the same dataset, unless they are part of a dataset with case-sensitivity turned off.
The good news is that this is already understood by SQLGlot and we aren't seeing any bugs related to this.
Column though are case-insensitive regardless of quoting. Docs: https://cloud.google.com/bigquery/docs/schemas#column_names
For example, a column named Column1 is considered identical to a column named column1. To learn more about column naming rules, see [Column names](https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#column_names) in the GoogleSQL reference.
Reviewing your examples everything looks correct except the final restate-model . Fix for that is in this PR: https://github.com/TobikoData/sqlmesh/pull/2623
Let me know if that clears things up.
column names
column names which are defined with capital letters in the model sql are lower cased in the BigQuery table (e.g.
OrderId
->orderid
) -> expectation: preserve capitalization of column namestable names
table names with capital letters are not detected by the model selector