dbt-msft / tsql-utils

dbt-utils for the dbt-msft family of packages
MIT License
25 stars 24 forks source link

dbt compiler is generating codes with nested CTE that cause compilation failure #95

Closed derik-roby closed 1 year ago

derik-roby commented 1 year ago

While using dbt_utils.deduplicate inside a CTE, the dbt compiler creates code that has nested CTE which then throws a compilation error while running.

Incorrect syntax near the keyword 'with'

('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]
Incorrect syntax near the keyword 'with'. (156)
(SQLMoreResults); [42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
(319); [42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near 'natural'. (102)")

How to reproduce, I am creating a model that references another model: stg_dim_listings

My Code:

{% set cols = dbtplyr.get_column_names(ref('stg_dim_listings')) %}
{% set cols_host = dbtplyr.starts_with('host_', cols) | join(', ') %}
with listing_hosts as (
    select
        {{ cols_host }}
    from
        {{ ref('stg_dim_listings') }}
),
dim_host as (
    {{ dbt_utils.deduplicate(
        relation='listing_hosts',
        partition_by='host_id',
        order_by='host_listings_count desc',
       )
    }}
)
select
    {{ dbt_utils.generate_surrogate_key([
        'host_id'
    ]) }} as host_sid,
    {{ cols_host }}
from dim_host

dbt generated code from complied folder:

with listing_hosts as (
    select
        host_id, host_url, host_name, host_location, host_about, host_response_time, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, host_since, host_response_rate, host_acceptance_rate
    from
        "azuresql_dbt_db"."staging"."stg_dim_listings"
),

dim_host as (
    with row_numbered as (
        select
            _inner.*,
            row_number() over (
                partition by host_id
                order by host_listings_count desc
            ) as rn
        from listing_hosts as _inner
    )
select
 distinct data.*
 from listing_hosts as data
   natural join row_numbered
    where row_numbered.rn = 1
)
select convert(varchar(50), hashbytes('md5', concat(coalesce(cast(host_id as VARCHAR(MAX)), '_dbt_utils_surrogate_key_null_'), '')), 2)
 as host_sid, host_id, host_url, host_name, host_location, host_about, host_response_time, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, host_since, host_response_rate, host_acceptance_rate
from dim_host

From my understanding, nested WITH is not allowed in sql server. Kindly correct me if I am wrong in my analysis.

dataders commented 1 year ago

@derik-roby thanks for the report!

First, have you installed the tsql-utils shim package? If not, can you try it out?

We have an override for dbt_utils.depuplicate, sqlserver__deduplicate which overrides the standard version, default__dededuplicate. The default__ version has a WITH but ours doesn't.

In the meantime, please get everyone you know to upvote this Azure SQL Feedback item: support nested WITH statements

derik-roby commented 1 year ago

@dataders thanks for the reply. I will try working with tsql-utils and get the nested WITH feedback item upvoted by my colleagues.

derik-roby commented 1 year ago

@dataders Do you know if dbt-msft is compactible with dbt-project-evaluator package?